这篇文章继续介绍4个窗口函数。
lag
lag(column_name,n,default):用于统计窗口内往上第n行的值,第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上n行为null时,取默认值,若不指定,则为null)
lead
lead与lag想法,lead(column_name,n,default)用于统计窗口内向下取n行的值
first_value
first_value() 取分组排序后,截止到当前行的第一个值
last_value
last_value() 取分组排序后,截止到当前行的最后一个值
下面通过具体的实例介绍它们的用法
create table if not exists buy_info ( name string, buy_date string, buy_num int ) row format delimited fields terminated by ‘|‘; select * from buy_info;
name | buy_date | buy_num |
zhangsan | 2020-02-23 | 21 |
zhangsan | 2020-03-12 | 34 |
zhangsan | 2020-04-15 | 12 |
zhangsan | 2020-05-12 | 51 |
lisi | 2020-03-16 | 12 |
lisi | 2020-03-21 | 24 |
lisi | 2020-07-12 | 41 |
lisi | 2020-07-27 |
32 |
select name , buy_date,buy_num,
lag(buy_date,1,‘1970-01-01‘) over(partition by name order by buy_date) as last_date,
lead(buy_date,1,‘2020-12-31‘) over(partition by name order by buy_date) as next_date,
first_value () over(partition by name order by buy_date) as first_date,
last_value() over(partition by name order by buy_date) as last_date from buy_info;
查询结果如下
name | buy_date | buy_num | last_date | next_date | first_date | last_date |
zhangsan | 2020-02-23 | 21 | 1970-01-01 | 2020-03-12 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-03-12 | 34 | 2020-02-23 | 2020-04-15 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-04-15 | 12 | 2020-03-12 | 2020-05-12 | 2020-02-23 | 2020-05-12 |
zhangsan | 2020-05-12 | 51 | 2020-04-15 | 2020-12-31 | 2020-02-23 | 2020-05-12 |
lisi | 2020-03-16 | 12 | 1970-01-01 | 2020-03-21 | 2020-03-16 | 2020-07-27 |
lisi | 2020-03-21 | 24 | 2020-03-16 | 2020-07-12 | 2020-03-16 | 2020-07-27 |
lisi | 2020-07-12 | 41 | 2020-03-21 | 2020-07-27 | 2020-03-16 | 2020-07-27 |
lisi | 2020-07-27 | 32 | 2020-07-12 | 2020-12-31 | 2020-03-16 | 2020-07-27 |
Hive 窗口函数详细介绍3 lag,lead,first_value,last_value
原文:https://www.cnblogs.com/itachilearner/p/13392469.html