1)为什么要建宽表
需求目标,把每个用户单日的行为聚合起来组成一张多列宽表,以便之后关联用户维度信息后进行,不同角度的统计分析
hive (gmall)> drop table if exists dws_user_action; create external table dws_user_action ( user_id string comment ‘用户 id‘, order_count bigint comment ‘下单次数 ‘, order_amount decimal(16,2) comment ‘下单金额 ‘, payment_count bigint comment ‘支付次数‘, payment_amount decimal(16,2) comment ‘支付金额 ‘, comment_count bigint comment ‘评论次数‘ ) COMMENT ‘每日用户行为宽表‘ PARTITIONED BY (`dt` string) stored as parquet location ‘/warehouse/gmall/dws/dws_user_action/‘;
1)导入数据
hive (gmall)> with tmp_order as ( select user_id, count(*) order_count, sum(oi.total_amount) order_amount from dwd_order_info oi where date_format(oi.create_time,‘yyyy-MM-dd‘)=‘2019-02-10‘ group by user_id ) , tmp_payment as ( select user_id, sum(pi.total_amount) payment_amount, count(*) payment_count from dwd_payment_info pi where date_format(pi.payment_time,‘yyyy-MM-dd‘)=‘2019-02-10‘ group by user_id ), tmp_comment as ( select user_id, count(*) comment_count from dwd_comment_log c where date_format(c.dt,‘yyyy-MM-dd‘)=‘2019-02-10‘ group by user_id ) insert overwrite table dws_user_action partition(dt=‘2019-02-10‘) select user_actions.user_id, sum(user_actions.order_count), sum(user_actions.order_amount), sum(user_actions.payment_count), sum(user_actions.payment_amount), sum(user_actions.comment_count) from ( select user_id, order_count, order_amount, 0 payment_count, 0 payment_amount, 0 comment_count from tmp_order union all select user_id, 0, 0, payment_count, payment_amount, 0 from tmp_payment union all select user_id, 0, 0, 0, 0, comment_count from tmp_comment ) user_actions group by user_id;
2)查询导入结果
hive (gmall)> select * from dws_user_action;
1)在/home/atguigu/bin目录下创建脚本dws_db_wide.sh
[atguigu@hadoop102 bin]$ vim dws_db_wide.sh
在脚本中填写如下内容
#!/bin/bash
# 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive
# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
with
tmp_order as
(
select
user_id,
sum(oi.total_amount) order_amount,
count(*) order_count
from "$APP".dwd_order_info oi
where date_format(oi.create_time,‘yyyy-MM-dd‘)=‘$do_date‘
group by user_id
) ,
tmp_payment as
(
select
user_id,
sum(pi.total_amount) payment_amount,
count(*) payment_count
from "$APP".dwd_payment_info pi
where date_format(pi.payment_time,‘yyyy-MM-dd‘)=‘$do_date‘
group by user_id
),
tmp_comment as
(
select
user_id,
count(*) comment_count
from "$APP".dwd_comment_log c
where date_format(c.dt,‘yyyy-MM-dd‘)=‘$do_date‘
group by user_id
)
Insert overwrite table "$APP".dws_user_action partition(dt=‘$do_date‘)
select
user_actions.user_id,
sum(user_actions.order_count),
sum(user_actions.order_amount),
sum(user_actions.payment_count),
sum(user_actions.payment_amount),
sum(user_actions.comment_count)
from
(
select
user_id,
order_count,
order_amount,
0 payment_count,
0 payment_amount,
0 comment_count
from tmp_order
union all
select
user_id,
0,
0,
payment_count,
payment_amount,
0
from tmp_payment
union all
select
user_id,
0,
0,
0,
0,
comment_count
from tmp_comment
) user_actions
group by user_id;
"
$hive -e "$sql"
2)增加脚本执行权限
[atguigu@hadoop102 bin]$ chmod 777 dws_db_wide.sh
3)执行脚本导入数据
[atguigu@hadoop102 bin]$ dws_db_wide.sh 2019-02-11
4)查看导入数据
hive (gmall)>
select * from dws_user_action where dt=‘2019-02-11‘ limit 2;
大数据实战(五十一):电商数仓(三十四)之系统业务数据仓库(七)数仓搭建(五)DWS层之用户行为宽表
原文:https://www.cnblogs.com/qiu-hua/p/13547364.html