首页 > 其他 > 详细

大数据实战(五十一):电商数仓(三十四)之系统业务数据仓库(七)数仓搭建(五)DWS层之用户行为宽表

时间:2020-08-22 23:02:02      阅读:183      评论:0      收藏:0      [点我收藏+]

1)为什么要建宽表

需求目标,把每个用户单日的行为聚合起来组成一张多列宽表,以便之后关联用户维度信息后进行,不同角度的统计分析

1 创建用户行为宽表

 

hive (gmall)>
drop table if exists dws_user_action;
create external table dws_user_action 
(   
    user_id          string      comment 用户 id,
    order_count     bigint      comment 下单次数 ,
    order_amount    decimal(16,2)  comment 下单金额 ,
    payment_count   bigint      comment 支付次数,
    payment_amount  decimal(16,2) comment 支付金额 ,
    comment_count   bigint      comment 评论次数
) COMMENT 每日用户行为宽表
PARTITIONED BY (`dt` string)
stored as parquet
location /warehouse/gmall/dws/dws_user_action/;

 

 

2 向用户行为宽表导入数据

 

1导入数据

 

技术分享图片
hive (gmall)>
with 
tmp_order as
(
    select 
        user_id, 
count(*)  order_count,
        sum(oi.total_amount) order_amount
    from dwd_order_info oi
    where date_format(oi.create_time,yyyy-MM-dd)=2019-02-10
    group by user_id
) ,
tmp_payment as
(
    select
        user_id, 
        sum(pi.total_amount) payment_amount, 
        count(*) payment_count 
    from dwd_payment_info pi 
    where date_format(pi.payment_time,yyyy-MM-dd)=2019-02-10
    group by user_id
),
tmp_comment as
(
    select
        user_id,
        count(*) comment_count
    from dwd_comment_log c
    where date_format(c.dt,yyyy-MM-dd)=2019-02-10
    group by user_id
)

insert overwrite table dws_user_action partition(dt=2019-02-10)
select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount),
    sum(user_actions.comment_count)
from 
(
    select
        user_id,
        order_count,
        order_amount,
        0 payment_count,
        0 payment_amount,
        0 comment_count
    from tmp_order

    union all
    select
        user_id,
        0,
        0,
        payment_count,
        payment_amount,
        0
    from tmp_payment

    union all
    select
        user_id,
        0,
        0,
        0,
        0,
        comment_count
    from tmp_comment
 ) user_actions
group by user_id;
View Code

 

2)查询导入结果

hive (gmall)> select * from dws_user_action;

 

 

 

3 用户行为数据宽表导入脚本

 

1)在/home/atguigu/bin目录下创建脚本dws_db_wide.sh

[atguigu@hadoop102 bin]$ vim dws_db_wide.sh

脚本中填写如下内容

#!/bin/bash

 

# 定义变量方便修改

APP=gmall

hive=/opt/module/hive/bin/hive

 

# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天

if [ -n "$1" ] ;then

do_date=$1

else

do_date=`date -d "-1 day" +%F`  

fi

 

sql="

 

with  

tmp_order as

(

    select

        user_id,

        sum(oi.total_amount) order_amount,

        count(*)  order_count

    from "$APP".dwd_order_info  oi

    where date_format(oi.create_time,‘yyyy-MM-dd‘)=‘$do_date‘

    group by user_id

)  ,

tmp_payment as

(

    select

        user_id,

        sum(pi.total_amount) payment_amount,

        count(*) payment_count

    from "$APP".dwd_payment_info pi

    where date_format(pi.payment_time,‘yyyy-MM-dd‘)=‘$do_date‘

    group by user_id

),

tmp_comment as

(  

    select  

        user_id,

        count(*) comment_count

    from "$APP".dwd_comment_log c

    where date_format(c.dt,‘yyyy-MM-dd‘)=‘$do_date‘

    group by user_id

)

 

Insert overwrite table "$APP".dws_user_action partition(dt=‘$do_date‘)

select

    user_actions.user_id,

    sum(user_actions.order_count),

    sum(user_actions.order_amount),

    sum(user_actions.payment_count),

    sum(user_actions.payment_amount),

    sum(user_actions.comment_count)

from

(

    select

        user_id,

        order_count,

        order_amount,

        0 payment_count,

        0 payment_amount,

        0 comment_count

    from tmp_order

 

    union all

    select

        user_id,

        0,

        0,

        payment_count,

        payment_amount,

        0

    from tmp_payment

 

    union all

    select

        user_id,

        0,

        0,

        0,

        0,

        comment_count

    from tmp_comment

 ) user_actions

group by user_id;

"

 

$hive -e "$sql"

2)增加脚本执行权限

[atguigu@hadoop102 bin]$ chmod 777 dws_db_wide.sh

3)执行脚本导入数据

[atguigu@hadoop102 bin]$ dws_db_wide.sh 2019-02-11

4)查看导入数据

hive (gmall)> 

select * from dws_user_action where dt=‘2019-02-11‘ limit 2;

 

大数据实战(五十一):电商数仓(三十四)之系统业务数据仓库(七)数仓搭建(五)DWS层之用户行为宽表

原文:https://www.cnblogs.com/qiu-hua/p/13547364.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!