首页 > 其他 > 详细

数仓1.1 分层| ODS& DWD层

时间:2019-03-24 10:28:49      阅读:230      评论:0      收藏:0      [点我收藏+]

数仓分层

ODS:Operation Data Store
原始数据

DWD(数据清洗/DWI) data warehouse detail
数据明细详情,去除空值,脏数据,超过极限范围的
明细解析
具体表

DWS(宽表-用户行为,轻度聚合) data warehouse service
服务层--留存-转化-GMV-复购率-日活
点赞、评论、收藏;
轻度聚合对DWD

ADS(APP/DAL/DF)-出报表结果 Application Data Store
做分析处理同步到RDS数据库里边

数据集市:狭义ADS层; 广义上指DWD DWS ADS 从hadoop同步到RDS的数据

数仓搭建之ODS & DWD

1)创建gmall数据库

hive (default)> create database gmall;

说明:如果数据库存在且有数据,需要强制删除时执行:drop database gmall cascade;

2)使用gmall数据库

hive (default)> use gmall;

1. ODS层

原始数据层,存放原始数据,直接加载原始日志、数据,数据保持原貌不做处理。

① 创建启动日志表ods_start_log

1)创建输入数据是lzo输出是text,支持json解析的分区表

hive (gmall)> 
drop table if exists ods_start_log;
CREATE EXTERNAL TABLE ods_start_log (`line` string)
PARTITIONED BY (`dt` string)
STORED AS
INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat
OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
LOCATION /warehouse/gmall/ods/ods_start_log;

Hive的LZO压缩:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

加载数据;

时间格式都配置成YYYY-MM-DD格式,这是Hive默认支持的时间格式

hive (gmall)> load data inpath /origin_data/gmall/log/topic_start/2019-02-10 into table gmall.ods_start_log partition(dt="2019-02-10");
hive (gmall)> select * from ods_start_log limit 2;

② 创建事件日志表ods_event_log

创建输入数据是lzo输出是text,支持json解析的分区表

drop table if exists ods_event_log;
create external table ods_event_log
(`line` string) 
partitioned by (`dt` string)
stored as
INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat
OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location /warehouse/gmall/ods/ods_event_log;

hive (gmall)> load data inpath /origin_data/gmall/log/topic_event/2019-02-10 into table gmall.ods_event_log partition(dt="2019-02-10");

ODS层加载数据的脚本

1)在hadoop101的/home/kris/bin目录下创建脚本

[kris@hadoop101 bin]$ vim ods_log.sh

#!/bin/bash

# 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive

# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then
 do_date=$1
else 
 do_date=`date -d "-1 day" +%F`  
fi 

echo "===日志日期为 $do_date==="
sql="
load data inpath /origin_data/gmall/log/topic_start/$do_date into table "$APP".ods_start_log partition(dt=$do_date);
load data inpath /origin_data/gmall/log/topic_event/$do_date into table "$APP".ods_event_log partition(dt=$do_date);
"

$hive -e "$sql"

[ -n 变量值 ] 判断变量的值,是否为空

-- 变量的值,非空,返回true

-- 变量的值,为空,返回false

查看date命令的使用,[kris@hadoop101  ~]$ date --help

增加脚本执行权限
[kris@hadoop101 bin]$ chmod 777 ods_log.sh
脚本使用
[kris@hadoop101 module]$ ods_log.sh 2019-02-11
查看导入数据
hive (gmall)> 
select * from ods_start_log where dt=2019-02-11 limit 2;
select * from ods_event_log where dt=2019-02-11 limit 2;
脚本执行时间
企业开发中一般在每日凌晨30分~1点

 

2. DWD层数据解析

对ODS层数据进行清洗(去除空值,脏数据,超过极限范围的数据,行式存储改为列存储,改压缩格式)

DWD解析过程,临时过程,两个临时表: dwd_base_event_log、dwd_base_start_log

建12张表外部表: 以日期分区,dwd_base_event_log在这张表中根据event_name将event_json中的字段通过get_json_object函数一个个解析开来;

DWD层创建基础明细表

明细表用于存储ODS层原始表转换过来的明细数据。

1) 创建启动日志基础明细表:

drop table if exists dwd_base_start_log;
create external table dwd_base_start_log(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string, 
`app_time` string, 
`network` string, 
`lng` string, 
`lat` string, 
`event_name` string, 
`event_json` string, 
`server_time` string)
partitioned by(`dt` string)
stored as parquet
location "/warehouse/gmall/dwd/dwd_base_start_log"

其中event_name和event_json用来对应事件名和整个事件。这个地方将原始日志1对多的形式拆分出来了。操作的时候我们需要将原始日志展平,需要用到UDF和UDTF。

2)创建事件日志基础明细表

drop table if exists dwd_base_event_log;
create external table dwd_base_event_log(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string, 
`app_time` string, 
`network` string, 
`lng` string, 
`lat` string, 
`event_name` string, 
`event_json` string, 
`server_time` string)
partitioned by(`dt` string)
stored as parquet
location "/warehouse/gmall/dwd/dwd_base_event_log"

 

自定义UDF函数(解析公共字段)

UDF:解析公共字段 + 事件et(json数组)+ 时间戳

自定义UDTF函数(解析具体事件字段) process 1进多出(可支持多进多出)

 UDTF:对传入的事件et(json数组)-->返回event_name| event_json(取出事件et里边的每个具体事件--json_Array)

解析启动日志基础明细表

将jar包添加到Hive的classpath

创建临时函数与开发好的java class关联

hive (gmall)> add jar /opt/module/hive/hivefunction-1.0-SNAPSHOT.jar;
hive (gmall)> create temporary function base_analizer as "com.atguigu.udf.BaseFieldUDF";
hive (gmall)> create temporary function flat_analizer as "com.atguigu.udtf.EventJsonUDTF";
hive (gmall)> set hive.exec.dynamic.partition.mode=nonstrict;

 

1)解析启动日志基础明细表

insert overwrite table dwd_base_start_log
partition(dt)
select mid_id,user_id,version_code,version_name,lang,source,os,area,model,brand,sdk_version,gmail,height_width,app_time,network,
lng,lat,event_name, event_json,server_time,dt from(
select split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[0] as mid_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[1]   as user_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[2]   as version_code,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[3]   as version_name,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[4]   as lang,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[5]   as source,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[6]   as os,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[7]   as area,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[8]   as model,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[9]   as brand,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[10]   as sdk_version,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[11]  as gmail,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[12]  as height_width,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[13]  as app_time,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[14]  as network,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[15]  as lng,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[16]  as lat,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[17]  as ops,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[18]  as server_time,
dt 
from ods_start_log where dt=2019-02-10 and base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la)<>‘‘ 
) sdk_log lateral view flat_analizer(ops) tmp_k as event_name, event_json;

 

将ops lateral view 成event_name和event_json;
+-------------+-------------------------------------------------------------------------------------------------------------------------+----------------+--+
| event_name  |                                                       event_json                                                        |  server_time   |
+-------------+-------------------------------------------------------------------------------------------------------------------------+----------------+--+
| start       | {"ett":"1549683362200","en":"start","kv":{"entry":"5","loading_time":"4","action":"1","open_ad_type":"1","detail":""}}  | 1549728087940  |
+-------------+-------------------------------------------------------------------------------------------------------------------------+----------------+--+

 

解析事件日志基础明细表

1)解析事件日志基础明细表

insert overwrite table dwd_base_event_log
partition(dt=2019-02-10)
select mid_id,user_id,version_code,version_name,lang,source,os,area,model,brand,sdk_version,gmail,height_width,app_time,network,
lng,lat,event_name, event_json,server_time from(
select split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[0] as mid_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[1]   as user_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[2]   as version_code,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[3]   as version_name,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[4]   as lang,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[5]   as source,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[6]   as os,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[7]   as area,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[8]   as model,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[9]   as brand,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[10]   as sdk_version,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[11]  as gmail,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[12]  as height_width,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[13]  as app_time,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[14]  as network,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[15]  as lng,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[16]  as lat,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[17]  as ops,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[18]  as server_time
from ods_event_log where dt=2019-02-10 and base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la)<>‘‘ 
) sdk_log lateral view flat_analizer(ops) tmp_k as event_name, event_json;
测试
hive (gmall)> select * from dwd_base_event_log limit 2;

 

DWD层加载数据脚本

1)在hadoop101的/home/kris/bin目录下创建脚本

[kris@hadoop101 bin]$ vim dwd.sh

技术分享图片
#编写脚本:
#!/bin/bash
APP=gmall
hive=/opt/module/hive/bin/hive
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi
sql="
add jar /opt/module/hive/hivefunction-1.0-SNAPSHOT.jar;
create temporary function base_analizer as com.atguigu.udf.BaseFieldUDF;
create temporary function flat_analizer as com.atguigu.udtf.EventJsonUDTF;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_base_start_log
partition(dt)
select mid_id,user_id,version_code,version_name,lang,source,os,area,model,brand,sdk_version,gmail,height_width,app_time,network,
lng,lat,event_name, event_json,server_time,dt from(
select split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[0] as mid_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[1]   as user_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[2]   as version_code,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[3]   as version_name,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[4]   as lang,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[5]   as source,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[6]   as os,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[7]   as area,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[8]   as model,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[9]   as brand,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[10]   as sdk_version,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[11]  as gmail,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[12]  as height_width,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[13]  as app_time,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[14]  as network,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[15]  as lng,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[16]  as lat,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[17]  as ops,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[18]  as server_time,
dt 
from "$APP".ods_start_log where dt=$do_date and base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la)<>‘‘ 
) sdk_log lateral view flat_analizer(ops) tmp_k as event_name, event_json;

insert overwrite table "$APP".dwd_base_event_log
partition(dt=$do_date)
select mid_id,user_id,version_code,version_name,lang,source,os,area,model,brand,sdk_version,gmail,height_width,app_time,network,
lng,lat,event_name, event_json,server_time from(
select split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[0] as mid_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[1]   as user_id,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[2]   as version_code,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[3]   as version_name,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[4]   as lang,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[5]   as source,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[6]   as os,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[7]   as area,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[8]   as model,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[9]   as brand,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[10]   as sdk_version,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[11]  as gmail,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[12]  as height_width,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[13]  as app_time,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[14]  as network,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[15]  as lng,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[16]  as lat,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[17]  as ops,
split(base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la),\t)[18]  as server_time
from "$APP".ods_event_log where dt=$do_date and base_analizer(line,mid,uid,vc,vn,l,sr,os,ar,md,ba,sv,g,hw,t,nw,ln,la)<>‘‘ 
) sdk_log lateral view flat_analizer(ops) tmp_k as event_name, event_json;
"
$hive -e "$sql"
View Code
[kris@hadoop101 bin]$ chmod +x dwd_base.sh 
[kris@hadoop101 bin]$ dwd_base.sh 2019-02-11
查询导入结果
hive (gmall)> 
select * from dwd_start_log where dt=2019-02-11 limit 2;
select * from dwd_comment_log where dt=2019-02-11 limit 2;
脚本执行时间
企业开发中一般在每日凌晨30分~1点

3. DWD层

 1) 商品点击表

技术分享图片
建表
hive (gmall)> 
drop table if exists dwd_display_log;
CREATE EXTERNAL TABLE dwd_display_log(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string, 
`app_time` string, 
`network` string, 
`lng` string, 
`lat` string, 
`action` string,
`newsid` string,
`place` string,
`extend1` string,
`category` string,
`server_time` string
)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_display_log/;
导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_display_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.action) action,
get_json_object(event_json,$.kv.newsid) newsid,
get_json_object(event_json,$.kv.place) place,
get_json_object(event_json,$.kv.extend1) extend1,
get_json_object(event_json,$.kv.category) category,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=display;
测试
hive (gmall)> select * from dwd_display_log limit 2;
View Code

2 )商品详情页表

技术分享图片
建表语句
hive (gmall)> 
drop table if exists dwd_newsdetail_log;
CREATE EXTERNAL TABLE `dwd_newsdetail_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string, 
`app_time` string,  
`network` string, 
`lng` string, 
`lat` string, 
`entry` string,
`action` string,
`newsid` string,
`showtype` string,
`news_staytime` string,
`loading_time` string,
`type1` string,
`category` string,
`server_time` string)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_newsdetail_log/;
导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_newsdetail_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.entry) entry,
get_json_object(event_json,$.kv.action) action,
get_json_object(event_json,$.kv.newsid) newsid,
get_json_object(event_json,$.kv.showtype) showtype,
get_json_object(event_json,$.kv.news_staytime) news_staytime,
get_json_object(event_json,$.kv.loading_time) loading_time,
get_json_object(event_json,$.kv.type1) type1,
get_json_object(event_json,$.kv.category) category,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=newsdetail;
测试
hive (gmall)> select * from dwd_newsdetail_log limit 2;
View Code

3 )商品列表页表

技术分享图片
建表语句
hive (gmall)> 
drop table if exists dwd_loading_log;
CREATE EXTERNAL TABLE `dwd_loading_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string,
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`action` string,
`loading_time` string,
`loading_way` string,
`extend1` string,
`extend2` string,
`type` string,
`type1` string,
`server_time` string)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_loading_log/;
导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_loading_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.action) action,
get_json_object(event_json,$.kv.loading_time) loading_time,
get_json_object(event_json,$.kv.loading_way) loading_way,
get_json_object(event_json,$.kv.extend1) extend1,
get_json_object(event_json,$.kv.extend2) extend2,
get_json_object(event_json,$.kv.type) type,
get_json_object(event_json,$.kv.type1) type1,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=loading;
测试
hive (gmall)> select * from dwd_loading_log limit 2;
View Code

4 广告表

技术分享图片
建表语句
hive (gmall)> 
drop table if exists dwd_ad_log;
CREATE EXTERNAL TABLE `dwd_ad_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`entry` string,
`action` string,
`content` string,
`detail` string,
`ad_source` string,
`behavior` string,
`newstype` string,
`show_style` string,
`server_time` string)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_ad_log/;
导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_ad_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.entry) entry,
get_json_object(event_json,$.kv.action) action,
get_json_object(event_json,$.kv.content) content,
get_json_object(event_json,$.kv.detail) detail,
get_json_object(event_json,$.kv.source) ad_source,
get_json_object(event_json,$.kv.behavior) behavior,
get_json_object(event_json,$.kv.newstype) newstype,
get_json_object(event_json,$.kv.show_style) show_style,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=ad;
测试
hive (gmall)> select * from dwd_ad_log limit 2;
View Code

5 消息通知表

技术分享图片
建表语句
hive (gmall)> 
drop table if exists dwd_notification_log;
CREATE EXTERNAL TABLE `dwd_notification_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string,
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`action` string,
`noti_type` string,
`ap_time` string,
`content` string,
`server_time` string
)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_notification_log/;
导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_notification_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.action) action,
get_json_object(event_json,$.kv.noti_type) noti_type,
get_json_object(event_json,$.kv.ap_time) ap_time,
get_json_object(event_json,$.kv.content) content,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=notification;
测试
hive (gmall)> select * from dwd_notification_log limit 2;
View Code

6 用户前台活跃表

技术分享图片
1)建表语句
hive (gmall)> 
drop table if exists dwd_active_foreground_log;
CREATE EXTERNAL TABLE `dwd_active_foreground_log`(
`mid_id` string,
`user_id` string, 
`version_code` string,
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,
`app_time` string, 
`network` string, 
`lng` string, 
`lat` string, 
`active_source` string,
`server_time` string)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_foreground_log/;
2)导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_active_foreground_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.active_source) active_source,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=active_foreground;
3)测试
hive (gmall)> select * from dwd_active_foreground_log limit 2;
View Code

7 用户后台活跃表

技术分享图片
1)建表语句
hive (gmall)> 
drop table if exists dwd_active_background_log;
CREATE EXTERNAL TABLE `dwd_active_background_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string,
 `height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`active_source` string,
`server_time` string
)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_background_log/;
2)导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_active_background_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.active_source) active_source,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=active_background;
3)测试
hive (gmall)> select * from dwd_active_background_log limit 2;
View Code

8 评论表

技术分享图片
1)建表语句
hive (gmall)> 
drop table if exists dwd_comment_log;
CREATE EXTERNAL TABLE `dwd_comment_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`comment_id` int, 
`userid` int, 
`p_comment_id` int, 
`content` string, 
`addtime` string, 
`other_id` int, 
`praise_count` int,
`reply_count` int,
`server_time` string
)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_comment_log/;
2)导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_comment_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.comment_id) comment_id,
get_json_object(event_json,$.kv.userid) userid,
get_json_object(event_json,$.kv.p_comment_id) p_comment_id,
get_json_object(event_json,$.kv.content) content,
get_json_object(event_json,$.kv.addtime) addtime,
get_json_object(event_json,$.kv.other_id) other_id,
get_json_object(event_json,$.kv.praise_count) praise_count,
get_json_object(event_json,$.kv.reply_count) reply_count,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=comment;
3)测试
hive (gmall)> select * from dwd_comment_log limit 2;
View Code

9 收藏表

技术分享图片
1)建表语句
hive (gmall)> 
drop table if exists dwd_favorites_log;
CREATE EXTERNAL TABLE `dwd_favorites_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`id` int, 
`course_id` int, 
`userid` int,
`add_time` string,
`server_time` string
)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_favorites_log/;
2)导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_favorites_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.id) id,
get_json_object(event_json,$.kv.course_id) course_id,
get_json_object(event_json,$.kv.userid) userid,
get_json_object(event_json,$.kv.add_time) add_time,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=favorites;
3)测试
hive (gmall)> select * from dwd_favorites_log limit 2;
View Code

10 点赞表

技术分享图片
1)建表语句
hive (gmall)> 
drop table if exists dwd_praise_log;
CREATE EXTERNAL TABLE `dwd_praise_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`id` string, 
`userid` string, 
`target_id` string,
`type` string,
`add_time` string,
`server_time` string
)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_praise_log/;
2)导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_praise_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.id) id,
get_json_object(event_json,$.kv.userid) userid,
get_json_object(event_json,$.kv.target_id) target_id,
get_json_object(event_json,$.kv.type) type,
get_json_object(event_json,$.kv.add_time) add_time,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=praise;
3)测试
hive (gmall)> select * from dwd_praise_log limit 2;
View Code

11 启动日志表

技术分享图片
1)建表语句
hive (gmall)> 
drop table if exists dwd_start_log;
CREATE EXTERNAL TABLE `dwd_start_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`entry` string, 
`open_ad_type` string, 
`action` string, 
`loading_time` string, 
`detail` string, 
`extend1` string, 
`server_time` string
)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_start_log/;
2)导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_start_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.entry) entry,
get_json_object(event_json,$.kv.open_ad_type) open_ad_type,
get_json_object(event_json,$.kv.action) action,
get_json_object(event_json,$.kv.loading_time) loading_time,
get_json_object(event_json,$.kv.detail) detail,
get_json_object(event_json,$.kv.extend1) extend1,
server_time,
dt
from dwd_base_start_log 
where dt=2019-02-10 and event_name=start;
3)测试
hive (gmall)> select * from dwd_start_log limit 2;
View Code

12 错误日志表

技术分享图片
1)建表语句
hive (gmall)> 
drop table if exists dwd_error_log;
CREATE EXTERNAL TABLE `dwd_error_log`(
`mid_id` string,
`user_id` string, 
`version_code` string, 
`version_name` string, 
`lang` string, 
`source` string, 
`os` string, 
`area` string, 
`model` string,
`brand` string, 
`sdk_version` string, 
`gmail` string, 
`height_width` string,  
`app_time` string,
`network` string, 
`lng` string, 
`lat` string, 
`errorBrief` string, 
`errorDetail` string, 
`server_time` string)
PARTITIONED BY (dt string)
location /warehouse/gmall/dwd/dwd_error_log/;
2)导入数据
hive (gmall)> 
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_error_log
PARTITION (dt)
select 
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,$.kv.errorBrief) errorBrief,
get_json_object(event_json,$.kv.errorDetail) errorDetail,
server_time,
dt
from dwd_base_event_log 
where dt=2019-02-10 and event_name=error;
3)测试
hive (gmall)> select * from dwd_error_log limit 2;
View Code

DWD层加载数据脚本

1)在hadoop101的/home/kris/bin目录下创建脚本

[kris@hadoop101 bin]$ vim dwd.sh

技术分享图片
#!/bin/bash

# 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive

# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date  -d "-1 day"  +%F`  
fi 
sql="

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_display_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.action) action,
    get_json_object(event_json,$.kv.newsid) newsid,
    get_json_object(event_json,$.kv.place) place,
    get_json_object(event_json,$.kv.extend1) extend1,
    get_json_object(event_json,$.kv.category) category,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=display;


insert overwrite table "$APP".dwd_newsdetail_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.entry) entry,
    get_json_object(event_json,$.kv.action) action,
    get_json_object(event_json,$.kv.newsid) newsid,
    get_json_object(event_json,$.kv.showtype) showtype,
    get_json_object(event_json,$.kv.news_staytime) news_staytime,
    get_json_object(event_json,$.kv.loading_time) loading_time,
    get_json_object(event_json,$.kv.type1) type1,
    get_json_object(event_json,$.kv.category) category,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=newsdetail;


insert overwrite table "$APP".dwd_loading_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.action) action,
    get_json_object(event_json,$.kv.loading_time) loading_time,
    get_json_object(event_json,$.kv.loading_way) loading_way,
    get_json_object(event_json,$.kv.extend1) extend1,
    get_json_object(event_json,$.kv.extend2) extend2,
    get_json_object(event_json,$.kv.type) type,
    get_json_object(event_json,$.kv.type1) type1,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=loading;


insert overwrite table "$APP".dwd_ad_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.entry) entry,
    get_json_object(event_json,$.kv.action) action,
    get_json_object(event_json,$.kv.content) content,
    get_json_object(event_json,$.kv.detail) detail,
    get_json_object(event_json,$.kv.source) ad_source,
    get_json_object(event_json,$.kv.behavior) behavior,
    get_json_object(event_json,$.kv.newstype) newstype,
    get_json_object(event_json,$.kv.show_style) show_style,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=ad;


insert overwrite table "$APP".dwd_notification_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.action) action,
    get_json_object(event_json,$.kv.noti_type) noti_type,
    get_json_object(event_json,$.kv.ap_time) ap_time,
    get_json_object(event_json,$.kv.content) content,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=notification;


insert overwrite table "$APP".dwd_active_foreground_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.active_source) active_source,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=active_background;


insert overwrite table "$APP".dwd_active_background_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.active_source) active_source,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=active_background;


insert overwrite table "$APP".dwd_comment_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.comment_id) comment_id,
    get_json_object(event_json,$.kv.userid) userid,
    get_json_object(event_json,$.kv.p_comment_id) p_comment_id,
    get_json_object(event_json,$.kv.content) content,
    get_json_object(event_json,$.kv.addtime) addtime,
    get_json_object(event_json,$.kv.other_id) other_id,
    get_json_object(event_json,$.kv.praise_count) praise_count,
    get_json_object(event_json,$.kv.reply_count) reply_count,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=comment;


insert overwrite table "$APP".dwd_favorites_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.id) id,
    get_json_object(event_json,$.kv.course_id) course_id,
    get_json_object(event_json,$.kv.userid) userid,
    get_json_object(event_json,$.kv.add_time) add_time,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=favorites;


insert overwrite table "$APP".dwd_praise_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.id) id,
    get_json_object(event_json,$.kv.userid) userid,
    get_json_object(event_json,$.kv.target_id) target_id,
    get_json_object(event_json,$.kv.type) type,
    get_json_object(event_json,$.kv.add_time) add_time,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=praise;


insert overwrite table "$APP".dwd_start_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.entry) entry,
    get_json_object(event_json,$.kv.open_ad_type) open_ad_type,
    get_json_object(event_json,$.kv.action) action,
    get_json_object(event_json,$.kv.loading_time) loading_time,
    get_json_object(event_json,$.kv.detail) detail,
    get_json_object(event_json,$.kv.extend1) extend1,
    server_time,
    dt
from "$APP".dwd_base_start_log 
where dt=$do_date and event_name=start;


insert overwrite table "$APP".dwd_error_log
PARTITION (dt)
select 
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,$.kv.errorBrief) errorBrief,
    get_json_object(event_json,$.kv.errorDetail) errorDetail,
    server_time,
    dt
from "$APP".dwd_base_event_log 
where dt=$do_date and event_name=error;

"
$hive -e "$sql"

2)增加脚本执行权限
[kris@hadoop101 bin]$ chmod 777 dwd.sh
3)脚本使用
[kris@hadoop101 module]$ dwd.sh 2019-02-11
4)查询导入结果
hive (gmall)> 
select * from dwd_start_log where dt=2019-02-11 limit 2;
select * from dwd_comment_log where dt=2019-02-11 limit 2;
5)脚本执行时间
企业开发中一般在每日凌晨30分~1点
View Code

 

数仓1.1 分层| ODS& DWD层

原文:https://www.cnblogs.com/shengyang17/p/10545198.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!