分享

数仓项目(九)数仓搭建 - DWS 层

问题导读

1.DWS 层有哪些专业术语?
2.系统函数有哪些?
3.nvl 函数基本语法是什么?
4.DWS 层(业务)包含哪些内容?


上一篇
企业级数据仓库构建(八):搭建DWD 层-业务数据
https://www.aboutyun.com/forum.php?mod=viewthread&tid=29557



一、数仓搭建 - DWS 层
1.1 业务术语
1)用户
用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android
系统根据 IMEI 号,IOS 系统根据 OpenUDID 来标识一个独立用户,每部手机一个用户

2)新增用户
首次联网使用应用的用户。如果一个用户首次打开某 APP,那这个用户定义为新增用
户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月
新增用户

3)活跃用户
打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计
为一个活跃用户

4)周(月)活跃用户
某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户

5)月活跃率
月活跃用户与截止到该月累计的用户总和之间的比例

6)沉默用户
用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用
户质量和用户与 APP 的匹配程度

7)版本分布
不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断 APP 各个版本之
间的优劣和用户行为习惯

8)本周回流用户
上周未启动过应用,本周启动了应用的用户

9)连续 n 周活跃用户
连续 n 周,每周至少启动一次

10)忠诚用户
连续活跃 5 周以上的用户

11)连续活跃用户
连续 2 周及以上活跃的用户

12)近期流失用户
连续 n(2<= n <= 4)周没有启动应用的用户。(第 n+1 周没有启动过)

13)留存用户
某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分
用户占当时新增用户的比例即是留存率
例如,5 月份新增用户 200,这 200 人在 6 月份启动过应用的有 100 人,7 月份启动过应用的有 80 人,8 月份启动过应用的有 50 人;则 5 月份新增用户一个月后的留存率是 50%,二个月后的留存率是 40%,三个月后的留存率是 25%

14)用户新鲜度
每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例

15)单次使用时长
每次启动使用的时间长度

16)日使用时长
累计一天内的使用时间长度

17)启动次数计算标准
IOS 平台应用退到后台就算一次独立的启动;Android 平台我们规定,两次启动之间的间隔小于 30 秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用30 秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用行为,即一次启动。业内大多使用 30 秒这个标准,但用户还是可以自定义此时间间隔


1.2 系统函数
1.2.1 collect_set 函数
1)创建原数据表

  1. drop table if exists stud;
  2. create table stud (name string, area string, course string, score int);
复制代码

2)向原数据表中插入数据
  1. insert into table stud values('zhang3','bj','math',88);
  2. insert into table stud values('li4','bj','math',99);
  3. insert into table stud values('wang5','sh','chinese',92);
  4. insert into table stud values('zhao6','sh','chinese',54);
  5. insert into table stud values('tian7','bj','chinese',91);
复制代码
3)查询表中数据
  1. select * from stud;
  2. stud.name stud.area stud.course stud.score
  3. zhang3 bj math 88
  4. li4 bj math 99
  5. wang5 sh chinese 92
  6. zhao6 sh chinese 54
  7. tian7 bj chinese 91
复制代码
4)把同一分组的不同行的数据聚合成一个集合
  1. select course, collect_set(area), avg(score) from stud group by course;
  2. chinese ["sh","bj"] 79.0
  3. math ["bj"] 93.5
复制代码

5) 用下标可以取某一个
  1. select course, collect_set(area)[0], avg(score) from
  2. stud group by course;
  3. chinese sh 79.0
  4. math bj 93.5
复制代码

1.2.2 nvl 函数
1)基本语法

NVL(表达式 1,表达式 2)

如果表达式 1 为空值,NVL 返回值为表达式 2 的值,否则返回表达式 1 的值。 该函数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型和日期型。但是表达式 1 和表达式 2 的数据类型必须为同一个类型

1.2.3 日期处理函数
1)date_format 函数(根据格式整理日期)

  1. hive (gmall)> select date_format('2020-03-10','yyyy-MM');
  2. 2020-03
复制代码
2)date_add 函数(加减日期)
  1. hive (gmall)> select date_add('2020-03-10',-1);
  2. 2020-03-09
  3. hive (gmall)> select date_add('2020-03-10',1);
  4. 2020-03-11
复制代码
3)next_day 函数
(1)取当前天的下一个周一


  1. hive (gmall)> select next_day('2020-03-12','MO');
  2. 2020-03-16
  3. 说明:星期一到星期日的英文(Monday,Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)
复制代码
(2)取当前周的周一
  1. hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);
  2. 2020-03-11
复制代码
4)last_day 函数(求当月最后一天日期)
  1. hive (gmall)> select last_day('2020-03-10');
  2. 2020-03-31
复制代码


1.3 DWS 层(用户行为)
1.3.1 每日设备行为
每日设备行为,主要按照 设备 id 统计


1.png

1)建表语句
  1. drop table if exists dws_uv_detail_daycount;
  2. create external table dws_uv_detail_daycount
  3. (
  4. `mid_id` string COMMENT '设备唯一标识',
  5. `user_id` string COMMENT '用户标识',
  6. `version_code` string COMMENT '程序版本号',
  7. `version_name` string COMMENT '程序版本名',
  8. `lang` string COMMENT '系统语言',
  9. `source` string COMMENT '渠道号',
  10. `os` string COMMENT '安卓系统版本',
  11. `area` string COMMENT '区域',
  12. `model` string COMMENT '手机型号',
  13. `brand` string COMMENT '手机品牌',
  14. `sdk_version` string COMMENT 'sdkVersion',
  15. `gmail` string COMMENT 'gmail',
  16. `height_width` string COMMENT '屏幕宽高',
  17. `app_time` string COMMENT '客户端日志产生时的时间',
  18. `network` string COMMENT '网络模式',
  19. `lng` string COMMENT '经度',
  20. `lat` string COMMENT '纬度',
  21. `login_count` bigint COMMENT '活跃次数'
  22. )
  23. partitioned by(dt string)
  24. stored as parquet
  25. location '/warehouse/gmall/dws/dws_uv_detail_daycount';
复制代码
2)数据装载
  1. insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')
  2. select
  3. mid_id,
  4. concat_ws('|', collect_set(user_id)) user_id,
  5. concat_ws('|', collect_set(version_code)) version_code,
  6. concat_ws('|', collect_set(version_name)) version_name,
  7. concat_ws('|', collect_set(lang))lang,
  8. concat_ws('|', collect_set(source)) source,
  9. concat_ws('|', collect_set(os)) os,
  10. concat_ws('|', collect_set(area)) area,
  11. concat_ws('|', collect_set(model)) model,
  12. concat_ws('|', collect_set(brand)) brand,
  13. concat_ws('|', collect_set(sdk_version)) sdk_version,
  14. concat_ws('|', collect_set(gmail)) gmail,
  15. concat_ws('|', collect_set(height_width)) height_width,
  16. concat_ws('|', collect_set(app_time)) app_time,
  17. concat_ws('|', collect_set(network)) network,
  18. concat_ws('|', collect_set(lng)) lng,
  19. concat_ws('|', collect_set(lat)) lat,
  20. count(*) login_count
  21. from dwd_start_log
  22. where dt='2020-03-10'
  23. group by mid_id;
复制代码

3)查询加载结果
  1. select * from dws_uv_detail_daycount where dt='2020-03-10';
复制代码


1.4 DWS 层(业务)
DWS 层的宽表字段,是站在不同维度的视角去看事实表,重点关注事实表的度量值


1.png

1.4.1 每日会员行为
1)建表语句

  1. drop table if exists dws_user_action_daycount;
  2. create external table dws_user_action_daycount
  3. (
  4. user_id string comment '用户 id',
  5. login_count bigint comment '登录次数',
  6. cart_count bigint comment '加入购物车次数',
  7. cart_amount double comment '加入购物车金额',
  8. order_count bigint comment '下单次数',
  9. order_amount decimal(16,2) comment '下单金额',
  10. payment_count bigint comment '支付次数',
  11. payment_amount decimal(16,2) comment '支付金额'
  12. ) COMMENT '每日用户行为'
  13. PARTITIONED BY (`dt` string)
  14. stored as parquet
  15. location '/warehouse/gmall/dws/dws_user_action_daycount/'
  16. tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
  1. with
  2. tmp_login as
  3. (
  4. select
  5. user_id,
  6. count(*) login_count
  7. from dwd_start_log
  8. where dt='2020-03-10'
  9. and user_id is not null
  10. group by user_id
  11. ),
  12. tmp_cart as
  13. (
  14. select
  15. user_id,
  16. count(*) cart_count,
  17. sum(cart_price*sku_num) cart_amount
  18. from dwd_fact_cart_info
  19. where dt='2020-03-10'
  20. and user_id is not null
  21. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
  22. group by user_id
  23. ),
  24. tmp_order as
  25. (
  26. select
  27. user_id,
  28. count(*) order_count,
  29. sum(final_total_amount) order_amount
  30. from dwd_fact_order_info
  31. where dt='2020-03-10'
  32. group by user_id
  33. ) ,
  34. tmp_payment as
  35. (
  36. select
  37. user_id,
  38. count(*) payment_count,
  39. sum(payment_amount) payment_amount
  40. from dwd_fact_payment_info
  41. where dt='2020-03-10'
  42. group by user_id
  43. )
  44. insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')
  45. select
  46. user_actions.user_id,
  47. sum(user_actions.login_count),
  48. sum(user_actions.cart_count),
  49. sum(user_actions.cart_amount),
  50. sum(user_actions.order_count),
  51. sum(user_actions.order_amount),
  52. sum(user_actions.payment_count),
  53. sum(user_actions.payment_amount)
  54. from
  55. (
  56. select
  57. user_id,
  58. login_count,
  59. 0 cart_count,
  60. 0 cart_amount,
  61. 0 order_count,
  62. 0 order_amount,
  63. 0 payment_count,
  64. 0 payment_amount
  65. from
  66. tmp_login
  67. union all
  68. select
  69. user_id,
  70. 0 login_count,
  71. cart_count,
  72. cart_amount,
  73. 0 order_count,
  74. 0 order_amount,
  75. 0 payment_count,
  76. 0 payment_amount
  77. from
  78. tmp_cart
  79. union all
  80. select
  81. user_id,
  82. 0 login_count,
  83. 0 cart_count,
  84. 0 cart_amount,
  85. order_count,
  86. order_amount,
  87. 0 payment_count,
  88. 0 payment_amount
  89. from tmp_order
  90. union all
  91. select
  92. user_id,
  93. 0 login_count,
  94. 0 cart_count,
  95. 0 cart_amount,
  96. 0 order_count,
  97. 0 order_amount,
  98. payment_count,
  99. payment_amount
  100. from tmp_payment
  101. ) user_actions
  102. group by user_id;
复制代码
3)查询加载结果
hive (gmall)> select * from dws_user_action_daycount where dt=‘2020-03-10’;

1.4.2 每日商品行为

1)建表语句
  1. drop table if exists dws_sku_action_daycount;
  2. create external table dws_sku_action_daycount
  3. (
  4. sku_id string comment 'sku_id',
  5. order_count bigint comment '被下单次数',
  6. order_num bigint comment '被下单件数',
  7. order_amount decimal(16,2) comment '被下单金额',
  8. payment_count bigint comment '被支付次数',
  9. payment_num bigint comment '被支付件数',
  10. payment_amount decimal(16,2) comment '被支付金额',
  11. refund_count bigint comment '被退款次数',
  12. refund_num bigint comment '被退款件数',
  13. refund_amount decimal(16,2) comment '被退款金额',
  14. cart_count bigint comment '被加入购物车次数',
  15. cart_num bigint comment '被加入购物车件数',
  16. favor_count bigint comment '被收藏次数',
  17. appraise_good_count bigint comment '好评数',
  18. appraise_mid_count bigint comment '中评数',
  19. appraise_bad_count bigint comment '差评数',
  20. appraise_default_count bigint comment '默认评价数'
  21. ) COMMENT '每日商品行为'
  22. PARTITIONED BY (`dt` string)
  23. stored as parquet
  24. location '/warehouse/gmall/dws/dws_sku_action_daycount/'
  25. tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
注意:如果是 23 点 59 下单,支付日期跨天。需要从订单详情里面取出支付时间是今天,订单时间是昨天或者今天的订单


  1. with
  2. tmp_order as
  3. (
  4. select
  5. sku_id,
  6. count(*) order_count,
  7. sum(sku_num) order_num,
  8. sum(total_amount) order_amount
  9. from dwd_fact_order_detail
  10. where dt='2020-03-10'
  11. group by sku_id
  12. ),
  13. tmp_payment as
  14. (
  15. select
  16. sku_id,
  17. count(*) payment_count,
  18. sum(sku_num) payment_num,
  19. sum(total_amount) payment_amount
  20. from dwd_fact_order_detail
  21. where dt='2020-03-10'
  22. and order_id in
  23. (
  24. select
  25. id
  26. from dwd_fact_order_info
  27. where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))
  28. and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'
  29. )
  30. group by sku_id
  31. ),
  32. tmp_refund as
  33. (
  34. select
  35. sku_id,
  36. count(*) refund_count,
  37. sum(refund_num) refund_num,
  38. sum(refund_amount) refund_amount
  39. from dwd_fact_order_refund_info
  40. where dt='2020-03-10'
  41. group by sku_id
  42. ),
  43. tmp_cart as
  44. (
  45. select
  46. sku_id,
  47. count(*) cart_count,
  48. sum(sku_num) cart_num
  49. from dwd_fact_cart_info
  50. where dt='2020-03-10'
  51. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
  52. group by sku_id
  53. ),
  54. tmp_favor as
  55. (
  56. select
  57. sku_id,
  58. count(*) favor_count
  59. from dwd_fact_favor_info
  60. where dt='2020-03-10'
  61. and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
  62. group by sku_id
  63. ),
  64. tmp_appraise as
  65. (
  66. select
  67. sku_id,
  68. sum(if(appraise='1201',1,0)) appraise_good_count,
  69. sum(if(appraise='1202',1,0)) appraise_mid_count,
  70. sum(if(appraise='1203',1,0)) appraise_bad_count,
  71. sum(if(appraise='1204',1,0)) appraise_default_count
  72. from dwd_fact_comment_info
  73. where dt='2020-03-10'
  74. group by sku_id
  75. )
  76. insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')
  77. select
  78. sku_id,
  79. sum(order_count),
  80. sum(order_num),
  81. sum(order_amount),
  82. sum(payment_count),
  83. sum(payment_num),
  84. sum(payment_amount),
  85. sum(refund_count),
  86. sum(refund_num),
  87. sum(refund_amount),
  88. sum(cart_count),
  89. sum(cart_num),
  90. sum(favor_count),
  91. sum(appraise_good_count),
  92. sum(appraise_mid_count),
  93. sum(appraise_bad_count),
  94. sum(appraise_default_count)
  95. from
  96. (
  97. select
  98. sku_id,
  99. order_count,
  100. order_num,
  101. order_amount,
  102. 0 payment_count,
  103. 0 payment_num,
  104. 0 payment_amount,
  105. 0 refund_count,
  106. 0 refund_num,
  107. 0 refund_amount,
  108. 0 cart_count,
  109. 0 cart_num,
  110. 0 favor_count,
  111. 0 appraise_good_count,
  112. 0 appraise_mid_count,
  113. 0 appraise_bad_count,
  114. 0 appraise_default_count
  115. from tmp_order
  116. union all
  117. select
  118. sku_id,
  119. 0 order_count,
  120. 0 order_num,
  121. 0 order_amount,
  122. payment_count,
  123. payment_num,
  124. payment_amount,
  125. 0 refund_count,
  126. 0 refund_num,
  127. 0 refund_amount,
  128. 0 cart_count,
  129. 0 cart_num,
  130. 0 favor_count,
  131. 0 appraise_good_count,
  132. 0 appraise_mid_count,
  133. 0 appraise_bad_count,
  134. 0 appraise_default_count
  135. from tmp_payment
  136. union all
  137. select
  138. sku_id,
  139. 0 order_count,
  140. 0 order_num,
  141. 0 order_amount,
  142. 0 payment_count,
  143. 0 payment_num,
  144. 0 payment_amount,
  145. refund_count,
  146. refund_num,
  147. refund_amount,
  148. 0 cart_count,
  149. 0 cart_num,
  150. 0 favor_count,
  151. 0 appraise_good_count,
  152. 0 appraise_mid_count,
  153. 0 appraise_bad_count,
  154. 0 appraise_default_count
  155. from tmp_refund
  156. union all
  157. select
  158. sku_id,
  159. 0 order_count,
  160. 0 order_num,
  161. 0 order_amount,
  162. 0 payment_count,
  163. 0 payment_num,
  164. 0 payment_amount,
  165. 0 refund_count,
  166. 0 refund_num,
  167. 0 refund_amount,
  168. cart_count,
  169. cart_num,
  170. 0 favor_count,
  171. 0 appraise_good_count,
  172. 0 appraise_mid_count,
  173. 0 appraise_bad_count,
  174. 0 appraise_default_count
  175. from tmp_cart
  176. union all
  177. select
  178. sku_id,
  179. 0 order_count,
  180. 0 order_num,
  181. 0 order_amount,
  182. 0 payment_count,
  183. 0 payment_num,
  184. 0 payment_amount,
  185. 0 refund_count,
  186. 0 refund_num,
  187. 0 refund_amount,
  188. 0 cart_count,
  189. 0 cart_num,
  190. favor_count,
  191. 0 appraise_good_count,
  192. 0 appraise_mid_count,
  193. 0 appraise_bad_count,
  194. 0 appraise_default_count
  195. from tmp_favor
  196. union all
  197. select
  198. sku_id,
  199. 0 order_count,
  200. 0 order_num,
  201. 0 order_amount,
  202. 0 payment_count,
  203. 0 payment_num,
  204. 0 payment_amount,
  205. 0 refund_count,
  206. 0 refund_num,
  207. 0 refund_amount,
  208. 0 cart_count,
  209. 0 cart_num,
  210. 0 favor_count,
  211. appraise_good_count,
  212. appraise_mid_count,
  213. appraise_bad_count,
  214. appraise_default_count
  215. from tmp_appraise
  216. )tmp
  217. group by sku_id;
复制代码
3)查询加载结果
  1. select * from dws_sku_action_daycount where dt='2020-03-10';
复制代码

1.4.5 每日购买行为


1.png

1)建表语句
  1. drop table if exists dws_sale_detail_daycount;
  2. create external table dws_sale_detail_daycount
  3. (
  4. user_id string comment '用户 id',
  5. sku_id string comment '商品 id',
  6. user_gender string comment '用户性别',
  7. user_age string comment '用户年龄',
  8. user_level string comment '用户等级',
  9. order_price decimal(10,2) comment '商品价格',
  10. sku_name string comment '商品名称',
  11. sku_tm_id string comment '品牌 id',
  12. sku_category3_id string comment '商品三级品类 id',
  13. sku_category2_id string comment '商品二级品类 id',
  14. sku_category1_id string comment '商品一级品类 id',
  15. sku_category3_name string comment '商品三级品类名称',
  16. sku_category2_name string comment '商品二级品类名称',
  17. sku_category1_name string comment '商品一级品类名称',
  18. spu_id string comment '商品 spu',
  19. sku_num int comment '购买个数',
  20. order_count bigint comment '当日下单单数',
  21. order_amount decimal(16,2) comment '当日下单金额'
  22. ) COMMENT '每日购买行为'
  23. PARTITIONED BY (`dt` string)
  24. stored as parquet
  25. location '/warehouse/gmall/dws/dws_sale_detail_daycount/'
  26. tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
  1. insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')
  2. select
  3. op.user_id,
  4. op.sku_id,
  5. ui.gender,
  6. months_between('2020-03-10', ui.birthday)/12 age,
  7. ui.user_level,
  8. si.price,
  9. si.sku_name,
  10. si.tm_id,
  11. si.category3_id,
  12. si.category2_id,
  13. si.category1_id,
  14. si.category3_name,
  15. si.category2_name,
  16. si.category1_name,
  17. si.spu_id,
  18. op.sku_num,
  19. op.order_count,
  20. op.order_amount
  21. from
  22. (
  23. select
  24. user_id,
  25. sku_id,
  26. sum(sku_num) sku_num,
  27. count(*) order_count,
  28. sum(total_amount) order_amount
  29. from dwd_fact_order_detail
  30. where dt='2020-03-10'
  31. group by user_id, sku_id
  32. )op
  33. join
  34. (
  35. select
  36. *
  37. from dwd_dim_user_info_his
  38. where end_date='9999-99-99'
  39. )ui on op.user_id = ui.id
  40. join
  41. (
  42. select
  43. *
  44. from dwd_dim_sku_info
  45. where dt='2020-03-10'
  46. )si on op.sku_id = si.id;
复制代码




3)查询加载结果
  1. select * from dws_sale_detail_daycount where dt='2020-03-10';
复制代码



1.5 DWS 层数据导入脚本
1)vim dwd_to_dws.sh
在脚本中填写如下内容

  1. #!/bin/bash
  2. APP=gmall
  3. hive=/opt/modules/hive/bin/hive
  4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
  5. if [ -n "$1" ] ;then
  6. do_date=$1
  7. else
  8. do_date=`date -d "-1 day" +%F`
  9. fi
  10. sql="
  11. insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
  12. select
  13. mid_id,
  14. concat_ws('|', collect_set(user_id)) user_id,
  15. concat_ws('|', collect_set(version_code)) version_code,
  16. concat_ws('|', collect_set(version_name)) version_name,
  17. concat_ws('|', collect_set(lang))lang,
  18. concat_ws('|', collect_set(source)) source,
  19. concat_ws('|', collect_set(os)) os,
  20. concat_ws('|', collect_set(area)) area,
  21. concat_ws('|', collect_set(model)) model,
  22. concat_ws('|', collect_set(brand)) brand,
  23. concat_ws('|', collect_set(sdk_version)) sdk_version,
  24. concat_ws('|', collect_set(gmail)) gmail,
  25. concat_ws('|', collect_set(height_width)) height_width,
  26. concat_ws('|', collect_set(app_time)) app_time,
  27. concat_ws('|', collect_set(network)) network,
  28. concat_ws('|', collect_set(lng)) lng,
  29. concat_ws('|', collect_set(lat)) lat,
  30. count(*) login_count
  31. from ${APP}.dwd_start_log
  32. where dt='$do_date'
  33. group by mid_id;
  34. with
  35. tmp_login as
  36. (
  37. select
  38. user_id,
  39. count(*) login_count
  40. from ${APP}.dwd_start_log
  41. where dt='$do_date'
  42. and user_id is not null
  43. group by user_id
  44. ),
  45. tmp_cart as
  46. (
  47. select
  48. user_id,
  49. count(*) cart_count,
  50. sum(cart_price*sku_num) cart_amount
  51. from ${APP}.dwd_fact_cart_info
  52. where dt='$do_date'
  53. and user_id is not null
  54. and date_format(create_time,'yyyy-MM-dd')='$do_date'
  55. group by user_id
  56. ),
  57. tmp_order as
  58. (
  59. select
  60. user_id,
  61. count(*) order_count,
  62. sum(final_total_amount) order_amount
  63. from ${APP}.dwd_fact_order_info
  64. where dt='$do_date'
  65. group by user_id
  66. ) ,
  67. tmp_payment as
  68. (
  69. select
  70. user_id,
  71. count(*) payment_count,
  72. sum(payment_amount) payment_amount
  73. from ${APP}.dwd_fact_payment_info
  74. where dt='$do_date'
  75. group by user_id
  76. )
  77. insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
  78. select
  79. user_actions.user_id,
  80. sum(user_actions.login_count),
  81. sum(user_actions.cart_count),
  82. sum(user_actions.cart_amount),
  83. sum(user_actions.order_count),
  84. sum(user_actions.order_amount),
  85. sum(user_actions.payment_count),
  86. sum(user_actions.payment_amount)
  87. from
  88. (
  89. select
  90. user_id,
  91. login_count,
  92. 0 cart_count,
  93. 0 cart_amount,
  94. 0 order_count,
  95. 0 order_amount,
  96. 0 payment_count,
  97. 0 payment_amount
  98. from
  99. tmp_login
  100. union all
  101. select
  102. user_id,
  103. 0 login_count,
  104. cart_count,
  105. cart_amount,
  106. 0 order_count,
  107. 0 order_amount,
  108. 0 payment_count,
  109. 0 payment_amount
  110. from
  111. tmp_cart
  112. union all
  113. select
  114. user_id,
  115. 0 login_count,
  116. 0 cart_count,
  117. 0 cart_amount,
  118. order_count,
  119. order_amount,
  120. 0 payment_count,
  121. 0 payment_amount
  122. from tmp_order
  123. union all
  124. select
  125. user_id,
  126. 0 login_count,
  127. 0 cart_count,
  128. 0 cart_amount,
  129. 0 order_count,
  130. 0 order_amount,
  131. payment_count,
  132. payment_amount
  133. from tmp_payment
  134. ) user_actions
  135. group by user_id;
  136. with
  137. tmp_order as
  138. (
  139. select
  140. sku_id,
  141. count(*) order_count,
  142. sum(sku_num) order_num,
  143. sum(total_amount) order_amount
  144. from ${APP}.dwd_fact_order_detail
  145. where dt='$do_date'
  146. group by sku_id
  147. ),
  148. tmp_payment as
  149. (
  150. select
  151. sku_id,
  152. count(*) payment_count,
  153. sum(sku_num) payment_num,
  154. sum(total_amount) payment_amount
  155. from ${APP}.dwd_fact_order_detail
  156. where dt='$do_date'
  157. and order_id in
  158. (
  159. select
  160. id
  161. from ${APP}.dwd_fact_order_info
  162. where (dt='$do_date' or dt=date_add('$do_date',-1))
  163. and date_format(payment_time,'yyyy-MM-dd')='$do_date'
  164. )
  165. group by sku_id
  166. ),
  167. tmp_refund as
  168. (
  169. select
  170. sku_id,
  171. count(*) refund_count,
  172. sum(refund_num) refund_num,
  173. sum(refund_amount) refund_amount
  174. from ${APP}.dwd_fact_order_refund_info
  175. where dt='$do_date'
  176. group by sku_id
  177. ),
  178. tmp_cart as
  179. (
  180. select
  181. sku_id,
  182. count(*) cart_count,
  183. sum(sku_num) cart_num
  184. from ${APP}.dwd_fact_cart_info
  185. where dt='$do_date'
  186. and date_format(create_time,'yyyy-MM-dd')='$do_date'
  187. group by sku_id
  188. ),
  189. tmp_favor as
  190. (
  191. select
  192. sku_id,
  193. count(*) favor_count
  194. from ${APP}.dwd_fact_favor_info
  195. where dt='$do_date'
  196. and date_format(create_time,'yyyy-MM-dd')='$do_date'
  197. group by sku_id
  198. ),
  199. tmp_appraise as
  200. (
  201. select
  202. sku_id,
  203. sum(if(appraise='1201',1,0)) appraise_good_count,
  204. sum(if(appraise='1202',1,0)) appraise_mid_count,
  205. sum(if(appraise='1203',1,0)) appraise_bad_count,
  206. sum(if(appraise='1204',1,0)) appraise_default_count
  207. from ${APP}.dwd_fact_comment_info
  208. where dt='$do_date'
  209. group by sku_id
  210. )
  211. insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
  212. select
  213. sku_id,
  214. sum(order_count),
  215. sum(order_num),
  216. sum(order_amount),
  217. sum(payment_count),
  218. sum(payment_num),
  219. sum(payment_amount),
  220. sum(refund_count),
  221. sum(refund_num),
  222. sum(refund_amount),
  223. sum(cart_count),
  224. sum(cart_num),
  225. sum(favor_count),
  226. sum(appraise_good_count),
  227. sum(appraise_mid_count),
  228. sum(appraise_bad_count),
  229. sum(appraise_default_count)
  230. from
  231. (
  232. select
  233. sku_id,
  234. order_count,
  235. order_num,
  236. order_amount,
  237. 0 payment_count,
  238. 0 payment_num,
  239. 0 payment_amount,
  240. 0 refund_count,
  241. 0 refund_num,
  242. 0 refund_amount,
  243. 0 cart_count,
  244. 0 cart_num,
  245. 0 favor_count,
  246. 0 appraise_good_count,
  247. 0 appraise_mid_count,
  248. 0 appraise_bad_count,
  249. 0 appraise_default_count
  250. from tmp_order
  251. union all
  252. select
  253. sku_id,
  254. 0 order_count,
  255. 0 order_num,
  256. 0 order_amount,
  257. payment_count,
  258. payment_num,
  259. payment_amount,
  260. 0 refund_count,
  261. 0 refund_num,
  262. 0 refund_amount,
  263. 0 cart_count,
  264. 0 cart_num,
  265. 0 favor_count,
  266. 0 appraise_good_count,
  267. 0 appraise_mid_count,
  268. 0 appraise_bad_count,
  269. 0 appraise_default_count
  270. from tmp_payment
  271. union all
  272. select
  273. sku_id,
  274. 0 order_count,
  275. 0 order_num,
  276. 0 order_amount,
  277. 0 payment_count,
  278. 0 payment_num,
  279. 0 payment_amount,
  280. refund_count,
  281. refund_num,
  282. refund_amount,
  283. 0 cart_count,
  284. 0 cart_num,
  285. 0 favor_count,
  286. 0 appraise_good_count,
  287. 0 appraise_mid_count,
  288. 0 appraise_bad_count,
  289. 0 appraise_default_count
  290. from tmp_refund
  291. union all
  292. select
  293. sku_id,
  294. 0 order_count,
  295. 0 order_num,
  296. 0 order_amount,
  297. 0 payment_count,
  298. 0 payment_num,
  299. 0 payment_amount,
  300. 0 refund_count,
  301. 0 refund_num,
  302. 0 refund_amount,
  303. cart_count,
  304. cart_num,
  305. 0 favor_count,
  306. 0 appraise_good_count,
  307. 0 appraise_mid_count,
  308. 0 appraise_bad_count,
  309. 0 appraise_default_count
  310. from tmp_cart
  311. union all
  312. select
  313. sku_id,
  314. 0 order_count,
  315. 0 order_num,
  316. 0 order_amount,
  317. 0 payment_count,
  318. 0 payment_num,
  319. 0 payment_amount,
  320. 0 refund_count,
  321. 0 refund_num,
  322. 0 refund_amount,
  323. 0 cart_count,
  324. 0 cart_num,
  325. favor_count,
  326. 0 appraise_good_count,
  327. 0 appraise_mid_count,
  328. 0 appraise_bad_count,
  329. 0 appraise_default_count
  330. from tmp_favor
  331. union all
  332. select
  333. sku_id,
  334. 0 order_count,
  335. 0 order_num,
  336. 0 order_amount,
  337. 0 payment_count,
  338. 0 payment_num,
  339. 0 payment_amount,
  340. 0 refund_count,
  341. 0 refund_num,
  342. 0 refund_amount,
  343. 0 cart_count,
  344. 0 cart_num,
  345. 0 favor_count,
  346. appraise_good_count,
  347. appraise_mid_count,
  348. appraise_bad_count,
  349. appraise_default_count
  350. from tmp_appraise
  351. )tmp
  352. group by sku_id;
  353. insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')
  354. select
  355. op.user_id,
  356. op.sku_id,
  357. ui.gender,
  358. months_between('$do_date', ui.birthday)/12 age,
  359. ui.user_level,
  360. si.price,
  361. si.sku_name,
  362. si.tm_id,
  363. si.category3_id,
  364. si.category2_id,
  365. si.category1_id,
  366. si.category3_name,
  367. si.category2_name,
  368. si.category1_name,
  369. si.spu_id,
  370. op.sku_num,
  371. op.order_count,
  372. op.order_amount
  373. from
  374. (
  375. select
  376. user_id,
  377. sku_id,
  378. sum(sku_num) sku_num,
  379. count(*) order_count,
  380. sum(total_amount) order_amount
  381. from ${APP}.dwd_fact_order_detail
  382. where dt='$do_date'
  383. group by user_id, sku_id
  384. )op
  385. join
  386. (
  387. select
  388. *
  389. from ${APP}.dwd_dim_user_info_his
  390. where end_date='9999-99-99'
  391. )ui on op.user_id = ui.id
  392. join
  393. (
  394. select
  395. *
  396. from ${APP}.dwd_dim_sku_info
  397. where dt='$do_date'
  398. )si on op.sku_id = si.id;
  399. "
  400. $hive -e "$sql"
复制代码
2)增加脚本执行权限
  1. chmod 770 dwd_to_dws.sh
复制代码

3)执行脚本导入数据
  1. dwd_to_dws.sh 2020-03-11
复制代码
4)查看导入数据
  1. select * from dws_uv_detail_daycount where dt='2020-03-11';
  2. select * from dws_user_action_daycount where dt='2020-03-11';
  3. select * from dws_sku_action_daycount where dt='2020-03-11';
  4. select * from dws_sale_detail_daycount where dt='2020-03-11';
复制代码


二、数仓搭建-DWT 层
2.1 设备主题宽表


1.png

1)建表语句
  1. drop table if exists dwt_uv_topic;
  2. create external table dwt_uv_topic
  3. (
  4. `mid_id` string COMMENT '设备唯一标识',
  5. `user_id` string COMMENT '用户标识',
  6. `version_code` string COMMENT '程序版本号',
  7. `version_name` string COMMENT '程序版本名',
  8. `lang` string COMMENT '系统语言',
  9. `source` string COMMENT '渠道号',
  10. `os` string COMMENT '安卓系统版本',
  11. `area` string COMMENT '区域',
  12. `model` string COMMENT '手机型号',
  13. `brand` string COMMENT '手机品牌',
  14. `sdk_version` string COMMENT 'sdkVersion',
  15. `gmail` string COMMENT 'gmail',
  16. `height_width` string COMMENT '屏幕宽高',
  17. `app_time` string COMMENT '客户端日志产生时的时间',
  18. `network` string COMMENT '网络模式',
  19. `lng` string COMMENT '经度',
  20. `lat` string COMMENT '纬度',
  21. `login_date_first` string comment '首次活跃时间',
  22. `login_date_last` string comment '末次活跃时间',
  23. `login_day_count` bigint comment '当日活跃次数',
  24. `login_count` bigint comment '累积活跃天数'
  25. )
  26. stored as parquet
  27. location '/warehouse/gmall/dwt/dwt_uv_topic';
复制代码

2)数据装载
  1. insert overwrite table dwt_uv_topic
  2. select
  3. nvl(new.mid_id,old.mid_id),
  4. nvl(new.user_id,old.user_id),
  5. nvl(new.version_code,old.version_code),
  6. nvl(new.version_name,old.version_name),
  7. nvl(new.lang,old.lang),
  8. nvl(new.source,old.source),
  9. nvl(new.os,old.os),
  10. nvl(new.area,old.area),
  11. nvl(new.model,old.model),
  12. nvl(new.brand,old.brand),
  13. nvl(new.sdk_version,old.sdk_version),
  14. nvl(new.gmail,old.gmail),
  15. nvl(new.height_width,old.height_width),
  16. nvl(new.app_time,old.app_time),
  17. nvl(new.network,old.network),
  18. nvl(new.lng,old.lng),
  19. nvl(new.lat,old.lat),
  20. if(old.mid_id is null,'2020-03-10',old.login_date_first),
  21. if(new.mid_id is not null,'2020-03-10',old.login_date_last),
  22. if(new.mid_id is not null, new.login_count,0),
  23. nvl(old.login_count,0)+if(new.login_count>0,1,0)
  24. from
  25. (
  26. select
  27. *
  28. from dwt_uv_topic
  29. )old
  30. full outer join
  31. (
  32. select
  33. *
  34. from dws_uv_detail_daycount
  35. where dt='2020-03-10'
  36. )new
  37. on old.mid_id=new.mid_id;
复制代码


3)查询加载结果
  1. select * from dwt_uv_topic limit 5;
复制代码
2.2 会员主题宽表
宽表字段怎么来?维度关联的事实表度量值+开头、结尾+累积+累积一个时间段



1.png


1)建表语句
  1. drop table if exists dwt_user_topic;
  2. create external table dwt_user_topic
  3. (
  4. user_id string comment '用户 id',
  5. login_date_first string comment '首次登录时间',
  6. login_date_last string comment '末次登录时间',
  7. login_count bigint comment '累积登录天数',
  8. login_last_30d_count bigint comment '最近 30 日登录天数',
  9. order_date_first string comment '首次下单时间',
  10. order_date_last string comment '末次下单时间',
  11. order_count bigint comment '累积下单次数',
  12. order_amount decimal(16,2) comment '累积下单金额',
  13. order_last_30d_count bigint comment '最近 30 日下单次数',
  14. order_last_30d_amount bigint comment '最近 30 日下单金额',
  15. payment_date_first string comment '首次支付时间',
  16. payment_date_last string comment '末次支付时间',
  17. payment_count decimal(16,2) comment '累积支付次数',
  18. payment_amount decimal(16,2) comment '累积支付金额',
  19. payment_last_30d_count decimal(16,2) comment '最近 30 日支付次数',
  20. payment_last_30d_amount decimal(16,2) comment '最近 30 日支付金额'
  21. )COMMENT '用户主题宽表'
  22. stored as parquet
  23. location '/warehouse/gmall/dwt/dwt_user_topic/'
  24. tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
  1. insert overwrite table dwt_user_topic
  2. select
  3. nvl(new.user_id,old.user_id),
  4. if(old.login_date_first is null and
  5. new.login_count>0,'2020-03-10',old.login_date_first),
  6. if(new.login_count>0,'2020-03-10',old.login_date_last),
  7. nvl(old.login_count,0)+if(new.login_count>0,1,0),
  8. nvl(new.login_last_30d_count,0),
  9. if(old.order_date_first is null and
  10. new.order_count>0,'2020-03-10',old.order_date_first),
  11. if(new.order_count>0,'2020-03-10',old.order_date_last),
  12. nvl(old.order_count,0)+nvl(new.order_count,0),
  13. nvl(old.order_amount,0)+nvl(new.order_amount,0),
  14. nvl(new.order_last_30d_count,0),
  15. nvl(new.order_last_30d_amount,0),
  16. if(old.payment_date_first is null and
  17. new.payment_count>0,'2020-03-10',old.payment_date_first),
  18. if(new.payment_count>0,'2020-03-10',old.payment_date_last),
  19. nvl(old.payment_count,0)+nvl(new.payment_count,0),
  20. nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
  21. nvl(new.payment_last_30d_count,0),
  22. nvl(new.payment_last_30d_amount,0)
  23. from
  24. dwt_user_topic old
  25. full outer join
  26. (
  27. select
  28. user_id,
  29. sum(if(dt='2020-03-10',login_count,0)) login_count,
  30. sum(if(dt='2020-03-10',order_count,0)) order_count,
  31. sum(if(dt='2020-03-10',order_amount,0)) order_amount,
  32. sum(if(dt='2020-03-10',payment_count,0)) payment_count,
  33. sum(if(dt='2020-03-10',payment_amount,0)) payment_amount,
  34. sum(if(login_count>0,1,0)) login_last_30d_count,
  35. sum(order_count) order_last_30d_count,
  36. sum(order_amount) order_last_30d_amount,
  37. sum(payment_count) payment_last_30d_count,
  38. sum(payment_amount) payment_last_30d_amount
  39. from dws_user_action_daycount
  40. where dt>=date_add( '2020-03-10',-30)
  41. group by user_id
  42. )new
  43. on old.user_id=new.user_id;
复制代码
3)查询加载结果
  1. select * from dwt_user_topic limit 5;
复制代码
2.3 商品主题宽表
1)建表语句

  1. create external table dwt_sku_topic
  2. (
  3. sku_id string comment 'sku_id',
  4. spu_id string comment 'spu_id',
  5. order_last_30d_count bigint comment '最近 30 日被下单次数',
  6. order_last_30d_num bigint comment '最近 30 日被下单件数',
  7. order_last_30d_amount decimal(16,2) comment '最近 30 日被下单金额',
  8. order_count bigint comment '累积被下单次数',
  9. order_num bigint comment '累积被下单件数',
  10. order_amount decimal(16,2) comment '累积被下单金额',
  11. payment_last_30d_count bigint comment '最近 30 日被支付次数',
  12. payment_last_30d_num bigint comment '最近 30 日被支付件数',
  13. payment_last_30d_amount decimal(16,2) comment '最近 30 日被支付金额',
  14. payment_count bigint comment '累积被支付次数',
  15. payment_num bigint comment '累积被支付件数',
  16. payment_amount decimal(16,2) comment '累积被支付金额',
  17. refund_last_30d_count bigint comment '最近三十日退款次数',
  18. refund_last_30d_num bigint comment '最近三十日退款件数',
  19. refund_last_30d_amount decimal(10,2) comment '最近三十日退款金额',
  20. refund_count bigint comment '累积退款次数',
  21. refund_num bigint comment '累积退款件数',
  22. refund_amount decimal(10,2) comment '累积退款金额',
  23. cart_last_30d_count bigint comment '最近 30 日被加入购物车次数',
  24. cart_last_30d_num bigint comment '最近 30 日被加入购物车件数',
  25. cart_count bigint comment '累积被加入购物车次数',
  26. cart_num bigint comment '累积被加入购物车件数',
  27. favor_last_30d_count bigint comment '最近 30 日被收藏次数',
  28. favor_count bigint comment '累积被收藏次数',
  29. appraise_last_30d_good_count bigint comment '最近 30 日好评数',
  30. appraise_last_30d_mid_count bigint comment '最近 30 日中评数',
  31. appraise_last_30d_bad_count bigint comment '最近 30 日差评数',
  32. appraise_last_30d_default_count bigint comment '最近 30 日默认评价数',
  33. appraise_good_count bigint comment '累积好评数',
  34. appraise_mid_count bigint comment '累积中评数',
  35. appraise_bad_count bigint comment '累积差评数',
  36. appraise_default_count bigint comment '累积默认评价数'
  37. )COMMENT '商品主题宽表'
  38. stored as parquet
  39. location '/warehouse/gmall/dwt/dwt_sku_topic/'
  40. tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
  1. insert overwrite table dwt_sku_topic
  2. select
  3. nvl(new.sku_id,old.sku_id), sku_info.spu_id,
  4. nvl(new.order_count30,0),
  5. nvl(new.order_num30,0),
  6. nvl(new.order_amount30,0),
  7. nvl(old.order_count,0) + nvl(new.order_count,0),
  8. nvl(old.order_num,0) + nvl(new.order_num,0),
  9. nvl(old.order_amount,0) + nvl(new.order_amount,0),
  10. nvl(new.payment_count30,0),
  11. nvl(new.payment_num30,0),
  12. nvl(new.payment_amount30,0),
  13. nvl(old.payment_count,0) + nvl(new.payment_count,0),
  14. nvl(old.payment_num,0) + nvl(new.payment_count,0),
  15. nvl(old.payment_amount,0) + nvl(new.payment_count,0),
  16. nvl(new.refund_count30,0),
  17. nvl(new.refund_num30,0),
  18. nvl(new.refund_amount30,0),
  19. nvl(old.refund_count,0) + nvl(new.refund_count,0),
  20. nvl(old.refund_num,0) + nvl(new.refund_num,0),
  21. nvl(old.refund_amount,0) + nvl(new.refund_amount,0),
  22. nvl(new.cart_count30,0),
  23. nvl(new.cart_num30,0),
  24. nvl(old.cart_count,0) + nvl(new.cart_count,0),
  25. nvl(old.cart_num,0) + nvl(new.cart_num,0),
  26. nvl(new.favor_count30,0),
  27. nvl(old.favor_count,0) + nvl(new.favor_count,0),
  28. nvl(new.appraise_good_count30,0),
  29. nvl(new.appraise_mid_count30,0),
  30. nvl(new.appraise_bad_count30,0),
  31. nvl(new.appraise_default_count30,0) ,
  32. nvl(old.appraise_good_count,0) + nvl(new.appraise_good_count,0),
  33. nvl(old.appraise_mid_count,0) + nvl(new.appraise_mid_count,0),
  34. nvl(old.appraise_bad_count,0) + nvl(new.appraise_bad_count,0),
  35. nvl(old.appraise_default_count,0) + nvl(new.appraise_default_count,0)
  36. from
  37. (
  38. select
  39. sku_id,
  40. spu_id,
  41. order_last_30d_count,
  42. order_last_30d_num,
  43. order_last_30d_amount,
  44. order_count,
  45. order_num,
  46. order_amount ,
  47. payment_last_30d_count,
  48. payment_last_30d_num,
  49. payment_last_30d_amount,
  50. payment_count,
  51. payment_num,
  52. payment_amount,
  53. refund_last_30d_count,
  54. refund_last_30d_num,
  55. refund_last_30d_amount,
  56. refund_count,
  57. refund_num,
  58. refund_amount,
  59. cart_last_30d_count,
  60. cart_last_30d_num,
  61. cart_count,
  62. cart_num,
  63. favor_last_30d_count,
  64. favor_count,
  65. appraise_last_30d_good_count,
  66. appraise_last_30d_mid_count,
  67. appraise_last_30d_bad_count,
  68. appraise_last_30d_default_count,
  69. appraise_good_count,
  70. appraise_mid_count,
  71. appraise_bad_count,
  72. appraise_default_count
  73. from dwt_sku_topic
  74. )old
  75. full outer join
  76. (
  77. select
  78. sku_id,
  79. sum(if(dt='2020-03-10', order_count,0 )) order_count,
  80. sum(if(dt='2020-03-10',order_num ,0 )) order_num,
  81. sum(if(dt='2020-03-10',order_amount,0 )) order_amount ,
  82. sum(if(dt='2020-03-10',payment_count,0 )) payment_count,
  83. sum(if(dt='2020-03-10',payment_num,0 )) payment_num,
  84. sum(if(dt='2020-03-10',payment_amount,0 )) payment_amount,
  85. sum(if(dt='2020-03-10',refund_count,0 )) refund_count,
  86. sum(if(dt='2020-03-10',refund_num,0 )) refund_num,
  87. sum(if(dt='2020-03-10',refund_amount,0 )) refund_amount,
  88. sum(if(dt='2020-03-10',cart_count,0 )) cart_count,
  89. sum(if(dt='2020-03-10',cart_num,0 )) cart_num,
  90. sum(if(dt='2020-03-10',favor_count,0 )) favor_count,
  91. sum(if(dt='2020-03-10',appraise_good_count,0 )) appraise_good_count,
  92. sum(if(dt='2020-03-10',appraise_mid_count,0 ) ) appraise_mid_count ,
  93. sum(if(dt='2020-03-10',appraise_bad_count,0 )) appraise_bad_count,
  94. sum(if(dt='2020-03-10',appraise_default_count,0 )) appraise_default_count,
  95. sum(order_count) order_count30 ,
  96. sum(order_num) order_num30,
  97. sum(order_amount) order_amount30,
  98. sum(payment_count) payment_count30,
  99. sum(payment_num) payment_num30,
  100. sum(payment_amount) payment_amount30,
  101. sum(refund_count) refund_count30,
  102. sum(refund_num) refund_num30,
  103. sum(refund_amount) refund_amount30,
  104. sum(cart_count) cart_count30,
  105. sum(cart_num) cart_num30,
  106. sum(favor_count) favor_count30,
  107. sum(appraise_good_count) appraise_good_count30,
  108. sum(appraise_mid_count) appraise_mid_count30,
  109. sum(appraise_bad_count) appraise_bad_count30,
  110. sum(appraise_default_count) appraise_default_count30
  111. from dws_sku_action_daycount
  112. where dt >= date_add ('2020-03-10', -30)
  113. group by sku_id
  114. )new
  115. on new.sku_id = old.sku_id
  116. left join
  117. (select * from dwd_dim_sku_info where dt='2020-03-10') sku_info
  118. on nvl(new.sku_id,old.sku_id)= sku_info.id;
复制代码
3)查询加载结果
  1. select * from dwt_sku_topic limit 5;
复制代码
2.4 DWT 层数据导入脚本
1)vim dws_to_dwt.sh
在脚本中填写如下内容



  1. #!/bin/bash
  2. APP=gmall
  3. hive=/opt/modules/hive/bin/hive
  4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
  5. if [ -n "$1" ] ;then
  6. do_date=$1
  7. else
  8. do_date=`date -d "-1 day" +%F`
  9. fi
  10. sql="
  11. insert overwrite table ${APP}.dwt_uv_topic
  12. select
  13. nvl(new.mid_id,old.mid_id),
  14. nvl(new.user_id,old.user_id),
  15. nvl(new.version_code,old.version_code),
  16. nvl(new.version_name,old.version_name),
  17. nvl(new.lang,old.lang),
  18. nvl(new.source,old.source),
  19. nvl(new.os,old.os),
  20. nvl(new.area,old.area),
  21. nvl(new.model,old.model),
  22. nvl(new.brand,old.brand),
  23. nvl(new.sdk_version,old.sdk_version),
  24. nvl(new.gmail,old.gmail),
  25. nvl(new.height_width,old.height_width),
  26. nvl(new.app_time,old.app_time),
  27. nvl(new.network,old.network),
  28. nvl(new.lng,old.lng),
  29. nvl(new.lat,old.lat),
  30. nvl(old.login_date_first,'$do_date'),
  31. if(new.login_count>0,'$do_date',old.login_date_last),
  32. nvl(new.login_count,0),
  33. nvl(new.login_count,0)+nvl(old.login_count,0)
  34. from
  35. (
  36. select
  37. *
  38. from ${APP}.dwt_uv_topic
  39. )old
  40. full outer join
  41. (
  42. select
  43. *
  44. from ${APP}.dws_uv_detail_daycount
  45. where dt='$do_date'
  46. )new
  47. on old.mid_id=new.mid_id;
  48. insert overwrite table ${APP}.dwt_user_topic
  49. select
  50. nvl(new.user_id,old.user_id),
  51. if(old.login_date_first is null and
  52. new.login_count>0,'$do_date',old.login_date_first),
  53. if(new.login_count>0,'$do_date',old.login_date_last),
  54. nvl(old.login_count,0)+if(new.login_count>0,1,0),
  55. nvl(new.login_last_30d_count,0),
  56. if(old.order_date_first is null and
  57. new.order_count>0,'$do_date',old.order_date_first),
  58. if(new.order_count>0,'$do_date',old.order_date_last),
  59. nvl(old.order_count,0)+nvl(new.order_count,0),
  60. nvl(old.order_amount,0)+nvl(new.order_amount,0),
  61. nvl(new.order_last_30d_count,0),
  62. nvl(new.order_last_30d_amount,0),
  63. if(old.payment_date_first is null and
  64. new.payment_count>0,'$do_date',old.payment_date_first),
  65. if(new.payment_count>0,'$do_date',old.payment_date_last),
  66. nvl(old.payment_count,0)+nvl(new.payment_count,0),
  67. nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
  68. nvl(new.payment_last_30d_count,0),
  69. nvl(new.payment_last_30d_amount,0)
  70. from
  71. (
  72. select
  73. *
  74. from ${APP}.dwt_user_topic
  75. )old
  76. full outer join
  77. (
  78. select
  79. user_id,
  80. sum(if(dt='$do_date',login_count,0)) login_count,
  81. sum(if(dt='$do_date',order_count,0)) order_count,
  82. sum(if(dt='$do_date',order_amount,0)) order_amount,
  83. sum(if(dt='$do_date',payment_count,0)) payment_count,
  84. sum(if(dt='$do_date',payment_amount,0)) payment_amount,
  85. sum(if(order_count>0,1,0)) login_last_30d_count,
  86. sum(order_count) order_last_30d_count,
  87. sum(order_amount) order_last_30d_amount,
  88. sum(payment_count) payment_last_30d_count,
  89. sum(payment_amount) payment_last_30d_amount
  90. from ${APP}.dws_user_action_daycount
  91. where dt>=date_add( '$do_date',-30)
  92. group by user_id
  93. )new
  94. on old.user_id=new.user_id;
  95. with
  96. sku_act as
  97. (
  98. select
  99. sku_id,
  100. sum(if(dt='$do_date', order_count,0 )) order_count,
  101. sum(if(dt='$do_date',order_num ,0 )) order_num,
  102. sum(if(dt='$do_date',order_amount,0 )) order_amount ,
  103. sum(if(dt='$do_date',payment_count,0 )) payment_count,
  104. sum(if(dt='$do_date',payment_num,0 )) payment_num,
  105. sum(if(dt='$do_date',payment_amount,0 )) payment_amount,
  106. sum(if(dt='$do_date',refund_count,0 )) refund_count,
  107. sum(if(dt='$do_date',refund_num,0 )) refund_num,
  108. sum(if(dt='$do_date',refund_amount,0 )) refund_amount,
  109. sum(if(dt='$do_date',cart_count,0 )) cart_count,
  110. sum(if(dt='$do_date',cart_num,0 )) cart_num,
  111. sum(if(dt='$do_date',favor_count,0 )) favor_count,
  112. sum(if(dt='$do_date',appraise_good_count,0 )) appraise_good_count,
  113. sum(if(dt='$do_date',appraise_mid_count,0 ) ) appraise_mid_count ,
  114. sum(if(dt='$do_date',appraise_bad_count,0 )) appraise_bad_count,
  115. sum(if(dt='$do_date',appraise_default_count,0 )) appraise_default_count,
  116. sum( order_count ) order_count30 ,
  117. sum( order_num ) order_num30,
  118. sum(order_amount ) order_amount30,
  119. sum(payment_count ) payment_count30,
  120. sum(payment_num ) payment_num30,
  121. sum(payment_amount ) payment_amount30,
  122. sum(refund_count ) refund_count30,
  123. sum(refund_num ) refund_num30,
  124. sum(refund_amount ) refund_amount30,
  125. sum(cart_count ) cart_count30,
  126. sum(cart_num ) cart_num30,
  127. sum(favor_count ) favor_count30,
  128. sum(appraise_good_count ) appraise_good_count30,
  129. sum(appraise_mid_count ) appraise_mid_count30,
  130. sum(appraise_bad_count ) appraise_bad_count30,
  131. sum(appraise_default_count ) appraise_default_count30
  132. from ${APP}.dws_sku_action_daycount
  133. where dt>=date_add ( '$do_date',-30)
  134. group by sku_id
  135. ),
  136. sku_topic
  137. as
  138. (
  139. select
  140. sku_id,
  141. spu_id,
  142. order_last_30d_count,
  143. order_last_30d_num,
  144. order_last_30d_amount,
  145. order_count,
  146. order_num,
  147. order_amount ,
  148. payment_last_30d_count,
  149. payment_last_30d_num,
  150. payment_last_30d_amount,
  151. payment_count,
  152. payment_num,
  153. payment_amount,
  154. refund_last_30d_count,
  155. refund_last_30d_num,
  156. refund_last_30d_amount ,
  157. refund_count ,
  158. refund_num ,
  159. refund_amount ,
  160. cart_last_30d_count ,
  161. cart_last_30d_num ,
  162. cart_count ,
  163. cart_num ,
  164. favor_last_30d_count ,
  165. favor_count ,
  166. appraise_last_30d_good_count ,
  167. appraise_last_30d_mid_count ,
  168. appraise_last_30d_bad_count ,
  169. appraise_last_30d_default_count ,
  170. appraise_good_count ,
  171. appraise_mid_count ,
  172. appraise_bad_count ,
  173. appraise_default_count
  174. from ${APP}.dwt_sku_topic
  175. )
  176. insert overwrite table ${APP}.dwt_sku_topic
  177. select
  178. nvl(sku_act.sku_id,sku_topic.sku_id) ,
  179. sku_info.spu_id,
  180. nvl (sku_act.order_count30,0) ,
  181. nvl (sku_act.order_num30,0) ,
  182. nvl (sku_act.order_amount30,0) ,
  183. nvl(sku_topic.order_count,0)+ nvl (sku_act.order_count,0) ,
  184. nvl(sku_topic.order_num,0)+ nvl (sku_act.order_num,0) ,
  185. nvl(sku_topic.order_amount,0)+ nvl (sku_act.order_amount,0),
  186. nvl (sku_act.payment_count30,0),
  187. nvl (sku_act.payment_num30,0),
  188. nvl (sku_act.payment_amount30,0),
  189. nvl(sku_topic.payment_count,0)+ nvl (sku_act.payment_count,0) ,
  190. nvl(sku_topic.payment_num,0)+ nvl (sku_act.payment_count,0) ,
  191. nvl(sku_topic.payment_amount,0)+ nvl (sku_act.payment_count,0) ,
  192. nvl (refund_count30,0),
  193. nvl (sku_act.refund_num30,0),
  194. nvl (sku_act.refund_amount30,0),
  195. nvl(sku_topic.refund_count,0)+ nvl (sku_act.refund_count,0),
  196. nvl(sku_topic.refund_num,0)+ nvl (sku_act.refund_num,0),
  197. nvl(sku_topic.refund_amount,0)+ nvl (sku_act.refund_amount,0),
  198. nvl(sku_act.cart_count30,0) ,
  199. nvl(sku_act.cart_num30,0) ,
  200. nvl(sku_topic.cart_count ,0)+ nvl (sku_act.cart_count,0),
  201. nvl( sku_topic.cart_num ,0)+ nvl (sku_act.cart_num,0),
  202. nvl(sku_act.favor_count30 ,0) ,
  203. nvl (sku_topic.favor_count ,0)+ nvl (sku_act.favor_count,0),
  204. nvl (sku_act.appraise_good_count30 ,0) ,
  205. nvl (sku_act.appraise_mid_count30 ,0) ,
  206. nvl (sku_act.appraise_bad_count30 ,0) ,
  207. nvl (sku_act.appraise_default_count30 ,0) ,
  208. nvl (sku_topic.appraise_good_count ,0)+ nvl
  209. (sku_act.appraise_good_count,0) ,
  210. nvl (sku_topic.appraise_mid_count ,0)+ nvl
  211. (sku_act.appraise_mid_count,0) ,
  212. nvl (sku_topic.appraise_bad_count ,0)+ nvl
  213. (sku_act.appraise_bad_count,0) ,
  214. nvl (sku_topic.appraise_default_count ,0)+ nvl
  215. (sku_act.appraise_default_count,0)
  216. from sku_act
  217. full outer join sku_topic
  218. on sku_act.sku_id =sku_topic.sku_id
  219. left join
  220. (select * from ${APP}.dwd_dim_sku_info where dt='$do_date') sku_info
  221. on nvl(sku_topic.sku_id,sku_act.sku_id)= sku_info.id;
  222. "
  223. $hive -e "$sql"
复制代码

2)增加脚本执行权限
  1. chmod 770 dws_to_dwt.sh
复制代码

3)执行脚本导入数据
  1. dws_to_dwt.sh 2020-03-11
复制代码

4)查看导入数据
  1. select * from dwt_uv_topic limit 5;
  2. select * from dwt_user_topic limit 5;
  3. select * from dwt_sku_topic limit 5;
复制代码


获取更多资源:
领取100本书+1T资源
http://www.aboutyun.com/forum.php?mod=viewthread&tid=26480

大数据5个项目视频
http://www.aboutyun.com/forum.php?mod=viewthread&tid=25235

名企资源、名企面试题、最新BAT面试题、专题面试题等资源汇总
https://www.aboutyun.com/forum.php?mod=viewthread&tid=27732


名企资源、名企面试题、最新BAT面试题、专题面试题等资源汇总
https://www.aboutyun.com/forum.php?mod=viewthread&tid=27732


原文链接
https://blog.csdn.net/qq_43733123/article/details/105943032

已有(2)人评论

跳转到指定楼层
为梦狂野 发表于 2020-9-22 13:27:24
博主码字不易,谢谢分享。
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关闭

推荐上一条 /2 下一条