
Big Data Project: E-Commerce Data Warehouse (Summary), Part 7: System Business Data Warehouse

Guiding questions:
1. How do we compute the daily Top 10 best-selling products?
2. How do we compute the monthly order payment rate?
3. How do we set up a funnel analysis for data visualization?
4. How do we report nationwide product sales?


Previous post: Big Data Project: E-Commerce Data Warehouse (Summary), Part 6: System Business Data Warehouse

Chapter 11 Requirement 8: Daily Top 10 Best-Selling Products
11.1 DWS Layer
Use the user purchase detail wide table as the DWS-layer source.

11.2 ADS Layer
11.2.1 Table DDL

drop table if exists ads_goods_order_count_day;
create external table ads_goods_order_count_day(
    dt string comment 'statistics date',
    sku_id string comment 'product id',
    order_count bigint comment 'order count'
) COMMENT 'daily product order Top 10'
stored as parquet
location '/warehouse/gmall/ads/ads_goods_order_count_day/';




11.2.2 Loading data

insert into table ads_goods_order_count_day
select
    '2019-02-10',
    sku_id,
    sum(order_count) order_total_count
from
    dws_sale_detail_daycount
where
    dt='2019-02-10'
group by
    sku_id
order by
    order_total_count desc
limit 10;
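The WHERE / GROUP BY / ORDER BY ... DESC / LIMIT 10 pipeline in the query above can be sketched in Python. This is an illustrative stand-in for the Hive query, not part of the pipeline; the row layout mimics `dws_sale_detail_daycount` with just the three columns the query touches:

```python
from collections import defaultdict

# Each tuple mimics a row of dws_sale_detail_daycount: (dt, sku_id, order_count)
rows = [
    ("2019-02-10", "sku_1", 5),
    ("2019-02-10", "sku_1", 3),
    ("2019-02-10", "sku_2", 7),
    ("2019-02-11", "sku_1", 9),  # excluded by the dt predicate
]

def top_n_by_day(rows, dt, n=10):
    """Sum order_count per sku_id for one day, then take the top n."""
    totals = defaultdict(int)
    for row_dt, sku_id, order_count in rows:
        if row_dt == dt:                   # WHERE dt = '...'
            totals[sku_id] += order_count  # GROUP BY sku_id / SUM(order_count)
    # ORDER BY order_total_count DESC, then LIMIT n
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

print(top_n_by_day(rows, "2019-02-10"))  # [('sku_1', 8), ('sku_2', 7)]
```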




11.2.3 Data-loading script
1) Create the script ads_goods_order_count_day.sh in /home/kgg/bin

[kgg@hadoop102 bin]$ vim ads_goods_order_count_day.sh

Add the following content to the script:

#!/bin/bash
# Define variables once so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed in as $1 if given; otherwise default to yesterday
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

echo "=== log date: $do_date ==="
sql="
insert into table "$APP".ads_goods_order_count_day
select
    '$do_date',
    sku_id,
    sum(order_count) order_total_count
from
    "$APP".dws_sale_detail_daycount
where
    dt='$do_date'
group by
    sku_id
order by
    order_total_count desc
limit 10;
"
$hive -e "$sql"
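The date-defaulting logic at the top of the script (use $1 if provided, otherwise fall back to yesterday) can be sketched in Python; illustrative only, the real job stays in bash:

```python
from datetime import date, timedelta

def resolve_do_date(argv):
    """Mirror the shell logic: use the first argument if given,
    else yesterday -- the equivalent of `date -d "-1 day" +%F`."""
    if len(argv) > 1 and argv[1]:
        return argv[1]
    return (date.today() - timedelta(days=1)).isoformat()

print(resolve_do_date(["ads_goods_order_count_day.sh", "2019-02-20"]))  # 2019-02-20
```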

2) Grant execute permission on the script

[kgg@hadoop102 bin]$ chmod 777 ads_goods_order_count_day.sh

3) Run the script

[kgg@hadoop102 module]$ ads_goods_order_count_day.sh 2019-02-20

4) Check the result

hive (gmall)> select * from ads_goods_order_count_day;

5) Schedule
In production this script is typically scheduled to run daily between 00:30 and 01:00.



Chapter 12 Requirement 9: Monthly Order Payment Rate

12.1 DWS Layer
Use the user behavior wide table as the DWS-layer source.

12.2 ADS Layer
12.2.1 Table DDL

drop table if exists ads_order2pay_mn;
create external table ads_order2pay_mn (
    `dt` string COMMENT 'statistics date',
    `order_u_count` bigint COMMENT 'number of users who ordered',
    `payment_u_count` bigint COMMENT 'number of users who paid',
    `order2payment_convert_ratio` decimal(10,2) COMMENT 'order-to-payment conversion ratio'
) COMMENT 'monthly order-to-payment conversion'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_order2pay_mn/';


12.2.2 Loading data

insert into table ads_order2pay_mn
select
    '2019-02-10',
    ua.order_count,
    ua.payment_count,
    cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
from
(
    select
        dt,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
        dws_user_action
    where
        date_format(dt,'yyyy-MM')=date_format('2019-02-10','yyyy-MM')
    group by dt
) ua;
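The inner query counts users with at least one order and at least one payment; the outer query turns those counts into a ratio. A Python sketch of the same computation (illustrative; `Decimal` quantization stands in for Hive's cast to decimal(10,2)):

```python
from decimal import Decimal, ROUND_HALF_UP

# Rows mimic dws_user_action for one month: per-user order_count and payment_count.
users = [
    {"order_count": 2, "payment_count": 1},
    {"order_count": 1, "payment_count": 0},
    {"order_count": 0, "payment_count": 0},  # visited but never ordered
    {"order_count": 3, "payment_count": 2},
]

def order2pay_ratio(users):
    """Count users with >=1 order and >=1 payment, then compute the conversion ratio."""
    order_u = sum(1 for u in users if u["order_count"] > 0)  # sum(if(order_count>0,1,0))
    pay_u = sum(1 for u in users if u["payment_count"] > 0)  # sum(if(payment_count>0,1,0))
    ratio = (Decimal(pay_u) / Decimal(order_u)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)             # cast(... as decimal(10,2))
    return order_u, pay_u, ratio

print(order2pay_ratio(users))  # (3, 2, Decimal('0.67'))
```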


12.2.3 Data-loading script
1) Create the script ads_order2pay_mn.sh in /home/kgg/bin

[kgg@hadoop102 bin]$ vim ads_order2pay_mn.sh

Add the following content to the script:

#!/bin/bash
# Define variables once so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed in as $1 if given; otherwise default to yesterday
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

echo "=== log date: $do_date ==="
sql="
insert into table "$APP".ads_order2pay_mn
select
    '$do_date',
    ua.order_count,
    ua.payment_count,
    cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
from
(
    select
        dt,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
        "$APP".dws_user_action
    where
        date_format(dt,'yyyy-MM')=date_format('$do_date','yyyy-MM')
    group by dt
) ua;
"
$hive -e "$sql"

2) Grant execute permission on the script

[kgg@hadoop102 bin]$ chmod 777 ads_order2pay_mn.sh

3) Run the script

[kgg@hadoop102 module]$ ads_order2pay_mn.sh 2019-02-20

4) Check the result

hive (gmall)> select * from ads_order2pay_mn;

5) Schedule
In production this script is typically scheduled to run daily between 00:30 and 01:00.



Chapter 13 Data Visualization
13.1 Create the Tables in MySQL
13.1.1 Daily Active Users

1) Create the ads_uv_count table in MySQL
DROP TABLE IF EXISTS `ads_uv_count`;
CREATE TABLE `ads_uv_count`  (
  `dt` varchar(255) DEFAULT NULL COMMENT 'statistics date',
  `day_count` bigint(200) DEFAULT NULL COMMENT 'daily user count',
  `wk_count` bigint(200) DEFAULT NULL COMMENT 'weekly user count',
  `mn_count` bigint(200) DEFAULT NULL COMMENT 'monthly user count',
  `is_weekend` varchar(200) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL COMMENT 'Y/N: end of week, used to pick the final weekly figure',
  `is_monthend` varchar(200) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL COMMENT 'Y/N: end of month, used to pick the final monthly figure'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'daily active user counts' ROW_FORMAT = Dynamic;


2) Insert the following sample data into MySQL

INSERT INTO `ads_uv_count` VALUES ('2019-03-01 14:10:04', 20, 30, 100, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-02 14:12:48', 35, 50, 100, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-03 14:14:07', 25, 640, 3300, 'Y', 'Y');
INSERT INTO `ads_uv_count` VALUES ('2019-03-04 14:14:14', 10, 23, 123, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-05 14:14:21', 80, 121, 131, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-06 14:14:38', 30, 53, 453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-07 14:33:27', 20, 31, 453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-08 14:33:39', 10, 53, 453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-09 14:33:47', 10, 34, 453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-10 14:33:54', 10, 653, 8453, 'Y', 'Y');
INSERT INTO `ads_uv_count` VALUES ('2019-03-11 14:34:04', 100, 453, 1453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-12 14:34:10', 101, 153, 134, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-13 14:34:16', 100, 286, 313, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-14 14:34:22', 100, 45, 453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-15 14:34:29', 100, 345, 3453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-16 14:34:35', 101, 453, 453, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-17 14:34:41', 100, 678, 9812, 'Y', 'Y');
INSERT INTO `ads_uv_count` VALUES ('2019-03-18 14:34:46', 100, 186, 193, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-19 14:34:53', 453, 686, 712, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-20 14:34:57', 452, 786, 823, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-21 14:35:02', 214, 58, 213, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-22 14:35:08', 76, 78, 95, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-23 14:35:13', 76, 658, 745, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-24 14:35:19', 76, 687, 9300, 'Y', 'Y');
INSERT INTO `ads_uv_count` VALUES ('2019-03-25 14:35:25', 76, 876, 923, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-26 14:35:30', 76, 456, 511, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-27 14:35:35', 76, 456, 623, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-28 14:35:41', 43, 753, 4000, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-29 14:35:47', 76, 876, 4545, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-30 14:35:57', 76, 354, 523, 'N', 'N');
INSERT INTO `ads_uv_count` VALUES ('2019-03-31 14:36:02', 43, 634, 6213, 'Y', 'Y');
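The is_weekend / is_monthend flags exist because wk_count and mn_count are running totals: only the end-of-week (or end-of-month) row carries the finished figure. A minimal Python sketch of how a dashboard consumer would pick the final weekly totals (field names from the table above; data illustrative):

```python
# Rows mimic ads_uv_count; only rows flagged 'Y' carry the finished weekly totals.
rows = [
    {"dt": "2019-03-01", "day_count": 20, "wk_count": 30,  "is_weekend": "N"},
    {"dt": "2019-03-03", "day_count": 25, "wk_count": 640, "is_weekend": "Y"},
    {"dt": "2019-03-10", "day_count": 10, "wk_count": 653, "is_weekend": "Y"},
]

def weekly_totals(rows):
    """Keep wk_count only from rows where is_weekend = 'Y' (end-of-week snapshot)."""
    return [(r["dt"], r["wk_count"]) for r in rows if r["is_weekend"] == "Y"]

print(weekly_totals(rows))  # [('2019-03-03', 640), ('2019-03-10', 653)]
```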


13.1.2 Retention Rate
1) Create the ads_user_retention_day_rate table in MySQL

DROP TABLE IF EXISTS `ads_user_retention_day_rate`;
CREATE TABLE `ads_user_retention_day_rate`  (
  `stat_date` varchar(255) DEFAULT NULL COMMENT 'statistics date',
  `create_date` varchar(255) DEFAULT NULL COMMENT 'device first-seen date',
  `retention_day` bigint(200) DEFAULT NULL COMMENT 'days retained as of the statistics date',
  `retention_count` bigint(200) DEFAULT NULL COMMENT 'number of retained devices',
  `new_mid_count` bigint(200) DEFAULT NULL COMMENT 'number of devices added that day',
  `retention_ratio` decimal(10, 2) DEFAULT NULL COMMENT 'retention rate'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'daily device retention' ROW_FORMAT = Dynamic;
2) Insert sample data into MySQL

INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-09', '2019-03-08', 1, 88, 99, 0.78);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-10', '2019-03-08', 2, 77, 88, 0.68);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-11', '2019-03-08', 3, 66, 77, 0.58);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-12', '2019-03-08', 4, 55, 66, 0.48);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-13', '2019-03-08', 5, 44, 55, 0.38);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-14', '2019-03-08', 6, 33, 44, 0.28);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-10', '2019-03-09', 1, 77, 88, 0.56);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-11', '2019-03-09', 2, 66, 77, 0.46);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-12', '2019-03-09', 3, 55, 66, 0.36);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-13', '2019-03-09', 4, 44, 55, 0.26);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-14', '2019-03-09', 5, 33, 44, 0.16);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-11', '2019-03-10', 1, 66, 77, 0.55);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-12', '2019-03-10', 2, 55, 66, 0.45);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-13', '2019-03-10', 3, 44, 55, 0.35);
INSERT INTO `ads_user_retention_day_rate` VALUES ('2019-03-14', '2019-03-10', 4, 33, 44, 0.25);
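Assuming retention_ratio is defined as retention_count divided by new_mid_count (retained devices over devices added on create_date; the sample values above are illustrative placeholders), the column can be computed as follows. A sketch, not the production job:

```python
from decimal import Decimal, ROUND_HALF_UP

def retention_ratio(retention_count, new_mid_count):
    """retention_ratio = retained devices / devices added on create_date,
    rounded to two decimal places to match decimal(10, 2)."""
    return (Decimal(retention_count) / Decimal(new_mid_count)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)

print(retention_ratio(44, 55))  # 0.80
```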

13.1.3 Funnel Analysis

1) Create the ads_user_action_convert_day table in MySQL
DROP TABLE IF EXISTS `ads_user_action_convert_day`;
CREATE TABLE `ads_user_action_convert_day`  (
  `dt` varchar(200) DEFAULT NULL COMMENT 'statistics date',
  `total_visitor_m_count` bigint(20) DEFAULT NULL COMMENT 'total visitor count',
  `order_u_count` bigint(20) DEFAULT NULL COMMENT 'number of users who ordered',
  `visitor2order_convert_ratio` decimal(10, 2) DEFAULT NULL COMMENT 'visit-to-order conversion ratio',
  `payment_u_count` bigint(20) DEFAULT NULL COMMENT 'number of users who paid',
  `order2payment_convert_ratio` decimal(10, 2) DEFAULT NULL COMMENT 'order-to-payment conversion ratio'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'daily user behavior conversion rates' ROW_FORMAT = Dynamic;

2) Insert sample data into MySQL

INSERT INTO `ads_user_action_convert_day` VALUES ('2019-04-28 19:36:18', 10000, 3000, 0.25, 2000, 0.15);
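Assuming the two ratio columns are defined as order_u_count / total_visitor_m_count and payment_u_count / order_u_count respectively (the stored values in the sample row above are placeholders), the funnel computation can be sketched in Python:

```python
from decimal import Decimal, ROUND_HALF_UP

def funnel(total_visitor_m_count, order_u_count, payment_u_count):
    """Compute the two funnel conversion ratios stored in ads_user_action_convert_day."""
    def ratio(num, den):
        # Round to two decimal places to match decimal(10, 2)
        return (Decimal(num) / Decimal(den)).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP)
    return (ratio(order_u_count, total_visitor_m_count),   # visit -> order
            ratio(payment_u_count, order_u_count))         # order -> payment

# 10000 visitors -> 3000 orders -> 2000 payments
print(funnel(10000, 3000, 2000))  # (Decimal('0.30'), Decimal('0.67'))
```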

13.1.4 GMV

1) Create the ads_gmv_sum_day table in MySQL
DROP TABLE IF EXISTS ads_gmv_sum_day;
CREATE TABLE ads_gmv_sum_day(
  `dt` varchar(200) DEFAULT NULL COMMENT 'statistics date',
  `gmv_count` bigint(20) DEFAULT NULL COMMENT 'number of GMV orders that day',
  `gmv_amount` decimal(16, 2) DEFAULT NULL COMMENT 'total GMV order amount that day',
  `gmv_payment` decimal(16, 2) DEFAULT NULL COMMENT 'payment amount that day'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'daily GMV summary' ROW_FORMAT = Dynamic;

2) Insert sample data into MySQL

INSERT INTO `ads_gmv_sum_day` VALUES ('2019-03-01 22:51:37', 1000, 210000.00, 2000.00);
INSERT INTO `ads_gmv_sum_day` VALUES ('2019-05-08 22:52:32', 3434, 12413.00, 1.00);
INSERT INTO `ads_gmv_sum_day` VALUES ('2019-07-13 22:52:51', 1222, 324345.00, 1.00);
INSERT INTO `ads_gmv_sum_day` VALUES ('2019-09-13 22:53:08', 2344, 12312.00, 1.00);

13.1.5 Nationwide Product Sales

1) Create the ads_gmv_sum_province table in MySQL
DROP TABLE IF EXISTS `ads_gmv_sum_province`;
CREATE TABLE `ads_gmv_sum_province`  (
  `province` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `gmv` bigint(255) DEFAULT NULL,
  `remark` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;


2) Insert sample data into MySQL

INSERT INTO `ads_gmv_sum_province` VALUES ('北京', 2000, '');
INSERT INTO `ads_gmv_sum_province` VALUES ('辽宁', 30000, '沈阳:21.1%,大连:20%,鞍山:35%');
INSERT INTO `ads_gmv_sum_province` VALUES ('浙江', 8002, '杭州:20%,舟山:50%');


13.2 Viewing the Web Page
1) Run the spring-boot-echarts-master application
2) View the results in the browser:

http://localhost:8080/active


13.3 Sqoop Export Script
1) Write the Sqoop export script
Create the script sqoop_export.sh in /home/kgg/bin

[kgg@hadoop102 bin]$ vim sqoop_export.sh

Add the following content to the script:

#!/bin/bash
db_name=gmall

export_data() {
/opt/module/sqoop/bin/sqoop export \
--connect "jdbc:mysql://hadoop102:3306/${db_name}?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password 000000 \
--table $1 \
--num-mappers 1 \
--export-dir /warehouse/$db_name/ads/$1 \
--input-fields-terminated-by "\t" \
--update-mode allowinsert \
--update-key $2 \
--input-null-string '\\N' \
--input-null-non-string '\\N'
}

case $1 in
  "ads_uv_count")
     export_data "ads_uv_count" "dt"
;;
  "ads_user_action_convert_day")
     export_data "ads_user_action_convert_day" "dt"
;;
  "ads_gmv_sum_day")
     export_data "ads_gmv_sum_day" "dt"
;;
  "all")
     export_data "ads_uv_count" "dt"
     export_data "ads_user_action_convert_day" "dt"
     export_data "ads_gmv_sum_day" "dt"
;;
esac


2) On whether the export should update or insert
--update-mode:
  updateonly: only updates existing rows; new rows are not inserted.
  allowinsert: updates existing rows and also inserts new ones.
--update-key: when updates are allowed, the column(s) whose values decide that two rows are the same record, so the row is updated rather than inserted. Separate multiple columns with commas.
--input-null-string and --input-null-non-string: how NULL values in string and non-string columns, respectively, are represented in the input files.
From the official documentation (http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html):
"Sqoop will by default import NULL values as string null. Hive is however using string \N to denote NULL values and therefore predicates dealing with NULL (like IS NULL) will not work correctly. You should append parameters --null-string and --null-non-string in case of import job or --input-null-string and --input-null-non-string in case of an export job if you wish to properly preserve NULL values. Because sqoop is using those parameters in generated code, you need to properly escape value \N to \\N."
In short: Hive stores NULL as the literal string "\N", while MySQL stores a real NULL. To keep the two sides consistent, use --input-null-string and --input-null-non-string when exporting, and --null-string and --null-non-string when importing.
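A small Python sketch of what the \N convention means in practice when reading one row of a Hive text-format export. This is illustrative only; Sqoop performs this mapping internally via --input-null-string and --input-null-non-string:

```python
HIVE_NULL = r"\N"  # Hive text files store NULL as the two characters backslash-N

def parse_export_field(raw):
    """Map Hive's \\N marker back to a SQL NULL (Python None); pass other values through."""
    return None if raw == HIVE_NULL else raw

# A tab-separated Hive text row: dt, a NULL column, a count
line = "2019-02-10\t\\N\t42"
fields = [parse_export_field(f) for f in line.split("\t")]
print(fields)  # ['2019-02-10', None, '42']
```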

3) Run the Sqoop export script

[kgg@hadoop102 bin]$ chmod 777 sqoop_export.sh
[kgg@hadoop102 bin]$ sqoop_export.sh all

4) Check the results in MySQL

SELECT * FROM ads_uv_count;
SELECT * FROM ads_user_retention_day_rate;
SELECT * FROM ads_user_action_convert_day;
SELECT * FROM ads_gmv_sum_day;
SELECT * FROM ads_gmv_sum_province;
