Big Data Project: E-Commerce Data Warehouse (Summary) (7): User Behavior Data Warehouse

Guiding questions:
1. How is the DWD-layer start-up log parsing designed?
2. How is the DWD-layer event log parsing designed?
3. How are the DWD-layer event tables derived?
4. How is the error log table designed?


Previous post: Big Data Project: E-Commerce Data Warehouse (Summary) (6): User Behavior Data Warehouse

Chapter 4: Building the DWD Layer of the Warehouse
The DWD layer cleans the data coming from the ODS layer (removes nulls, dirty records, and values outside reasonable ranges; converts row-oriented storage to columnar storage; changes the compression format).
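In this build most of the cleaning falls out of the parsing itself (malformed JSON simply yields NULL fields), but the idea can be sketched as an explicit filter. A minimal illustration, assuming the ods_start_log layout used below (not a statement that appears later in the chapter):
hive (gmall)> select line from ods_start_log where line is not null and get_json_object(line,'$.mid') is not null;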

4.1 DWD Layer: Parsing the Start-up Log
4.1.1 Create the start-up log table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_start_log;
CREATE EXTERNAL TABLE dwd_start_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`entry` string,
`open_ad_type` string,
`action` string,
`loading_time` string,
`detail` string,
`extend1` string
)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_start_log/'
TBLPROPERTIES('parquet.compression'='lzo');
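To confirm that the storage format and compression property took effect, you can inspect the generated DDL (an optional sanity check, not part of the original walkthrough):
hive (gmall)> show create table dwd_start_log;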


4.1.2 Load data into the start-up log table
hive (gmall)>
insert overwrite table dwd_start_log
PARTITION (dt='2020-10-14')
select
    get_json_object(line,'$.mid') mid_id,
    get_json_object(line,'$.uid') user_id,
    get_json_object(line,'$.vc') version_code,
    get_json_object(line,'$.vn') version_name,
    get_json_object(line,'$.l') lang,
    get_json_object(line,'$.sr') source,
    get_json_object(line,'$.os') os,
    get_json_object(line,'$.ar') area,
    get_json_object(line,'$.md') model,
    get_json_object(line,'$.ba') brand,
    get_json_object(line,'$.sv') sdk_version,
    get_json_object(line,'$.g') gmail,
    get_json_object(line,'$.hw') height_width,
    get_json_object(line,'$.t') app_time,
    get_json_object(line,'$.nw') network,
    get_json_object(line,'$.ln') lng,
    get_json_object(line,'$.la') lat,
    get_json_object(line,'$.entry') entry,
    get_json_object(line,'$.open_ad_type') open_ad_type,
    get_json_object(line,'$.action') action,
    get_json_object(line,'$.loading_time') loading_time,
    get_json_object(line,'$.detail') detail,
    get_json_object(line,'$.extend1') extend1
from ods_start_log
where dt='2020-10-14';
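Each get_json_object call extracts one field from the JSON line by JSONPath; the short keys (mid, uid, vc, and so on) are the abbreviated field names written by the log collector. A quick way to see what an expression returns is to run it against a literal string (the sample values here are invented):
hive (gmall)> select get_json_object('{"mid":"m365","uid":"u123"}','$.mid');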


2) Test
hive (gmall)> select * from dwd_start_log limit 2;


4.1.3 Data loading script for the DWD start-up log table
1) Create the script in the /home/kgg/bin directory on hadoop102
[kgg@hadoop102 bin]$ vim dwd_start_log.sh
Write the following content into the script:
#!/bin/bash
# Define variables here for easy modification
APP=gmall
hive=/opt/module/hive/bin/hive
# If a date argument is passed in, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi
echo "=== log date: $do_date ==="
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table "$APP".dwd_start_log
PARTITION (dt='$do_date')
select
    get_json_object(line,'$.mid') mid_id,
    get_json_object(line,'$.uid') user_id,
    get_json_object(line,'$.vc') version_code,
    get_json_object(line,'$.vn') version_name,
    get_json_object(line,'$.l') lang,
    get_json_object(line,'$.sr') source,
    get_json_object(line,'$.os') os,
    get_json_object(line,'$.ar') area,
    get_json_object(line,'$.md') model,
    get_json_object(line,'$.ba') brand,
    get_json_object(line,'$.sv') sdk_version,
    get_json_object(line,'$.g') gmail,
    get_json_object(line,'$.hw') height_width,
    get_json_object(line,'$.t') app_time,
    get_json_object(line,'$.nw') network,
    get_json_object(line,'$.ln') lng,
    get_json_object(line,'$.la') lat,
    get_json_object(line,'$.entry') entry,
    get_json_object(line,'$.open_ad_type') open_ad_type,
    get_json_object(line,'$.action') action,
    get_json_object(line,'$.loading_time') loading_time,
    get_json_object(line,'$.detail') detail,
    get_json_object(line,'$.extend1') extend1
from "$APP".ods_start_log
where dt='$do_date';"
$hive -e "$sql"


2) Make the script executable
[kgg@hadoop102 bin]$ chmod 777 dwd_start_log.sh


3) Run the script
[kgg@hadoop102 module]$ dwd_start_log.sh 2019-02-11


4) Query the loaded data
hive (gmall)>
select * from dwd_start_log where dt='2019-02-11' limit 2;


5) Scheduled execution time
In production this script typically runs daily between 00:30 and 01:00.

4.2 DWD Layer: Parsing the Event Log
4.2.1 Create the base event detail table
The detail table stores row-level event data converted from the raw ODS-layer table.


1) Create the base event log detail table
hive (gmall)>
drop table if exists dwd_base_event_log;
CREATE EXTERNAL TABLE dwd_base_event_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`event_name` string,
`event_json` string,
`server_time` string)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_base_event_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Note: event_name and event_json hold the event name and the complete event payload, respectively. This is where the one-to-many structure of the raw log is split apart; flattening the raw log requires both a UDF and a UDTF.
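The flattening works on the same principle as Hive's built-in lateral view explode, which turns one row holding an array into one row per element. A toy illustration of that principle (not the project's custom UDTF):
hive (gmall)> select ev from (select array('display','loading') evs) t lateral view explode(evs) tmp as ev;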

4.2.2 Custom UDF (parses the common fields)

1) Create a Maven project: hivefunction
2) Create the package: com.kgg.udf
3) Add the following to pom.xml
<properties>
    <project.build.sourceEncoding>UTF8</project.build.sourceEncoding>
    <hive.version>1.2.1</hive.version>
</properties>
<dependencies>
    <!-- Hive dependency -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>


4) The UDF parses the common fields
package com.kgg.udf;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.json.JSONException;
import org.json.JSONObject;

public class BaseFieldUDF extends UDF {
    public String evaluate(String line, String key) throws JSONException {
        // 1. Split the line into: server time | json
        String[] log = line.split("\\|");
        // 2. Validity check
        if (log.length != 2 || StringUtils.isBlank(log[1])) {
            return "";
        }
        // 3. Parse the JSON payload
        JSONObject baseJson = new JSONObject(log[1].trim());
        String result = "";
        // 4. Look up the value for the requested key
        if ("et".equals(key)) {
            if (baseJson.has("et")) {
                result = baseJson.getString("et");
            }
        } else if ("st".equals(key)) {
            result = log[0].trim();
        } else {
            JSONObject cm = baseJson.getJSONObject("cm");
            if (cm.has(key)) {
                result = cm.getString(key);
            }
        }
        return result;
    }

    public static void main(String[] args) throws JSONException {
        String line = "1541217850324|{\"cm\":{\"mid\":\"m7856\",\"uid\":\"u8739\",\"ln\":\"-74.8\",\"sv\":\"V2.2.2\",\"os\":\"8.1.3\",\"g\":\"P7XC9126@gmail.com\",\"nw\":\"3G\",\"l\":\"es\",\"vc\":\"6\",\"hw\":\"640*960\",\"ar\":\"MX\",\"t\":\"1541204134250\",\"la\":\"-31.7\",\"md\":\"huawei-17\",\"vn\":\"1.1.2\",\"sr\":\"O\",\"ba\":\"Huawei\"},\"ap\":\"weather\",\"et\":[{\"ett\":\"1541146624055\",\"en\":\"display\",\"kv\":{\"goodsid\":\"n4195\",\"copyright\":\"ESPN\",\"content_provider\":\"CNN\",\"extend2\":\"5\",\"action\":\"2\",\"extend1\":\"2\",\"place\":\"3\",\"showtype\":\"2\",\"category\":\"72\",\"newstype\":\"5\"}},{\"ett\":\"1541213331817\",\"en\":\"loading\",\"kv\":{\"extend2\":\"\",\"loading_time\":\"15\",\"action\":\"3\",\"extend1\":\"\",\"type1\":\"\",\"type\":\"3\",\"loading_way\":\"1\"}},{\"ett\":\"1541126195645\",\"en\":\"ad\",\"kv\":{\"entry\":\"3\",\"show_style\":\"0\",\"action\":\"2\",\"detail\":\"325\",\"source\":\"4\",\"behavior\":\"2\",\"content\":\"1\",\"newstype\":\"5\"}},{\"ett\":\"1541202678812\",\"en\":\"notification\",\"kv\":{\"ap_time\":\"1541184614380\",\"action\":\"3\",\"type\":\"4\",\"content\":\"\"}},{\"ett\":\"1541194686688\",\"en\":\"active_background\",\"kv\":{\"active_source\":\"3\"}}]}";
        String x = new BaseFieldUDF().evaluate(line, "mid");
        System.out.println(x);
    }
}


Note: the main method exists only to test the UDF locally against sample data.
4.2.3 Custom UDTF (parses the event-specific fields)
1) Create the package: com.kgg.udtf
2) Create the class EventJsonUDTF in the com.kgg.udtf package
3) The UDTF expands the event array into individual rows
package com.kgg.udtf;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;
import org.json.JSONException;
import java.util.ArrayList;

public class EventJsonUDTF extends GenericUDTF {
    // Declare the names and types of the output columns
    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("event_name");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("event_json");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    // One input record produces zero or more output records
    @Override
    public void process(Object[] objects) throws HiveException {
        // The argument is the "et" array extracted by the UDF
        String input = objects[0].toString();
        // If the input is empty, drop the record
        if (StringUtils.isBlank(input)) {
            return;
        } else {
            try {
                // Parse the array of events (e.g. ad / favorites)
                JSONArray ja = new JSONArray(input);
                if (ja == null)
                    return;
                // Emit one row per event
                for (int i = 0; i < ja.length(); i++) {
                    String[] result = new String[2];
                    try {
                        // The event name (the "en" field)
                        result[0] = ja.getJSONObject(i).getString("en");
                        // The full event object as a string
                        result[1] = ja.getString(i);
                    } catch (JSONException e) {
                        continue;
                    }
                    // Forward the row to Hive
                    forward(result);
                }
            } catch (JSONException e) {
                e.printStackTrace();
            }
        }
    }

    // Called when there are no more records to process; used for cleanup or trailing output
    @Override
    public void close() throws HiveException {
    }
}


4) Package the project (for example with mvn clean package)
5) Upload hivefunction-1.0-SNAPSHOT.jar to the /user/hive/jars directory on HDFS
6) Create permanent functions bound to the compiled Java classes
hive (gmall)>
create function base_analizer as 'com.kgg.udf.BaseFieldUDF' using jar 'hdfs://hadoop101:9000/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar';
create function flat_analizer as 'com.kgg.udtf.EventJsonUDTF' using jar 'hdfs://hadoop101:9000/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar';
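To confirm that the functions registered correctly, you can describe one of them (an optional check):
hive (gmall)> desc function base_analizer;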


4.2.4 Populate the base event detail table
1) Parse the event log into the base detail table
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_base_event_log partition(dt='2020-10-14')
select
    base_analizer(line,'mid') as mid_id,
    base_analizer(line,'uid') as user_id,
    base_analizer(line,'vc') as version_code,
    base_analizer(line,'vn') as version_name,
    base_analizer(line,'l') as lang,
    base_analizer(line,'sr') as source,
    base_analizer(line,'os') as os,
    base_analizer(line,'ar') as area,
    base_analizer(line,'md') as model,
    base_analizer(line,'ba') as brand,
    base_analizer(line,'sv') as sdk_version,
    base_analizer(line,'g') as gmail,
    base_analizer(line,'hw') as height_width,
    base_analizer(line,'t') as app_time,
    base_analizer(line,'nw') as network,
    base_analizer(line,'ln') as lng,
    base_analizer(line,'la') as lat,
    event_name,
    event_json,
    base_analizer(line,'st') as server_time
from ods_event_log lateral view flat_analizer(base_analizer(line,'et')) tmp_flat as event_name,event_json
where dt='2020-10-14' and base_analizer(line,'et')<>'';
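A useful spot check after loading is the event-name distribution, which should cover the event types handled in section 4.3 (an optional query, not part of the original flow):
hive (gmall)> select event_name, count(*) from dwd_base_event_log where dt='2020-10-14' group by event_name;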


2) Test
hive (gmall)> select * from dwd_base_event_log limit 2;



4.2.5 DWD-layer data parsing script
1) Create the script in the /home/kgg/bin directory on hadoop102
[kgg@hadoop102 bin]$ vim dwd_base_log.sh
Write the following content into the script:
#!/bin/bash
# Define variables here for easy modification
APP=gmall
hive=/opt/module/hive/bin/hive
# If a date argument is passed in, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi
echo "=== log date: $do_date ==="
sql="
use gmall;
insert overwrite table "$APP".dwd_base_event_log partition(dt='$do_date')
select
    base_analizer(line,'mid') as mid_id,
    base_analizer(line,'uid') as user_id,
    base_analizer(line,'vc') as version_code,
    base_analizer(line,'vn') as version_name,
    base_analizer(line,'l') as lang,
    base_analizer(line,'sr') as source,
    base_analizer(line,'os') as os,
    base_analizer(line,'ar') as area,
    base_analizer(line,'md') as model,
    base_analizer(line,'ba') as brand,
    base_analizer(line,'sv') as sdk_version,
    base_analizer(line,'g') as gmail,
    base_analizer(line,'hw') as height_width,
    base_analizer(line,'t') as app_time,
    base_analizer(line,'nw') as network,
    base_analizer(line,'ln') as lng,
    base_analizer(line,'la') as lat,
    event_name,
    event_json,
    base_analizer(line,'st') as server_time
from "$APP".ods_event_log lateral view flat_analizer(base_analizer(line,'et')) tmp_flat as event_name,event_json
where dt='$do_date'  and base_analizer(line,'et')<>'';
"
$hive -e "$sql"

2) Make the script executable
[kgg@hadoop102 bin]$ chmod 777 dwd_base_log.sh

3) Run the script
[kgg@hadoop102 module]$ dwd_base_log.sh 2019-02-11

4) Query the loaded data
hive (gmall)>
select * from dwd_base_event_log where dt='2019-02-11' limit 2;

5) Scheduled execution time
In production this script typically runs daily between 00:30 and 01:00.

4.3 DWD Layer: Deriving the Event Tables
4.3.1 Product click table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_display_log;
CREATE EXTERNAL TABLE dwd_display_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`action` string,
`goodsid` string,
`place` string,
`extend1` string,
`category` string,
`server_time` string
)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_display_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_display_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.action') action,
get_json_object(event_json,'$.kv.goodsid') goodsid,
get_json_object(event_json,'$.kv.place') place,
get_json_object(event_json,'$.kv.extend1') extend1,
get_json_object(event_json,'$.kv.category') category,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='display';
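As a quick volume check, the row count here should equal the number of 'display' rows in the base table for the same partition (optional):
hive (gmall)> select count(*) from dwd_display_log where dt='2020-10-14';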


3) Test
hive (gmall)> select * from dwd_display_log limit 2;


4.3.2 Product detail page table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_newsdetail_log;
CREATE EXTERNAL TABLE dwd_newsdetail_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`entry` string,
`action` string,
`goodsid` string,
`showtype` string,
`news_staytime` string,
`loading_time` string,
`type1` string,
`category` string,
`server_time` string)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_newsdetail_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_newsdetail_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.entry') entry,
get_json_object(event_json,'$.kv.action') action,
get_json_object(event_json,'$.kv.goodsid') goodsid,
get_json_object(event_json,'$.kv.showtype') showtype,
get_json_object(event_json,'$.kv.news_staytime') news_staytime,
get_json_object(event_json,'$.kv.loading_time') loading_time,
get_json_object(event_json,'$.kv.type1') type1,
get_json_object(event_json,'$.kv.category') category,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='newsdetail';
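Once loaded, the table supports simple engagement queries, e.g. average page stay time; the cast is needed because DWD stores the field as a string (an optional example, not in the original text):
hive (gmall)> select avg(cast(news_staytime as bigint)) from dwd_newsdetail_log where dt='2020-10-14';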


3) Test
hive (gmall)> select * from dwd_newsdetail_log limit 2;


4.3.3 Product list page table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_loading_log;
CREATE EXTERNAL TABLE dwd_loading_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`action` string,
`loading_time` string,
`loading_way` string,
`extend1` string,
`extend2` string,
`type` string,
`type1` string,
`server_time` string)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_loading_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_loading_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.action') action,
get_json_object(event_json,'$.kv.loading_time') loading_time,
get_json_object(event_json,'$.kv.loading_way') loading_way,
get_json_object(event_json,'$.kv.extend1') extend1,
get_json_object(event_json,'$.kv.extend2') extend2,
get_json_object(event_json,'$.kv.type') type,
get_json_object(event_json,'$.kv.type1') type1,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='loading';


3) Test
hive (gmall)> select * from dwd_loading_log limit 2;


4.3.4 Ad table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_ad_log;
CREATE EXTERNAL TABLE dwd_ad_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`entry` string,
`action` string,
`content` string,
`detail` string,
`ad_source` string,
`behavior` string,
`newstype` string,
`show_style` string,
`server_time` string)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_ad_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_ad_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.entry') entry,
get_json_object(event_json,'$.kv.action') action,
get_json_object(event_json,'$.kv.content') content,
get_json_object(event_json,'$.kv.detail') detail,
get_json_object(event_json,'$.kv.source') ad_source,
get_json_object(event_json,'$.kv.behavior') behavior,
get_json_object(event_json,'$.kv.newstype') newstype,
get_json_object(event_json,'$.kv.show_style') show_style,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='ad';
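Note that kv.source is renamed ad_source here so it does not clash with the common source column. A quick look at the ad traffic mix (an optional example):
hive (gmall)> select ad_source, count(*) from dwd_ad_log where dt='2020-10-14' group by ad_source;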

3) Test
hive (gmall)> select * from dwd_ad_log limit 2;


4.3.5 Notification table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_notification_log;
CREATE EXTERNAL TABLE dwd_notification_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`action` string,
`noti_type` string,
`ap_time` string,
`content` string,
`server_time` string
)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_notification_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_notification_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.action') action,
get_json_object(event_json,'$.kv.noti_type') noti_type,
get_json_object(event_json,'$.kv.ap_time') ap_time,
get_json_object(event_json,'$.kv.content') content,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='notification';

3) Test
hive (gmall)> select * from dwd_notification_log limit 2;

4.3.6 User foreground activity table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_active_foreground_log;
CREATE EXTERNAL TABLE dwd_active_foreground_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`push_id` string,
`access` string,
`server_time` string)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_foreground_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_active_foreground_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.push_id') push_id,
get_json_object(event_json,'$.kv.access') access,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='active_foreground';

3) Test
hive (gmall)> select * from dwd_active_foreground_log limit 2;

4.3.7 User background activity table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_active_background_log;
CREATE EXTERNAL TABLE dwd_active_background_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`active_source` string,
`server_time` string
)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_background_log/'
TBLPROPERTIES('parquet.compression'='lzo');

2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_active_background_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.active_source') active_source,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='active_background';


3) Test
hive (gmall)> select * from dwd_active_background_log limit 2;


4.3.8 Comment table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_comment_log;
CREATE EXTERNAL TABLE dwd_comment_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`comment_id` int,
`userid` int,
`p_comment_id` int,
`content` string,
`addtime` string,
`other_id` int,
`praise_count` int,
`reply_count` int,
`server_time` string
)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_comment_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_comment_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.comment_id') comment_id,
get_json_object(event_json,'$.kv.userid') userid,
get_json_object(event_json,'$.kv.p_comment_id') p_comment_id,
get_json_object(event_json,'$.kv.content') content,
get_json_object(event_json,'$.kv.addtime') addtime,
get_json_object(event_json,'$.kv.other_id') other_id,
get_json_object(event_json,'$.kv.praise_count') praise_count,
get_json_object(event_json,'$.kv.reply_count') reply_count,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='comment';


3) Test
hive (gmall)> select * from dwd_comment_log limit 2;


4.3.9 Favorites table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_favorites_log;
CREATE EXTERNAL TABLE dwd_favorites_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`id` int,
`course_id` int,
`userid` int,
`add_time` string,
`server_time` string
)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_favorites_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_favorites_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.id') id,
get_json_object(event_json,'$.kv.course_id') course_id,
get_json_object(event_json,'$.kv.userid') userid,
get_json_object(event_json,'$.kv.add_time') add_time,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='favorites';

3) Test
hive (gmall)> select * from dwd_favorites_log limit 2;

4.3.10 Praise (like) table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_praise_log;
CREATE EXTERNAL TABLE dwd_praise_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`id` string,
`userid` string,
`target_id` string,
`type` string,
`add_time` string,
`server_time` string
)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_praise_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_praise_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.id') id,
get_json_object(event_json,'$.kv.userid') userid,
get_json_object(event_json,'$.kv.target_id') target_id,
get_json_object(event_json,'$.kv.type') type,
get_json_object(event_json,'$.kv.add_time') add_time,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='praise';


3) Test
hive (gmall)> select * from dwd_praise_log limit 2;


4.3.11 Error log table
1) Table creation statement
hive (gmall)>
drop table if exists dwd_error_log;
CREATE EXTERNAL TABLE dwd_error_log(
`mid_id` string,
`user_id` string,
`version_code` string,
`version_name` string,
`lang` string,
`source` string,
`os` string,
`area` string,
`model` string,
`brand` string,
`sdk_version` string,
`gmail` string,
`height_width` string,
`app_time` string,
`network` string,
`lng` string,
`lat` string,
`errorBrief` string,
`errorDetail` string,
`server_time` string)
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_error_log/'
TBLPROPERTIES('parquet.compression'='lzo');


2) Load data
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_error_log
PARTITION (dt='2020-10-14')
select
mid_id,
user_id,
version_code,
version_name,
lang,
source,
os,
area,
model,
brand,
sdk_version,
gmail,
height_width,
app_time,
network,
lng,
lat,
get_json_object(event_json,'$.kv.errorBrief') errorBrief,
get_json_object(event_json,'$.kv.errorDetail') errorDetail,
server_time
from dwd_base_event_log
where dt='2020-10-14' and event_name='error';
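The error table makes daily triage straightforward, e.g. finding the most frequent error briefs for a given day (an optional example, not in the original text):
hive (gmall)> select errorBrief, count(*) cnt from dwd_error_log where dt='2020-10-14' group by errorBrief order by cnt desc limit 10;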


3) Test
hive (gmall)> select * from dwd_error_log limit 2;


4.3.12 Data loading script for the DWD event tables
1) Create the script in the /home/kgg/bin directory on hadoop102
[kgg@hadoop102 bin]$ vim dwd_event_log.sh
Write the following content into the script:
#!/bin/bash
# Define variables here for easy modification
APP=gmall
hive=/opt/module/hive/bin/hive
# If a date argument is passed in, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi
echo "=== log date: $do_date ==="
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table "$APP".dwd_display_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.action') action,
    get_json_object(event_json,'$.kv.goodsid') goodsid,
    get_json_object(event_json,'$.kv.place') place,
    get_json_object(event_json,'$.kv.extend1') extend1,
    get_json_object(event_json,'$.kv.category') category,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='display';
insert overwrite table "$APP".dwd_newsdetail_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.entry') entry,
    get_json_object(event_json,'$.kv.action') action,
    get_json_object(event_json,'$.kv.goodsid') goodsid,
    get_json_object(event_json,'$.kv.showtype') showtype,
    get_json_object(event_json,'$.kv.news_staytime') news_staytime,
    get_json_object(event_json,'$.kv.loading_time') loading_time,
    get_json_object(event_json,'$.kv.type1') type1,
    get_json_object(event_json,'$.kv.category') category,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='newsdetail';
insert overwrite table "$APP".dwd_loading_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.action') action,
    get_json_object(event_json,'$.kv.loading_time') loading_time,
    get_json_object(event_json,'$.kv.loading_way') loading_way,
    get_json_object(event_json,'$.kv.extend1') extend1,
    get_json_object(event_json,'$.kv.extend2') extend2,
    get_json_object(event_json,'$.kv.type') type,
    get_json_object(event_json,'$.kv.type1') type1,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='loading';
insert overwrite table "$APP".dwd_ad_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.entry') entry,
    get_json_object(event_json,'$.kv.action') action,
    get_json_object(event_json,'$.kv.content') content,
    get_json_object(event_json,'$.kv.detail') detail,
    get_json_object(event_json,'$.kv.source') ad_source,
    get_json_object(event_json,'$.kv.behavior') behavior,
    get_json_object(event_json,'$.kv.newstype') newstype,
    get_json_object(event_json,'$.kv.show_style') show_style,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='ad';
insert overwrite table "$APP".dwd_notification_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.action') action,
    get_json_object(event_json,'$.kv.noti_type') noti_type,
    get_json_object(event_json,'$.kv.ap_time') ap_time,
    get_json_object(event_json,'$.kv.content') content,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='notification';
insert overwrite table "$APP".dwd_active_foreground_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.push_id') push_id,
    get_json_object(event_json,'$.kv.access') access,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='active_foreground';
insert overwrite table "$APP".dwd_active_background_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.active_source') active_source,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='active_background';
insert overwrite table "$APP".dwd_comment_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.comment_id') comment_id,
    get_json_object(event_json,'$.kv.userid') userid,
    get_json_object(event_json,'$.kv.p_comment_id') p_comment_id,
    get_json_object(event_json,'$.kv.content') content,
    get_json_object(event_json,'$.kv.addtime') addtime,
    get_json_object(event_json,'$.kv.other_id') other_id,
    get_json_object(event_json,'$.kv.praise_count') praise_count,
    get_json_object(event_json,'$.kv.reply_count') reply_count,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='comment';
insert overwrite table "$APP".dwd_favorites_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.id') id,
    get_json_object(event_json,'$.kv.course_id') course_id,
    get_json_object(event_json,'$.kv.userid') userid,
    get_json_object(event_json,'$.kv.add_time') add_time,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='favorites';
insert overwrite table "$APP".dwd_praise_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.id') id,
    get_json_object(event_json,'$.kv.userid') userid,
    get_json_object(event_json,'$.kv.target_id') target_id,
    get_json_object(event_json,'$.kv.type') type,
    get_json_object(event_json,'$.kv.add_time') add_time,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='praise';
insert overwrite table "$APP".dwd_error_log
PARTITION (dt='$do_date')
select
    mid_id,
    user_id,
    version_code,
    version_name,
    lang,
    source,
    os,
    area,
    model,
    brand,
    sdk_version,
    gmail,
    height_width,
    app_time,
    network,
    lng,
    lat,
    get_json_object(event_json,'$.kv.errorBrief') errorBrief,
    get_json_object(event_json,'$.kv.errorDetail') errorDetail,
    server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='error';
"
$hive -e "$sql"

2) Make the script executable
[kgg@hadoop102 bin]$ chmod 777 dwd_event_log.sh

3) Run the script
[kgg@hadoop102 module]$ dwd_event_log.sh 2019-02-11

4) Query the loaded data
hive (gmall)>
select * from dwd_comment_log where dt='2019-02-11' limit 2;

5) Scheduled execution time
In production this script typically runs daily between 00:30 and 01:00.
