问题导读
1.hive实现统计的查询语句是什么?
2.生产环境中为什么建议使用外部表?
3.hadoop mapreduce创建类DataWritable的作用是什么?
4.为什么创建类DataWritable?
5.如何实现统计手机流量?
6.对比hive与mapreduce统计手机流量的区别?
1.使用Hive进行手机流量统计
很多公司在使用hive对数据进行处理。
hive是hadoop家族成员,是一种解析like sql语句的框架。它封装了常用MapReduce任务,让你像执行sql一样操作存储在HDFS的表。
hive的表分为两种,内表和外表。
Hive 创建内部表时,会将数据移动到数据仓库指向的路径;若创建外部表,仅记录数据所在的路径,不对数据的位置做任何改变。
在删除表的时候,内部表的元数据和数据会被一起删除, 而外部表只删除元数据,不删除数据。这样外部表相对来说更加安全些,数据组织也更加灵活,方便共享源数据。
Hive的内外表,还有一个Partition的分区的知识点,用于避免全表扫描,快速检索。后期的文章会提到。
接下来开始正式开始《Hive统计手机流量》
原始数据:
1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200
1363157995052 13826544101 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 4 0 264 0 200
1363157991076 13926435656 20-10-7A-28-CC-0A:CMCC 120.196.100.99 2 4 132 1512 200
1363154400022 13926251106 5C-0E-8B-8B-B1-50:CMCC 120.197.40.4 4 0 240 0 200
1363157993044 18211575961 94-71-AC-CD-E6-18:CMCC-EASY 120.196.100.99 iface.qiyi.com 瑙.?缃.. 15 2 1527 2106 200
1363157995074 84138413 5C-0E-8B-8C-E8-20:7DaysInn 120.197.40.4 122.72.52.12 20 16 4116 1432 200
1363157993055 13560439658 C4-17-FE-BA-DE-D9:CMCC 120.196.100.99 18 15 1116 954 200
1363157995033 15920133257 5C-0E-8B-C7-BA-20:CMCC 120.197.40.4 sug.so.360.cn 淇℃.瀹.. 20 20 156 2936 200
操作步骤:
- #配置好Hive之后,使用hive命令启动hive框架。hive启动属于懒加载模式,会比较慢
- hive;
- #使用show databases命令查看当前数据库信息
- hive> show databases;
- OK
- default
- hive
- Time taken: 3.389 seconds
- #使用 use hive命令,使用指定的数据库 hive数据库是我之前创建的
- use hive;
- #创建表,这里是创建内表。内表加载hdfs上的数据,会将被加载文件中的内容剪切走。
- #外表没有这个问题,所以在实际的生产环境中,建议使用外表。
- create table ll(reportTime string,msisdn string,apmac string,acmac string,host string,siteType string,upPackNum bigint,downPackNum bigint,upPayLoad bigint,downPayLoad bigint,httpStatus string)row format delimited fields terminated by '\t';
- #加载数据,这里是从hdfs加载数据,也可用linux下加载数据 需要local关键字
- load data inpath'/HTTP_20130313143750.dat' into table ll;
- #数据加载完毕之后,hdfs的
- #执行hive 的like sql语句,对数据进行统计
- select msisdn,sum(uppacknum),sum(downpacknum),sum(uppayload),sum(downpayload) from ll group by msisdn;
复制代码
执行结果如下:
- hive> select msisdn,sum(uppacknum),sum(downpacknum),sum(uppayload),sum(downpayload) from ll group by msisdn;
- Total MapReduce jobs = 1
- Launching Job 1 out of 1
- Number of reduce tasks not specified. Estimated from input data size: 1
- In order to change the average load for a reducer (in bytes):
- set hive.exec.reducers.bytes.per.reducer=<number>
- In order to limit the maximum number of reducers:
- set hive.exec.reducers.max=<number>
- In order to set a constant number of reducers:
- set mapred.reduce.tasks=<number>
- Starting Job = job_201307160252_0006, Tracking URL = http://hadoop0:50030/jobdetails.jsp?jobid=job_201307160252_0006
- Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=hadoop0:9001 -kill job_201307160252_0006
- Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
- 2013-07-17 19:51:42,599 Stage-1 map = 0%, reduce = 0%
- 2013-07-17 19:52:40,474 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:41,690 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:42,693 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:43,698 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:44,702 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:45,707 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:46,712 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:47,715 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:48,721 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:49,758 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:50,763 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
- 2013-07-17 19:52:51,772 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 50.0 sec
- 2013-07-17 19:52:52,775 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 50.0 sec
- 2013-07-17 19:52:53,779 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 50.0 sec
- MapReduce Total cumulative CPU time: 50 seconds 0 msec
- Ended Job = job_201307160252_0006
- MapReduce Jobs Launched:
- Job 0: Map: 1 Reduce: 1 Cumulative CPU: 50.0 sec HDFS Read: 2787075 HDFS Write: 16518 SUCCESS
- Total MapReduce CPU Time Spent: 50 seconds 0 msec
- OK
- 13402169727 171 108 11286 130230
- 13415807477 2067 1683 169668 1994181
- 13416127574 1501 1094 161963 802756
- 13416171820 113 99 10630 32120
- 13417106524 160 128 18688 13088
- 13418002498 240 256 22136 86896
- 13418090588 456 351 98934 67470
- 13418117364 264 152 29436 49966
- 13418173218 37680 48348 2261286 73159722
- 13418666750 22432 26482 1395648 39735552
- 13420637670 20 20 1480 1480
- ......
- Time taken: 75.24 seconds
复制代码
2.Hadoop MapReduce手机流量统计
自定义一个writable
- package cn.maoxiangyi.hadoop.wordcount;
-
- import java.io.DataInput;
- import java.io.DataOutput;
- import java.io.IOException;
-
- import org.apache.hadoop.io.Writable;
-
- public class DataWritable implements Writable {
-
- private int upPackNum;
- private int downPackNum;
- private int upPayLoad;
- private int downPayLoad;
-
- public DataWritable() {
- super();
- }
-
- public DataWritable(int upPackNum, int downPackNum, int upPayLoad,
- int downPayLoad) {
- super();
- this.upPackNum = upPackNum;
- this.downPackNum = downPackNum;
- this.upPayLoad = upPayLoad;
- this.downPayLoad = downPayLoad;
- }
-
- @Override
- public void write(DataOutput out) throws IOException {
- out.writeInt(upPackNum);
- out.writeInt(downPackNum);
- out.writeInt(upPayLoad);
- out.writeInt(downPayLoad);
- }
-
- @Override
- public void readFields(DataInput in) throws IOException {
- upPackNum = in.readInt();
- downPackNum = in.readInt();
- upPayLoad = in.readInt();
- downPayLoad =in.readInt();
- }
-
- public int getUpPackNum() {
- return upPackNum;
- }
-
- public void setUpPackNum(int upPackNum) {
- this.upPackNum = upPackNum;
- }
-
- public int getDownPackNum() {
- return downPackNum;
- }
-
- public void setDownPackNum(int downPackNum) {
- this.downPackNum = downPackNum;
- }
-
- public int getUpPayLoad() {
- return upPayLoad;
- }
-
- public void setUpPayLoad(int upPayLoad) {
- this.upPayLoad = upPayLoad;
- }
-
- public int getDownPayLoad() {
- return downPayLoad;
- }
-
- public void setDownPayLoad(int downPayLoad) {
- this.downPayLoad = downPayLoad;
- }
-
- @Override
- public String toString() {
- return " " + upPackNum + " "
- + downPackNum + " " + upPayLoad + " "
- + downPayLoad;
- }
-
-
- }
复制代码
MapReduc函数
- package cn.maoxiangyi.hadoop.wordcount;
-
- import java.io.IOException;
-
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.Mapper;
- import org.apache.hadoop.mapreduce.Reducer;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
-
- public class DataTotalMapReduce {
-
- public static void main(String[] args) throws Exception {
- Configuration configuration = new Configuration();
- Job job = new Job(configuration);
- job.setJarByClass(DataTotalMapReduce.class);
- job.setMapperClass(DataTotalMapper.class);
- job.setReducerClass(DataTotalReducer.class);
- job.setOutputKeyClass(Text.class);
- job.setOutputValueClass(DataWritable.class);
- job.setCombinerClass(DataTotalReducer.class);
- Path inputDir = new Path("hdfs://hadoop0:9000/HTTP_20130313143750.dat");
- FileInputFormat.addInputPath(job, inputDir);
- Path outputDir = new Path("hdfs://hadoop0:9000/dataTotal");
- FileOutputFormat.setOutputPath(job, outputDir);
- job.waitForCompletion(true);
- }
-
- }
-
- /**
- *
- *
- *
- * 1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82
- * i02.c.aliimg.com 24 27 2481 24681 200 1363157995052 13826544101
- * 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 4 0 264 0 200 1363157991076 13926435656
- * 20-10-7A-28-CC-0A:CMCC 120.196.100.99 2 4 132 1512 200
- *
- *
- */
-
- class DataTotalMapper extends Mapper<LongWritable, Text, Text, DataWritable> {
-
- @Override
- protected void map(LongWritable key, Text value, Context context)
- throws IOException, InterruptedException {
- String lineStr = value.toString();
- String[] strArr = lineStr.split("\t");
- String phpone = strArr[1];
- String upPackNum = strArr[6];
- String downPackNum = strArr[7];
- String upPayLoad = strArr[8];
- String downPayLoad = strArr[9];
- context.write(
- new Text(phpone),
- new DataWritable(Integer.parseInt(upPackNum), Integer
- .parseInt(downPackNum), Integer.parseInt(upPayLoad),
- Integer.parseInt(downPayLoad)));
- }
-
- }
-
- class DataTotalReducer extends Reducer<Text, DataWritable, Text, DataWritable> {
-
- @Override
- protected void reduce(Text k2, Iterable<DataWritable> v2, Context context)
- throws IOException, InterruptedException {
- int upPackNumSum = 0;
- int downPackNumSum = 0;
- int upPayLoadSum = 0;
- int downPayLoadSum = 0;
- for (DataWritable dataWritable : v2) {
- upPackNumSum += dataWritable.getUpPackNum();
- downPackNumSum += dataWritable.getDownPackNum();
- upPayLoadSum += dataWritable.getUpPayLoad();
- downPayLoadSum += dataWritable.getDownPayLoad();
- }
- context.write(k2, new DataWritable(upPackNumSum, downPackNumSum, upPayLoadSum, downPayLoadSum));
- }
-
- }
复制代码
结果节选
- 13402169727 171 108 11286 130230
- 13415807477 2067 1683 169668 1994181
- 13416127574 1501 1094 161963 802756
- 13416171820 113 99 10630 32120
- 13417106524 160 128 18688 13088
- 13418002498 240 256 22136 86896
- 13418090588 456 351 98934 67470
- 13418117364 264 152 29436 49966
- 13418173218 37680 48348 2261286 73159722
- 13418666750 22432 26482 1395648 39735552
- 13420637670 20 20 1480 1480
- 13422149173 40 32 4000 3704
- 13422311151 465 535 33050 661790
- 13424077835 84 72 15612 9948
- 13424084200 765 690 60930 765675
- 13428887537 43892 44830 2925330 65047620
- 13430219372 454 352 33792 192876
- 13430234524 27852 39056 1767220 52076614
- 13430237899 1293 1165 166346 808613
- 13430258776 4681 4783 350511 6609423
- 13430266620 10544 9377 11600817 5728002
- 13432023893 40 0 2400 0
复制代码
|