Using the Hive Command Line and Built-in Services

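Guiding questions

1. How do you run host (shell) commands from inside hive?
2. How do you view HDFS files from hive?
3. What do 'set' and 'set -v' do?
4. How do you run a command script file in hive?
5. How do you run hive in the background?
6. How do you start hiveserver2?
7. How do you stop a hiveserver2 that is running in the background?
8. How do you list Hive's built-in services?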

1. Hive command-line tips

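Running $HIVE_HOME/bin/hive, or equivalently $HIVE_HOME/bin/hive --service cli, starts Hive's command-line mode.
It is mostly used to run commands interactively.
(1). Pressing Tab in hive gives bash-like auto-completion.
(2). Command history is saved in $HOME/.hivehistory.
(3). At the hive prompt, prefix a command with ! to run it on the host (as in sqlplus).
(4). Viewing dfs contents from within hive: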
hive> dfs -ls /tmp;
Found 2 items
-rw-r--r-- 3 ambari-qa hdfs 1916 2014-03-12 06:14 /tmp/ida8c0d201_date141214
-rw-r--r-- 3 ambari-qa hdfs 1958 2014-03-12 07:45 /tmp/ida8c0d201_date441214

(5). Comments in hive use the same '--' prefix as SQL comments.
(6). To print column headers in query results, set the hive.cli.print.header variable:
hive> set hive.cli.print.header=true;
(7). Viewing variables:
hive> set;
…
hive> set -v;
… even more output!…
'set' prints all variables in the hivevar, hiveconf, system and env namespaces.
'set -v' additionally prints all the properties defined by Hadoop.

hive> set hivevar:foo=hello;
hive> set hivevar:foo;
hivevar:foo=hello
(8). Using variables:
hive> create table toss1(i int, ${hivevar:foo} string);
OK
Time taken: 0.652 seconds
hive> desc toss1;
OK
i int None
hello string None
Time taken: 0.055 seconds, Fetched: 2 row(s)
(9). Variables belong to different namespaces. The namespaces are:
namespace   access       description
hivevar     Read/Write   User-defined custom variables.
hiveconf    Read/Write   Hive-specific configuration properties.
system      Read/Write   Configuration properties defined by Java.
env         Read only    Environment variables defined by the shell environment (e.g., bash).
(10). Setting variables in the hiveconf namespace
hiveconf holds Hive's configuration properties. The example below uses hive.cli.print.current.db=true, which turns on printing of the current working database name in the CLI prompt:
[hadoop@cloud011 ~]$ hive --hiveconf hive.cli.print.current.db=true
hive (default)> set hiveconf:hive.cli.print.current.db=false;
hive>
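
When a statement containing a ${hivevar:...} reference is passed from the shell rather than typed at the hive prompt, quoting matters. The sketch below assumes bash and uses the --hivevar option to define foo; it only prints the resulting command, so it can be tried without Hive installed.

```shell
#!/usr/bin/env bash
# Single quotes keep ${hivevar:foo} literal, so it reaches Hive untouched.
# Inside double quotes the shell would try to expand it itself.
query='create table toss1(i int, ${hivevar:foo} string);'

# Print the command instead of running it, so the sketch works without Hive.
printf '%s\n' "hive --hivevar foo=hello -e '$query'"
```

Running the printed command should create the same toss1 table as in item (8), with the column name taken from foo.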

2. Hive batch mode

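Hive can also execute SQL statements directly from the shell. First create a test table:

hive> create table test(a string, b string) row format delimited fields terminated by ' ' stored as textfile;
OK
Time taken: 0.565 seconds

Generate some data to load:

SHELL$ MAX=1000   # number of rows; choose any value
SHELL$ for (( i = 0; i < MAX; i++ ))
do
echo "a$i b$i" >> /home/hadoop/a1.txt
done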

hive> load data local inpath '/home/hadoop/a1.txt' into table test;
Copying data from file:/home/hadoop/a1.txt
Copying file: file:/home/hadoop/a1.txt
Loading data to table default.test
Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2603212700, raw_data_size: 0]
OK
Time taken: 22.882 seconds

Now fetch the first three rows; -S turns on silent mode:

SHELL$ hive -S -e "select * FROM test LIMIT 3" > /tmp/data.1
SHELL$ cat /tmp/data.1
a0 b0
a1 b1
a2 b2

The commands to run can also be written to a file, which is then passed with the -f option:

SHELL$ hive -S -f query.hql
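
As an illustration of the -f workflow, here is a minimal sketch that writes a script file and runs it only if a hive executable is found. The location under ${TMPDIR:-/tmp} and the query itself are assumptions made for the example; the test table is the one created earlier.

```shell
#!/usr/bin/env bash
# Write the statements to a script file. The quoted 'EOF' delimiter stops the
# shell from expanding anything inside the here-document.
script="${TMPDIR:-/tmp}/query.hql"
cat > "$script" <<'EOF'
-- first three rows of the test table
select * from test limit 3;
EOF

# Run the script only when a hive executable is actually on the PATH.
if command -v hive >/dev/null 2>&1; then
  hive -S -f "$script" > "${TMPDIR:-/tmp}/data.1"
else
  echo "hive not found; script written to $script"
fi
```

Keeping the statements in a file avoids the shell-quoting problems that long -e strings tend to run into.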

To run a command script from the hive prompt:

hive> source query.hql;

Hive's configuration parameters can be retrieved in a similar way:

SHELL$ hive -S -e "set" | grep warehouse
hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=false
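
grep matches any line containing the word, so it can return several properties. When a script needs the value of exactly one property, a small filter such as the following may be handier. prop_value is a hypothetical helper name, and the sample variable stands in for real "set" output so the sketch runs without Hive.

```shell
#!/usr/bin/env bash
# prop_value NAME: read key=value lines on stdin and print the value of NAME.
# Only an exact key match counts, and values may themselves contain '='.
prop_value() {
  awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2) }'
}

# Stand-in for the output of: hive -S -e "set"
sample='hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=false'

printf '%s\n' "$sample" | prop_value hive.metastore.warehouse.dir
```

With a real installation the intended use would be: hive -S -e "set" | prop_value hive.metastore.warehouse.dir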

Commonly used hive options:

-d,--define    Variable substitution to apply to hive commands. e.g. -d A=B or --define A=B
-e             SQL from command line
-f             SQL from files
-i             Initialization SQL file
-S,--silent    Silent mode in interactive shell
-v,--verbose   Verbose mode (echo executed SQL to the console)

3. Hive web mode

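Hive provides a web interface, although it is not used very often. The process usually needs to be put in the background (&). Hive starts a Jetty server for it; the web port is 9999.
SHELL$ hive --service hwi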
14/03/15 10:59:44 INFO mortbay.log: jetty-6.1.26
14/03/15 10:59:44 INFO mortbay.log: Extract /home/hadoop/hive-0.12.0/lib/hive-hwi-0.12.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.0.12.0.war__hwi__ow27i/webapp
14/03/15 10:59:44 INFO mortbay.log:Started SocketConnector@0.0.0.0:9999
……
The process is now visible in the process list:

/usr/local/jdk1.7.0_51/bin/java -Xmx256m
-Djava.library.path=/home/hadoop/hadoop-2.3.0/lib/native
-Djava.net.preferIPv4Stack=true
-Dhadoop.log.dir=/home/hadoop/hadoop-2.3.0/logs
-Dhadoop.log.file=hadoop.log
-Dhadoop.home.dir=/home/hadoop/hadoop-2.3.0
-Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,console
-Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true
-Xmx512m
-Dhadoop.security.logger=INFO,NullAppender
org.apache.hadoop.util.RunJar /home/hadoop/hive-0.12.0/lib/hive-hwi-0.12.0.jar org.apache.hadoop.hive.hwi.HWIServer

4. Thrift hiveserver mode

This is the most commonly used mode of hive: once hiveserver is started, applications can access hive through JDBC and other drivers.
(HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results.)
SHELL$ hive --service hiveserver &
Or specify the port yourself:
SHELL$ hive --service hiveserver -p 50000 &
The corresponding process:
/usr/local/jdk1.7.0_51/bin/java -Xmx256m
-Djava.library.path=/home/hadoop/hadoop-2.3.0/lib/native
-Djava.net.preferIPv4Stack=true
-Dhadoop.log.dir=/home/hadoop/hadoop-2.3.0/logs
-Dhadoop.log.file=hadoop.log
-Dhadoop.home.dir=/home/hadoop/hadoop-2.3.0
-Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,console
-Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true -Xmx512m
-Dhadoop.security.logger=INFO,NullAppender
org.apache.hadoop.util.RunJar /home/hadoop/hive-0.12.0/lib/hive-service-0.12.0.jar org.apache.hadoop.hive.service.HiveServer -p 50000

5. hiveserver2

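hiveserver2 was added from Hive 0.11 onward as an upgraded version of hiveserver. It addresses problems with daemon instability, concurrent requests (HiveServer cannot handle concurrent requests from more than one client), and session management. The hiveserver2-related parameters are: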

SHELL$ hive -S -e "set" | grep server2

hive.server2.async.exec.shutdown.timeout=10
hive.server2.async.exec.threads=50
hive.server2.authentication=NONE
hive.server2.enable.doAs=true
hive.server2.table.type.mapping=CLASSIC
hive.server2.thrift.bind.host=localhost
hive.server2.thrift.http.max.worker.threads=500
hive.server2.thrift.http.min.worker.threads=5
hive.server2.thrift.http.path=cliservice
hive.server2.thrift.http.port=10001
hive.server2.thrift.max.worker.threads=500
hive.server2.thrift.min.worker.threads=5
hive.server2.thrift.port=10000
hive.server2.thrift.sasl.qop=auth
hive.server2.transport.mode=binary

This shows that the default port is 10000, the bind host is localhost, and the maximum number of worker threads is 500.
Now start hiveserver2:

SHELL$ hive --service hiveserver2 &

….
The ps output shows that hiveserver2 has started:

java … org.apache.hadoop.util.RunJar /home/hadoop/hive-0.12.0/lib/hive-service-0.12.0.jar org.apache.hive.service.server.HiveServer2
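
Once the server is up, clients connect to it through a JDBC URL. The sketch below builds such a URL from the default bind host and thrift port listed above and prints a beeline command for it. hs2_url is a hypothetical helper, and the "default" database name is an assumption; nothing is actually connected.

```shell
#!/usr/bin/env bash
# hs2_url HOST PORT [DB]: print a hiveserver2 JDBC URL (binary transport mode).
hs2_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$2" "${3:-default}"
}

# Defaults from the parameter listing: bind host localhost, thrift port 10000.
url=$(hs2_url localhost 10000)
echo "beeline -u $url"
```

If hive.server2.thrift.port or hive.server2.thrift.bind.host has been changed, pass the new values to the helper instead.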

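Neither hiveserver nor hiveserver2 runs as a background service, and the command line only offers a way to set hiveserver2 parameters, so the only way to stop a hiveserver2 running in the background is kill (Cloudera and HDP already package hiveserver2 as a service).
[hadoop@cloud011 ~]$ hive --service hiveserver2 -H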
usage: hiveserver2
-H,--help Print help information
--hiveconf <property=value> Use value for given property
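
Since kill is the only way to stop a background hiveserver2, it helps to record the PID at start-up instead of searching ps output later. The following is a minimal sketch of that idea; start_bg, stop_bg and the PID file path are hypothetical names, and it assumes the hive launcher script execs the JVM so that the recorded PID is the server's own. If that assumption does not hold on a given installation, look the PID up with ps as shown earlier.

```shell
#!/usr/bin/env bash
# start_bg PIDFILE CMD...: run CMD in the background and record its PID.
start_bg() {
  pidfile=$1
  shift
  "$@" >/dev/null 2>&1 &
  echo $! > "$pidfile"
}

# stop_bg PIDFILE: send SIGTERM to the recorded PID and remove the PID file.
stop_bg() {
  pid=$(cat "$1") || return 1
  kill "$pid" 2>/dev/null
  rm -f "$1"
}

# Intended use (not executed here):
#   start_bg /tmp/hiveserver2.pid hive --service hiveserver2
#   stop_bg  /tmp/hiveserver2.pid
```

The helpers are generic on purpose: any long-running command can be passed to start_bg, which also makes them easy to try out with a harmless stand-in such as sleep.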
hiveserver2 also provides a client-side command-line tool, beeline; the official documentation describes how to use it in detail.

6. metastore

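The metastore service connects to the relational database that stores the metastore information. It usually runs together with the hiveserver service, but it can also be started on its own as a separate service, with the port set through the METASTORE_PORT parameter.
hive --service metastore &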

Built-in services:

Hive ships with many built-in services, and the --service option selects which one to run. To find out which services are available, use --service help:

$ hive --service help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveserver2 hiveserver hwi \
jar lineage metastore metatool orcfiledump rcfilecat
Parameters parsed:
--auxpath : Auxillary jars
--config : Hive configuration directory
--service : Starts specific service/component. cli is default
Parameters used:
HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
HIVE_OPT : Hive options
For help on a particular service:
./hive --service serviceName --help
Debug help: ./hive --debug --help

The Service List entry in the output above shows the services Hive supports: beeline cli help hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat. The most useful ones are described below.

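(1) cli: short for Command Line Interface, Hive's command-line interface, and the most frequently used. It is the default service and can be used directly from the command line.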

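(2) hiveserver: runs Hive as a server exposing a Thrift service, so that clients written in many different languages can communicate with it. The HiveServer service must be started before clients can connect. The port the server listens on can be set with the HIVE_PORT environment variable; the default is 10000. hiveserver can be started as follows: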

[wyp@master ~]$  bin/hive --service hiveserver -p 10002
Starting Hive Thrift Server

The -p option also specifies the listening port.

(3) hwi: short for hive web interface, Hive's web interface and a web-based alternative to the hive cli.

(4) jar: Hive's equivalent of hadoop jar, a convenient way to run Java applications whose classpath includes both Hadoop and Hive classes.

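(5) metastore: by default the metastore runs in the same process as the Hive service (the original post illustrates this with a figure). This service lets the metastore run as a separate process, and METASTORE_PORT specifies the port it listens on.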

feng01301218 · posted 2015-3-31 14:31:07

Great study material.
I have been reading up on hive lately.
Thanks to the original poster.
