一、Fetch抓取
Fetch 抓取是指,Hive 中对某些情况的查询可以不必使用 MapReduce 计算。例如:SELECT * FROM employees;在这种情况下,Hive 可以简单地读取 employee 对应的存储目录下的文件,然后输出查询结果到控制台。
<property> <name>hive.fetch.task.conversion</name> <value>more</value> <description> Expects one of [none, minimal, more]. Some select queries can be converted to single FETCH task minimizing latency. Currently the query should be single sourced not having any subquery and should not have any aggregations or distincts (which incurs RS), lateral views and joins. 0. none : disable hive.fetch.task.conversion 1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only 2. more : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns) </description> </property>
①查询默认抓取模式
hive (default)> set hive.fetch.task.conversion; hive.fetch.task.conversion=more
②select * 不走mr
hive (default)> select * from score; OK score.name score.subject score.score 孙悟空 语文 87 孙悟空 数学 95 ...省略... 婷婷 数学 85 婷婷 英语 78
③关闭抓取
hive (default)> set hive.fetch.task.conversion=none;
④再次查询,需要走mr
hive (default)> select * from score; Query ID = atguigu_20200425011511_d4d9f365-e96c-48b2-9bf6-7818f69e18da Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1587748417298_0001) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 1 1 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.48 s -------------------------------------------------------------------------------- OK score.name score.subject score.score 孙悟空 语文 87 孙悟空 数学 95 ...省略... 婷婷 数学 85 婷婷 英语 78 Time taken: 6.177 seconds, Fetched: 12 row(s)
二、本地模式
大多数的 Hadoop Job 是需要 Hadoop 提供的完整的可扩展性来处理大数据集的。不过,有时 Hive 的输入数据量是非常小的。在这种情况下,为查询触发执行任务消耗的时间可能会比实际 job 的执行时间要多的多。对于大多数这种情况,Hive 可以通过本地模式在单台机器上处理所有的任务。对于小数据集,执行时间可以明显被缩短。
启用本地模式有两个前提条件,文件大小不能超过hive.exec.mode.local.auto.inputbytes.max,文件数量不能超过hive.exec.mode.local.auto.input.files.max
①用户可以通过设置 hive.exec.mode.local.auto 的值为 true,来让 Hive 在适当的时候自动启动这个优化。
hive (default)> set hive.exec.mode.local.auto=true;
②测试
hive (default)> select count(*) from score; Automatically selecting local only mode for query Query ID = atguigu_20200425012518_35634c83-8b18-4703-b36d-2dfdea881305 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Job running in-process (local Hadoop) 2020-04-25 01:25:22,746 Stage-1 map = 100%, reduce = 100% Ended Job = job_local2060501220_0001 MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 426 HDFS Write: 3 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK _c0 12 Time taken: 3.974 seconds, Fetched: 1 row(s)
原文:https://www.cnblogs.com/noyouth/p/12770394.html