Hive Enterprise Tuning (Hive 1.2.1)

Date: 2020-04-24 23:22:20

1. Fetch Task Conversion

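Fetch task conversion means that Hive can answer certain queries without running a MapReduce job. For example, for SELECT * FROM employees; Hive can simply read the files under the storage directory of the employees table and print the query result to the console.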

In hive-default.xml.template, hive.fetch.task.conversion defaults to more; older Hive versions defaulted to minimal. With the value set to more, full-table selects, column selects, and LIMIT queries all skip MapReduce.

<property>
    <name>hive.fetch.task.conversion</name>
    <value>more</value>
    <description>
      Expects one of [none, minimal, more].
      Some select queries can be converted to single FETCH task minimizing latency.
      Currently the query should be single sourced not having any subquery and should not have
      any aggregations or distincts (which incurs RS), lateral views and joins.
      0. none : disable hive.fetch.task.conversion
      1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
      2. more    : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
    </description>
</property>

① Check the default fetch mode

hive (default)> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more

② SELECT * does not launch a MapReduce job

hive (default)> select * from score;
OK
score.name    score.subject    score.score
孙悟空    语文    87
孙悟空    数学    95
...omitted...
婷婷    数学    85
婷婷    英语    78

③ Disable fetch conversion

hive (default)> set hive.fetch.task.conversion=none;

④ Run the query again; it now launches a job on the cluster

hive (default)> select * from score;
Query ID = atguigu_20200425011511_d4d9f365-e96c-48b2-9bf6-7818f69e18da
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1587748417298_0001)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 4.48 s     
--------------------------------------------------------------------------------
OK
score.name    score.subject    score.score
孙悟空    语文    87
孙悟空    数学    95
...omitted...
婷婷    数学    85
婷婷    英语    78
Time taken: 6.177 seconds, Fetched: 12 row(s)
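
The session above only exercises SELECT *. As a minimal sketch, assuming the same score table (columns name, subject, score) and a table small enough that no fetch size limit applies, the other query shapes covered by more could be checked with EXPLAIN. The filter value and LIMIT below are illustrative and not part of the original test.

```sql
-- Restore the default level after the test above.
set hive.fetch.task.conversion=more;

-- Column projection + filter + LIMIT:
-- expected to be planned as a single fetch task under "more".
explain select name, score from score where score > 80 limit 3;

-- Aggregation: the property description excludes aggregations,
-- so this is expected to be planned as a cluster job instead.
explain select subject, avg(score) from score group by subject;
```

In the first plan one would expect to see only a Fetch Operator stage, while the second should contain an additional map/reduce-style stage before the fetch.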

2. Local Mode

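Most Hadoop jobs need the full scalability that Hadoop provides in order to process large data sets. Sometimes, however, the input to a Hive query is very small. In that case, the time spent launching tasks for the query can be much longer than the time the job itself takes to run. For most such cases, Hive can use local mode to run all tasks on a single machine, which can noticeably shorten execution time for small data sets.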

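Local mode is selected only when two conditions hold: the total input size must not exceed hive.exec.mode.local.auto.inputbytes.max, and the number of input files must not exceed hive.exec.mode.local.auto.input.files.max.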

① Set hive.exec.mode.local.auto to true to let Hive apply this optimization automatically when appropriate.

hive (default)> set hive.exec.mode.local.auto=true;
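
The two thresholds named above are ordinary session properties and can be inspected or adjusted alongside the switch. The following is a sketch only: the values shown (128 MB and 4 files) are what the stock defaults are believed to be, and should be confirmed against your own hive-default.xml.template before relying on them.

```sql
-- Print the current limits (a bare "set <name>;" shows the value).
set hive.exec.mode.local.auto.inputbytes.max;
set hive.exec.mode.local.auto.input.files.max;

-- Enable automatic local mode.
set hive.exec.mode.local.auto=true;

-- Maximum total input size in bytes (assumed default: 128 MB).
set hive.exec.mode.local.auto.inputbytes.max=134217728;

-- Maximum number of input files (assumed default: 4).
set hive.exec.mode.local.auto.input.files.max=4;
```

Raising either limit makes more queries eligible for local mode, at the cost of running larger inputs on the single machine that hosts the Hive client.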

② Test

hive (default)> select count(*) from score;
Automatically selecting local only mode for query
Query ID = atguigu_20200425012518_35634c83-8b18-4703-b36d-2dfdea881305
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2020-04-25 01:25:22,746 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local2060501220_0001
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 426 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
_c0
12
Time taken: 3.974 seconds, Fetched: 1 row(s)

Source: https://www.cnblogs.com/noyouth/p/12770394.html
