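Hue is an open-source UI system for Apache Hadoop. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example by working with data on HDFS or running MapReduce jobs.
CDH releases: http://archive-primary.cloudera.com/cdh5/cdh/5/
This walkthrough uses the CDH build, hue-3.7.0-cdh5.3.6.tar.gz.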
tar -zxvf hue-3.7.0-cdh5.3.6.tar.gz -C /export/servers/
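Hue does not run as root by default, so add a hue user and set its password (123456 is used here).
#Add the hue user
useradd hue
#Set the password for the hue user (123456 here): run the command below, press Enter, then type the password
passwd hue
#Give the hue user ownership of the unpacked directory
chown -R hue:hue /export/servers/hue-3.7.0-cdh5.3.6/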
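#Install the dependencies on Red Hat (CentOS is equivalent to Red Hat) as shown below; the names ending in -dev are Debian/Ubuntu package names and are not expected to resolve under yum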
yum install -y gcc gcc-c++ libxml2-devel libxslt-devel cyrus-sasl-devel cyrus-sasl-gssapi mysql-devel python-devel python-setuptools python-simplejson sqlite-devel ant libsasl2-dev libsasl2-modules-gssapi-mit libkrb5-dev libtidy-0.99-0 mvn openldap-dev libldap2-dev openldap-devel
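Dependency reference: http://archive-primary.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6/manual.html
The relevant dependencies are listed in the table in that manual.
Switch to the hue user, change to the top-level Hue installation directory, and run the following command: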
make apps
Output like the following indicates that the build succeeded:
Configure hue.ini in the /export/servers/hue-3.7.0-cdh5.3.6/desktop/conf directory, following section 3.1 of the official manual: http://archive-primary.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6/manual.html#_configuring_hue
In hue.ini, set the secret key, the address and port to listen on, and the time zone:
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
# Webserver listens on this address and port
http_host=node01.ouyang.com
http_port=8888
# Time zone name
time_zone=Asia/Shanghai
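Rather than reusing a published secret_key, you can generate a random one. The following is a minimal sketch, assuming a Linux host with /dev/urandom; gen_secret_key is a hypothetical helper name, not part of Hue, and the 50-character length is an arbitrary choice for illustration.

```shell
# Sketch: generate a random value for secret_key in hue.ini.
# gen_secret_key is a hypothetical helper, not something Hue ships.
gen_secret_key() {
    # Read random bytes, keep only letters and digits, stop after 50 characters.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 50
}

# Print a line that can be pasted into hue.ini.
echo "secret_key=$(gen_secret_key)"
```

Restricting the value to letters and digits avoids characters that might need quoting in an ini file.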
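Check that the desktop.db file under the desktop directory and the hue.ini file under desktop/conf are owned by the hue user and the hue group; if they are not, change the ownership.
#Change the permissions of the desktop.db file under the desktop directory
chmod o+w /export/servers/hue-3.7.0-cdh5.3.6/desktop/desktop.db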
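The ownership check can be scripted. This is a small sketch, assuming GNU stat (as found on CentOS); owner_of and check_owner are hypothetical helper names introduced for illustration.

```shell
# owner_of PATH: print "user:group" for PATH (relies on GNU stat's -c option).
owner_of() {
    stat -c '%U:%G' "$1"
}

# check_owner PATH EXPECTED: succeed only if PATH is owned by EXPECTED.
check_owner() {
    actual="$(owner_of "$1")" || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK   $1 ($actual)"
    else
        echo "FIX  $1 is $actual, expected $2"
        return 1
    fi
}

# Usage on the Hue host, with the install path used in this guide:
# check_owner /export/servers/hue-3.7.0-cdh5.3.6/desktop/desktop.db hue:hue
# check_owner /export/servers/hue-3.7.0-cdh5.3.6/desktop/conf/hue.ini hue:hue
```

A file reported as FIX can be corrected with chown hue:hue on that path.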
#Start the Hue service with the supervisor script under the build directory (run from the top-level Hue directory)
build/env/bin/supervisor
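Access URL: http://node01.ouyang.com:8888
On the first visit you are asked to create a user. This user is the superuser and has broad privileges.
After logging in you reach the Hue home page: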
# Configuration file: /export/servers/hadoop-2.7.4/etc/hadoop/hdfs-site.xml
# Add the following to that file:
<!-- Enable WebHDFS -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
# Configuration file: /export/servers/hadoop-2.7.4/etc/hadoop/core-site.xml
# Add the following to that file:
<!-- Allow the hue user to act as a proxy user from any host and for any group -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
scp core-site.xml node02.ouyang.com:$PWD
scp core-site.xml node03.ouyang.com:$PWD
scp hdfs-site.xml node02.ouyang.com:$PWD
scp hdfs-site.xml node03.ouyang.com:$PWD
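With more nodes, the four scp commands above can be generated in a loop. The sketch below only prints the commands so they can be reviewed first; sync_cmds is a hypothetical helper name, and the node names and configuration path are the ones used in this guide.

```shell
# sync_cmds CONF_DIR NODE...: print one scp command per file and node.
# Nothing is copied here; pipe the output to sh to actually run the commands.
sync_cmds() {
    conf_dir="$1"
    shift
    for node in "$@"; do
        for f in core-site.xml hdfs-site.xml; do
            echo "scp $conf_dir/$f $node:$conf_dir/"
        done
    done
}

sync_cmds /export/servers/hadoop-2.7.4/etc/hadoop node02.ouyang.com node03.ouyang.com
```

Printing first and executing second keeps a typo in a host name from copying files to the wrong place.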
Now stop and restart Hadoop using its one-step stop and start scripts.
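After the restart, WebHDFS can be checked directly over its REST interface before Hue is involved. The sketch below builds the request URL; webhdfs_url is a hypothetical helper name, the host and port are the ones used in this guide, and /tmp and the hue user are only example arguments.

```shell
# webhdfs_url HOST:PORT PATH OP USER: build a WebHDFS REST URL.
# PATH must start with "/"; user.name is the simple (non-Kerberos) way to name the caller.
webhdfs_url() {
    echo "http://$1/webhdfs/v1$2?op=$3&user.name=$4"
}

# Print the URL for listing /tmp as the hue user.
webhdfs_url node01.ouyang.com:50070 /tmp LISTSTATUS hue

# On a machine that can reach the NameNode, the same URL can be fetched with:
# curl -s "$(webhdfs_url node01.ouyang.com:50070 /tmp LISTSTATUS hue)"
```

A JSON listing in the response suggests that dfs.webhdfs.enabled took effect; an error response points back to hdfs-site.xml or the restart.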
Configure HDFS under the [[hdfs_clusters]] section:
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://node01.ouyang.com:9000
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://node01.ouyang.com:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# Default umask for file and directory creation, specified in an octal value.
## umask=022
# Directory of the Hadoop configuration
hadoop_conf_dir=/export/servers/hadoop-2.7.4/etc/hadoop
hadoop_hdfs_home=/export/servers/hadoop-2.7.4
hadoop_bin=/export/servers/hadoop-2.7.4/bin
# Configuration for YARN (MR2)
./build/env/bin/supervisor
In this view you can browse HDFS folders and files and view file contents, but with the configuration above you cannot yet edit files.
Configure YARN under the [[yarn_clusters]] section:
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=node01.ouyang.com
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://node01.ouyang.com:8088
# URL of the ProxyServer API
proxy_api_url=http://node01.ouyang.com:8088
# URL of the HistoryServer API
history_server_api_url=http://node01.ouyang.com:19888
# In secure mode (HTTPS), if SSL certificates from Resource Manager's
# Rest Server have to be verified against certificate authority
## ssl_cert_ca_verify=False
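After changing the configuration file, restart the YARN and Hue services.
Once the YARN service is running, refresh the Hue web UI and the warning no longer appears.
Configure Hive under the [beeswax] section: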
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=node01.ouyang.com
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/export/servers/hive/conf
hive_home_dir=/export/servers/hive
# Timeout in seconds for thrift calls to Hive service
## server_conn_timeout=120
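The Hive service stores its root files under /tmp on HDFS, and only root has execute permission on that directory. Hue, however, is started as the hue user, so the permissions on /tmp must be changed to give the hue user access.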
hdfs dfs -chmod 777 /tmp
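Restart the Hue service.
Start the Hive service.
Because some Hive tables depend on HBase, start the HBase service as well.
In hue.ini, edit the MySQL settings under the [[databases]] section: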
# mysql, oracle, or postgresql configuration.
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
name=mysql
# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql
# IP or hostname of the database to connect to.
host=node01.ouyang.com
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
# Password matching the username to authenticate with when
# connecting to the database.
password=root
Restart Hue, and you can then work with the MySQL database from the Hue web UI.
In hue.ini, edit the HBase settings under the [hbase] section:
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
hbase_clusters=(Cluster|node01.ouyang.com:9090)
# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/export/servers/hbase/conf
After configuring, restart the Hue service and start the HBase Thrift server on the host configured above.
bin/hbase-daemon.sh start thrift
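Once the services are started, a quick TCP check shows whether each port configured in hue.ini is accepting connections. This is a rough sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; port_open and report are hypothetical helper names, and an open port only shows that something is listening, not that the service is healthy.

```shell
# port_open HOST PORT: succeed if a TCP connection opens within 2 seconds.
# Uses bash's /dev/tcp pseudo-device inside a child bash process.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report HOST PORT LABEL: print one status line for a service.
report() {
    if port_open "$1" "$2"; then
        echo "up    $3 ($1:$2)"
    else
        echo "down  $3 ($1:$2)"
    fi
}

# Usage with the host and ports configured in this guide:
# report node01.ouyang.com 8888  "Hue web UI"
# report node01.ouyang.com 50070 "NameNode WebHDFS"
# report node01.ouyang.com 8088  "YARN ResourceManager API"
# report node01.ouyang.com 10000 "HiveServer2"
# report node01.ouyang.com 9090  "HBase Thrift server"
```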
Installing Big Data Software on CentOS 6, Chapter 9: Installing and Configuring Hue, a Big Data Visualization Tool
Original: https://www.cnblogs.com/yangshibiao/p/10635546.html