Common deployment problems:
1. MySQL JAR error
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Copy Hive's mysql-connector-java.xxxx.jar file into Impala's library directory (by default /usr/lib/impala/lib).
2. HDFS NameNode error
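The copy step can be scripted. Below is a minimal Python sketch. It assumes Hive's JARs live in /usr/lib/hive/lib, which is a common package layout but is not stated above, so adjust it for your install. It copies into the default Impala directory /usr/lib/impala/lib. The function name copy_mysql_connector is hypothetical.

```python
import glob
import os
import shutil
from typing import List

# Assumed path: Hive's library directory varies by distribution.
HIVE_LIB = "/usr/lib/hive/lib"
# Default Impala library directory, as stated in the text above.
IMPALA_LIB = "/usr/lib/impala/lib"


def copy_mysql_connector(hive_lib: str = HIVE_LIB, impala_lib: str = IMPALA_LIB) -> List[str]:
    """Copy every mysql-connector-java*.jar from hive_lib into impala_lib.

    Returns the destination paths; raises FileNotFoundError if Hive has no connector JAR.
    """
    jars = sorted(glob.glob(os.path.join(hive_lib, "mysql-connector-java*.jar")))
    if not jars:
        raise FileNotFoundError(f"no mysql-connector-java*.jar found in {hive_lib}")
    copied = []
    for jar in jars:
        dest = os.path.join(impala_lib, os.path.basename(jar))
        shutil.copy2(jar, dest)  # copy2 also preserves file metadata such as mtime
        copied.append(dest)
    return copied
```

Run it on the hosts running Impala. Then restart the Impala daemons so the driver JAR is on their classpath.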
E0127 19:48:16.708744 31675 impala-server.cc:339] Could not read the HDFS root directory at hdfs://bipcluster. Error was:
Operation category READ is not supported in state standby
NameNode HA failover was not started automatically, so both NameNodes were left in the standby state.
Manually transitioning one of them to active fixes it.
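Below is a small Python sketch of the manual fix. It wraps the standard `hdfs haadmin -getServiceState <id>` and `hdfs haadmin -transitionToActive <id>` commands. The NameNode IDs nn1 and nn2 are hypothetical. Use the IDs listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml, which here is presumably dfs.ha.namenodes.bipcluster. When automatic failover is configured, haadmin may refuse a manual transition unless --forcemanual is added, so check the haadmin usage for your Hadoop version. The command runner is a parameter so the logic can be tried without a cluster.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def run_cmd(argv: List[str]) -> str:
    """Run a command and return its stripped stdout (raises CalledProcessError on failure)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout.strip()


def ensure_one_active(nn_ids: Sequence[str], run: Runner = run_cmd) -> str:
    """Return the ID of the active NameNode, promoting nn_ids[0] if none is active.

    nn_ids are HA NameNode IDs from your own configuration (e.g. the hypothetical "nn1").
    """
    states = {nn: run(["hdfs", "haadmin", "-getServiceState", nn]) for nn in nn_ids}
    for nn, state in states.items():
        if state == "active":
            return nn  # already healthy, nothing to do
    # Every NameNode reported standby: manually promote the first one.
    target = nn_ids[0]
    run(["hdfs", "haadmin", "-transitionToActive", target])
    return target
```

For example, ensure_one_active(["nn1", "nn2"]) only promotes nn1 when neither NameNode reports active.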
3. Impala feature support (short-circuit reads and block location tracking)
E0127 19:28:25.289991 13469 impala-server.cc:339] ERROR: short-circuit local reads is disabled because
- Impala cannot read or execute the parent directory of dfs.domain.socket.path
- dfs.client.read.shortcircuit is not enabled.
ERROR: block location tracking is not properly enabled because
- dfs.client.file-block-storage-locations.timeout is too low. It should be at least 3000.
E0127 19:28:25.290117 13469 impala-server.cc:341] Aborting Impala Server startup due to improper configuration
Add the following properties to the HDFS configuration file hdfs-site.xml (inside the <configuration> element):
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.timeout</name>
  <value>3000</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
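Before restarting Impala, you can check the edited file against the four properties above. Below is a minimal Python sketch that uses only the standard library. It mirrors the listed settings and the "at least 3000" rule from the error message. It does not check the first bullet of the error, namely whether Impala can read and execute the parent directory of dfs.domain.socket.path. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def load_properties(xml_text: str) -> Dict[str, str]:
    """Parse Hadoop-style <configuration><property><name/><value/></property> XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def impala_config_problems(props: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the settings listed above; empty means OK."""
    problems = []
    if props.get("dfs.client.read.shortcircuit", "").lower() != "true":
        problems.append("dfs.client.read.shortcircuit is not enabled")
    if not props.get("dfs.domain.socket.path"):
        problems.append("dfs.domain.socket.path is not set")
    if props.get("dfs.datanode.hdfs-blocks-metadata.enabled", "").lower() != "true":
        problems.append("dfs.datanode.hdfs-blocks-metadata.enabled is not true")
    # A missing or non-numeric timeout is reported as a problem too.
    timeout = props.get("dfs.client.file-block-storage-locations.timeout", "0")
    try:
        too_low = int(timeout) < 3000
    except ValueError:
        too_low = True
    if too_low:
        problems.append("dfs.client.file-block-storage-locations.timeout should be at least 3000")
    return problems
```

For example, pass the text of hdfs-site.xml to load_properties and then to impala_config_problems. Print any entries returned before restarting the Impala services.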
Common usage errors:
4. Table creation error
By default Impala runs as the impala user, so creating a table can fail because of HDFS permissions:
Query: create table nginx_test (line string) STORED AS TEXTFILE
ERROR: MetaException: Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=impala, access=WRITE, inode="/bip/hive_warehouse/cdnlog.db":hdfs:hdfs:drwxr-xr-x
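The reason for the rejection follows from the POSIX-style permission model that HDFS uses. The owner, group, or other triplet is selected depending on who the user is, and only that triplet is consulted. Here impala is not the owner (hdfs), so either the group or the other bits (r-x) apply, and neither grants w. Below is a Python sketch that parses the message and repeats the check. It ignores HDFS ACLs, the dfs.permissions setting, and the superuser bypass. The group list you pass in is an assumption you supply, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable

# Access names as they appear in the message (Hadoop FsAction names) -> required bits.
ACCESS_BITS: Dict[str, str] = {
    "READ": "r", "WRITE": "w", "EXECUTE": "x",
    "READ_WRITE": "rw", "READ_EXECUTE": "rx", "WRITE_EXECUTE": "wx", "ALL": "rwx",
}

DENIAL_RE = re.compile(
    r'user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>[-dlrwxstST]{10})'
)


def parse_denial(message: str) -> Dict[str, str]:
    """Extract user, access, path, owner, group and mode from a permission-denied message."""
    m = DENIAL_RE.search(message)
    if m is None:
        raise ValueError("not an HDFS permission-denied message")
    return m.groupdict()


def is_allowed(user: str, user_groups: Iterable[str], access: str,
               owner: str, group: str, mode: str) -> bool:
    """Select the owner, group or other triplet (first match wins) and test the required bits."""
    if user == owner:
        triplet = mode[1:4]
    elif group in set(user_groups):
        triplet = mode[4:7]
    else:
        triplet = mode[7:10]
    # A lowercase sticky-bit "t" in the execute position still implies execute.
    return all(b in triplet or (b == "x" and triplet[2] == "t") for b in ACCESS_BITS[access])
```

With the values from the log, is_allowed("impala", ["impala"], "WRITE", "hdfs", "hdfs", "drwxr-xr-x") returns False. CREATE TABLE can only succeed once impala gains write access to the database directory through one of the three triplets.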
5. Query error
ERROR: Failed to open HDFS file hdfs://bipcluster/bip/hive_warehouse/cdnlog.db/dd_log/dt=20140117/data.file
Error(255): Unknown error 255
hdfsOpenFile(hdfs://bipcluster/bip/hive_warehouse/cdnlog.db/dd_log/dt=20140117/data.file): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1115)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
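The error reports that an HDFS file could not be opened, yet hadoop fs -cat can display the file's contents. The fault therefore lies in the communication between Impala and the DataNode, and restarting the Impala processes fixes it.
6. Impala SQL differences and limitations
1) Hive performs implicit type conversion, so avg(string_column) works directly as long as the values are numeric. In Impala you must CAST explicitly.
For example: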
[10.19.111.106:21000] > select avg(status) from dd_log where dt='20140117';
Query: select avg(status) from dd_log where dt='20140117'
ERROR: AnalysisException: AVG requires a numeric or timestamp parameter: AVG(status)
Run it this way instead:
select avg(cast(status as DOUBLE)) from dd_log where dt='20140117';
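2) ERROR: NotImplementedException: ORDER BY without LIMIT currently not supported
In Impala, ORDER BY only runs together with LIMIT; otherwise it fails with the error above. To see all rows, LIMIT to a very large value, as below. LIMIT also does not accept the LIMIT a,b form.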
select ip,count(1) as cnt from cdnlog.dd_log group by ip order by cnt desc limit 100000000;