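Recently our company's systems have been running a lot of fully fuzzy queries. The data volume is large, and when these queries involve multi-table joins they hurt performance badly. So I considered using a search engine to handle the fuzzy queries. The idea:
Sync the MySQL data into an ES type, using a full sync plus scheduled incremental syncs, and have the application query ES directly for the results it needs.
After some searching, I decided to use elasticsearch-jdbc for the data sync. The data being synced is the result of joining five or six tables. It runs fine in the development and test environments, where the data volume is small, but syncing in the performance-test environment, where the data volume is much larger, runs into problems. Below are some of the errors reported during the sync. I found nothing useful on GitHub, and nobody in the chat groups I asked uses the tool this way.
The first kind of error occurs when the connection commits. Here is an excerpt of the log: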
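The "full sync plus scheduled incremental sync" idea can be sketched as a watermark query: remember the newest modification time already synced, and on each scheduled run select only rows changed after it. The sketch below is only an illustration under stated assumptions. The `orders` table, its `updated_at` column, and the `index_rows` callback are hypothetical names, and `sqlite3` stands in for MySQL so the snippet runs without a server. It is not how elasticsearch-jdbc works internally.

```python
import sqlite3

# Hypothetical schema: orders(id, name, updated_at), where updated_at is a
# sortable timestamp string such as "2017-03-01 10:00:00".

def fetch_changed_rows(conn, last_sync):
    """Return rows modified strictly after last_sync, oldest first."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_sync,),
    )
    return cur.fetchall()

def sync_once(conn, last_sync, index_rows):
    """Run one sync pass and return the new watermark.

    Passing an empty string as last_sync selects every row (the full sync);
    later passes pick up only rows changed since the previous pass.
    """
    rows = fetch_changed_rows(conn, last_sync)
    if rows:
        index_rows(rows)          # hand the rows to whatever writes to ES
        last_sync = rows[-1][2]   # updated_at of the newest row seen
    return last_sync
```

A scheduler would call `sync_once` repeatedly and persist the returned watermark between runs. One caveat of this simple form: a row written later with exactly the same timestamp as the watermark is skipped, so a real job would need a tie-breaker or a small overlap window.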
[13:17:17,678][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 7 minutes 59 seconds = 479982 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:17:17,678][INFO ][metrics.sink.plain ][pool-3-thread-1] 7 minutes 59 seconds = 479335 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,589][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 8 minutes 44 seconds = 524264 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,589][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622247 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,590][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 10 minutes 22 seconds = 622895 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,590][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622247 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,595][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 10 minutes 22 seconds = 622900 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,598][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622256 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,599][INFO ][metrics.source.plain ][pool-3-thread-1] totalrows = 0, 10 minutes 22 seconds = 622904 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,599][INFO ][metrics.sink.plain ][pool-3-thread-1] 10 minutes 22 seconds = 622257 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[13:19:40,618][WARN ][importer.jdbc.source.standard][pool-2-thread-1] while closing read connection: Communications link failure during commit(). Transaction resolution unknown.
The second kind is an out-of-memory error (Java heap space) in the worker threads:
[17:42:34,243][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 0, 8 minutes 30 seconds = 510305 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:43:33,523][INFO ][metrics.sink.plain ][pool-5-thread-1] 8 minutes 36 seconds = 516618 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:46:00,561][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 0, 11 minutes 19 seconds = 679116 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:47:27,876][INFO ][metrics.sink.plain ][pool-5-thread-1] 12 minutes 37 seconds = 757511 ms, submitted = 0, succeeded = 0, failed = 0, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
[17:48:23,974][INFO ][metrics.source.plain ][pool-5-thread-1] totalrows = 0, 13 minutes 37 seconds = 817186 ms, 0 = 0.0 bytes bytes, 0.0 bytes = 0 avg size, 0 dps, 0 MB/s
Exception in thread "pool-5-thread-1" Exception in thread "elasticsearch[importer][generic][T#3]" java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2035)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.SynchronousQueue$TransferStack.snode(SynchronousQueue.java:318)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:361)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.io.IOException: pool did not terminate
at org.xbib.tools.JDBCImporter.shutdown(JDBCImporter.java:265)
at org.xbib.tools.JDBCImporter$2.run(JDBCImporter.java:322)
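I still need a way to batch-sync data from MySQL to ES. I found that ES has a bulk request API, but all the examples I came across use it to bulk-import text or logs. I have not yet seen it used for near-real-time sync from a MySQL database. The problem remains unresolved.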
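For reference, the ES bulk endpoint (`_bulk`) is not tied to text or logs: it accepts newline-delimited JSON in which each document is preceded by an action line, and the body must end with a newline. Here is a minimal sketch of turning database rows into such bodies in fixed-size batches, so that only one batch is held in memory at a time. The row layout (dicts with an `id` key), the index name, and the helper names `batches` and `bulk_body` are assumptions made for illustration. Older ES versions also expect a `_type`, either in the action line or in the request URL.

```python
import json
from itertools import islice

def batches(rows, size):
    """Yield lists of at most `size` items from any iterable of rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_body(rows, index):
    """Build a _bulk request body (newline-delimited JSON) for one batch.

    Each row is assumed to be a dict with an "id" key, which is reused as
    the document _id so that re-syncing a row overwrites the old document.
    """
    lines = []
    for row in rows:
        # Action line: what to do and where the document goes.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(row, ensure_ascii=False))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"
```

Each body produced by `bulk_body(chunk, "orders")` for `chunk in batches(rows, 1000)` could then be POSTed to `_bulk` with any HTTP client. The batch size of 1000 is an arbitrary starting point, not a recommendation.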
This article comes from the “努力奔向前方” blog. Please keep this attribution: http://liucb.blog.51cto.com/3230681/1907105
Elasticsearch-jdbc fails to batch-sync MySQL data