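Elasticsearch is a distributed, free and open search and analytics engine that supports near-real-time search. In practice, a cluster's writes or queries can become slow for a variety of reasons. This article covers several common causes and how to address them.
When indexing data (storing documents and making them searchable) or searching it, you may see an error with HTTP status code 429 like the following: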
"status": 429, "error": {"type": "es_rejected_execution_exception", "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@77c11b3c on EsThreadPoolExecutor[name = VM-1-1-1-1/write, queue capacity = 800, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4349a9ab[Running, pool size = 32, active threads = 32, queued tasks = 800, completed tasks = 13026004]]"}}
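The reason string in that error already says which thread pool rejected the request and how full its queue was. As a minimal sketch (the function name `parse_rejection` and the regular expressions are illustrative assumptions, not part of Elasticsearch), the figures can be pulled out like this:

```python
import re

# The "reason" text from the 429 error shown above.
REASON = (
    "rejected execution of org.elasticsearch.transport.TransportService$7@77c11b3c "
    "on EsThreadPoolExecutor[name = VM-1-1-1-1/write, queue capacity = 800, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4349a9ab"
    "[Running, pool size = 32, active threads = 32, queued tasks = 800, "
    "completed tasks = 13026004]]"
)

NUMERIC_KEYS = ("queue capacity", "pool size", "active threads",
                "queued tasks", "completed tasks")


def parse_rejection(reason):
    """Extract thread pool figures from an es_rejected_execution_exception reason."""
    fields = {}
    name = re.search(r"name = ([^,\]]+)", reason)
    if name:
        # The name has the form "<node>/<pool>"; keep only the pool part.
        fields["pool"] = name.group(1).rsplit("/", 1)[-1]
    for key in NUMERIC_KEYS:
        match = re.search(re.escape(key) + r" = (\d+)", reason)
        if match:
            fields[key.replace(" ", "_")] = int(match.group(1))
    return fields


info = parse_rejection(REASON)
# The queue is full once the queued tasks reach the queue capacity.
queue_full = info["queued_tasks"] >= info["queue_capacity"]
print(info["pool"], info["active_threads"], queue_full)
```

Here all 32 threads of the write pool are active and all 800 queue slots are taken, which is why new requests are rejected.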
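1) Check it in Kibana or another monitoring tool.
2) Fetch the current values from the API yourself, store them, and compute the rate from successive samples: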
http://192.168.1.12:9200/_stats
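The counters returned by _stats are cumulative, so a rate only appears when two samples are compared. The sketch below assumes the metric of interest is indexing throughput and reads the `index_total` and `index_time_in_millis` counters under `_all.total.indexing`; the sample values and the 60-second interval are made up for illustration.

```python
def indexing_rate(prev, curr, interval_seconds):
    """Return (docs indexed per second, average ms per indexed doc) between two samples."""
    before = prev["_all"]["total"]["indexing"]
    after = curr["_all"]["total"]["indexing"]
    docs = after["index_total"] - before["index_total"]
    millis = after["index_time_in_millis"] - before["index_time_in_millis"]
    rate = docs / interval_seconds
    # Guard against an idle interval in which nothing was indexed.
    avg_ms = millis / docs if docs else 0.0
    return rate, avg_ms


# Hypothetical samples, trimmed to the fields used here, taken 60 seconds apart.
sample_1 = {"_all": {"total": {"indexing": {
    "index_total": 1_000_000, "index_time_in_millis": 500_000}}}}
sample_2 = {"_all": {"total": {"indexing": {
    "index_total": 1_030_000, "index_time_in_millis": 560_000}}}}

rate, avg_ms = indexing_rate(sample_1, sample_2, 60)
print(rate, avg_ms)
```

A falling rate together with a rising average time per document suggests that writes themselves are getting slower rather than that less data is arriving.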
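1) Check it in Kibana or another monitoring tool.
2) Fetch the current values from the API yourself, store them, and compute the change from successive samples: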
http://192.168.1.12:9200/_cat/thread_pool/write?v&h=id,name,active,queue,rejected,completed
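The `rejected` column is also a cumulative counter, so what matters is how much it grows between samples. A minimal sketch, assuming the plain-text output of the request above with its header row; the two sample outputs below are hypothetical:

```python
def parse_cat_table(text):
    """Turn whitespace-separated _cat output (with a header row) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def rejected_delta(prev_text, curr_text):
    """Return rejections added per (node id, pool name) between two samples."""
    prev = {(row["id"], row["name"]): int(row["rejected"])
            for row in parse_cat_table(prev_text)}
    deltas = {}
    for row in parse_cat_table(curr_text):
        key = (row["id"], row["name"])
        # A node that was absent from the first sample counts from zero.
        deltas[key] = int(row["rejected"]) - prev.get(key, 0)
    return deltas


# Hypothetical output captured at two points in time.
FIRST = """id                     name  active queue rejected completed
-PPLeiJfSp-JMbh-_ONsHA write     30   640     1380  13020000
"""
SECOND = """id                     name  active queue rejected completed
-PPLeiJfSp-JMbh-_ONsHA write     32   800     1500  13026004
"""

deltas = rejected_delta(FIRST, SECOND)
print(deltas)
```

A delta that keeps growing while `active` stays at the pool size means the pool is saturated and clients are being turned away.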
curl -X PUT "192.168.1.12:9200/_settings?pretty" -H 'Content-Type: application/json' -d'
{
"index.indexing.slowlog.threshold.index.warn": "10s",
"index.indexing.slowlog.threshold.index.info": "5s",
"index.indexing.slowlog.threshold.index.debug": "2s",
"index.indexing.slowlog.threshold.index.trace": "500ms",
"index.indexing.slowlog.level": "info",
"index.indexing.slowlog.source": "1000"
}
'
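To see how these thresholds interact with the `level` setting, the sketch below models which slow log level an operation of a given duration would land in. It is a simplified model under stated assumptions (an operation is logged at the most severe level whose threshold it exceeds, and levels more verbose than the configured `level` are dropped), not Elasticsearch's own code; `to_millis` and `slowlog_level` are illustrative helpers.

```python
UNITS_MS = {"ms": 1, "s": 1000}

# Thresholds from the settings above.
THRESHOLDS = {"warn": "10s", "info": "5s", "debug": "2s", "trace": "500ms"}
ORDER = ["warn", "info", "debug", "trace"]  # least to most verbose


def to_millis(value):
    """Convert a time value such as "500ms" or "10s" to milliseconds."""
    # Check "ms" before "s" because "500ms" also ends with "s".
    for unit in ("ms", "s"):
        if value.endswith(unit):
            return float(value[: -len(unit)]) * UNITS_MS[unit]
    raise ValueError("unsupported time unit in " + repr(value))


def slowlog_level(took_ms, thresholds=THRESHOLDS, level="info"):
    """Return the level an operation would be logged at, or None if it is not logged."""
    # Assumption: levels more verbose than the configured one are suppressed.
    enabled = ORDER[: ORDER.index(level) + 1]
    for name in enabled:
        if took_ms > to_millis(thresholds[name]):
            return name
    return None


print(slowlog_level(12_000))               # slower than the 10s warn threshold
print(slowlog_level(6_000))                # between the info and warn thresholds
print(slowlog_level(3_000))                # only exceeds debug, which "info" hides
print(slowlog_level(3_000, level="trace")) # same duration with all levels enabled
```

Under this model, the debug and trace thresholds in the request above have no effect while the level stays at info.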
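1) Check it in Kibana or another monitoring tool.
2) Fetch the current values from the API yourself, store them, and compute the rate from successive samples: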
http://192.168.1.12:9200/_stats
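1) Check it in Kibana or another monitoring tool.
2) Fetch the current values from the API yourself, store them, and compute the change from successive samples: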
http://192.168.1.12:9200/_cat/thread_pool/search?v&h=id,name,active,queue,rejected,completed
curl -X PUT "192.168.1.12:9200/_settings?pretty" -H 'Content-Type: application/json' -d'
{
"index.search.slowlog.threshold.query.warn": "10s",
"index.search.slowlog.threshold.query.info": "5s",
"index.search.slowlog.threshold.query.debug": "2s",
"index.search.slowlog.threshold.query.trace": "500ms",
"index.search.slowlog.threshold.fetch.warn": "1s",
"index.search.slowlog.threshold.fetch.info": "800ms",
"index.search.slowlog.threshold.fetch.debug": "500ms",
"index.search.slowlog.threshold.fetch.trace": "200ms",
"index.search.slowlog.level": "info"
}
'
1) Check it in Kibana or another monitoring tool.
2) Query the API:
http://192.168.1.12:9200/_cat/nodes?v=true
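For scripted checks it is easier to request the same data as JSON by adding `format=json` to the query string, in which case each node is an object whose values are strings. The sketch below assumes that form; the node names, the figures, and the 80% threshold are hypothetical.

```python
def busy_nodes(rows, cpu_threshold=80):
    """Return (name, cpu) for nodes at or above the CPU threshold, busiest first."""
    busy = [(row["name"], int(row["cpu"]))
            for row in rows
            if int(row["cpu"]) >= cpu_threshold]
    return sorted(busy, key=lambda item: item[1], reverse=True)


# Hypothetical rows shaped like the JSON form of the _cat/nodes output.
rows = [
    {"ip": "192.168.1.12", "heap.percent": "71", "cpu": "93", "load_1m": "12.40", "name": "node-a"},
    {"ip": "192.168.1.13", "heap.percent": "48", "cpu": "35", "load_1m": "2.10", "name": "node-b"},
    {"ip": "192.168.1.14", "heap.percent": "66", "cpu": "84", "load_1m": "9.75", "name": "node-c"},
]

hot = busy_nodes(rows)
print(hot)
```

The nodes this returns are the ones worth inspecting further with the hot threads API.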
# curl http://192.168.1.12:9200/_nodes/hot_threads?human=true
::: {iz2zedw788ifnqbcj4wygzz}{-PPLeiJfSp-JMbh-_ONsHA}{gsak_FfmTmK361M7W5wTOw}{192.168.1.12}{192.168.1.12:9300}{ml.machine_memory=50476195840, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
Hot threads at 2021-02-19T08:29:41.502, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
67.3% (336.6ms out of 500ms) cpu usage by thread 'elasticsearch[iz2zedw788ifnqbcj4wygzz][search][T#16]'
4/10 snapshots sharing following 59 elements
org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:279)
org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:211)
org.apache.lucene.index.OrdinalMap.<init>(OrdinalMap.java:261)
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:168)
org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:147)
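The thread name in each "cpu usage by thread" line includes the thread pool, which tells you whether searches or writes are burning the CPU. A minimal sketch that extracts it, assuming lines shaped like the one in the output above; the regular expression is an illustration and does not cover every thread name Elasticsearch can print.

```python
import re

# Matches lines such as:
# 67.3% (336.6ms out of 500ms) cpu usage by thread 'elasticsearch[node][search][T#16]'
# Curly quotes are accepted too, in case the text was copied from a web page.
HOT_THREAD = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% \((?P<busy>[\d.]+)ms out of (?P<interval>[\d.]+)ms\) "
    r"cpu usage by thread ['‘’]elasticsearch"
    r"\[(?P<node>[^\]]+)\]\[(?P<pool>[^\]]+)\]\[(?P<thread>[^\]]+)\]['‘’]"
)


def hot_threads(text):
    """Return one dict per matching hot-thread line, busiest first."""
    found = []
    for match in HOT_THREAD.finditer(text):
        found.append({
            "node": match.group("node"),
            "pool": match.group("pool"),
            "thread": match.group("thread"),
            "cpu_percent": float(match.group("pct")),
        })
    return sorted(found, key=lambda item: item["cpu_percent"], reverse=True)


SAMPLE = (
    "Hot threads at 2021-02-19T08:29:41.502, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:\n"
    "67.3% (336.6ms out of 500ms) cpu usage by thread "
    "'elasticsearch[iz2zedw788ifnqbcj4wygzz][search][T#16]'\n"
    "4/10 snapshots sharing following 59 elements\n"
)

threads = hot_threads(SAMPLE)
print(threads[0]["pool"], threads[0]["cpu_percent"])
```

In the output above the busiest thread belongs to the search pool, so the next step is to find out which search is running.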
# curl "http://192.168.1.12:9200/_tasks?detailed" |jq
{
"nodes" : {
"-PPLeiJfSp-JMbh-_ONsHA" : {
"name" : "test",
"transport_address" : "192.168.1.12:9300",
"host" : "192.168.1.12",
"ip" : "192.168.1.12:9300",
"roles" : [
"master",
"data",
"ingest"
],
"attributes" : {
"ml.machine_memory" : "50476195840",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
},
"tasks": {
"-PPLeiJfSp-JMbh-_ONsHA:675047607": {
"node": "-PPLeiJfSp-JMbh-_ONsHA",
"id": 675047607,
"type": "transport",
"action": "indices:data/read/search",
"description": "indices[test-log-web-*], types[], search_type[QUERY_THEN_FETCH], source[{\"size\":1000,\"query\":{\"function_score\":{\"query\":{\"bool\":{\"must\":[{\"term\":{\"tags\":{\"value\":\"parse_success\",\"boost\":1.0}}},{\"nested\":{\"query\":{\"bool\":{\"must\":[{\"match\":{\"test.perspective.domain\":{\"query\":\"cs.xunyou.com\",\"operator\":\"OR\",\"prefix_length\":0,\"max_expansions\":50,\"fuzzy_transpositions\":true,\"lenient\":false,\"zero_terms_query\":\"NONE\",\"auto_generate_synonyms_phrase_query\":true,\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"path\":\"test.perspective\",\"ignore_unmapped\":false,\"score_mode\":\"avg\",\"boost\":1.0}},{\"range\":{\"@timestamp\":{\"from\":\"2021-02-18T17:16:28+0800\",\"to\":\"2021-02-19T17:16:28+0800\",\"include_lower\":false,\"include_upper\":false,\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"functions\":[{\"filter\":{\"match_all\":{\"boost\":1.0}},\"random_score\":{}}],\"score_mode\":\"multiply\",\"max_boost\":3.4028235E38,\"boost\":1.0}},\"_source\":{\"includes\":[\"test.perspective\"],\"excludes\":[]}}]",
"start_time_in_millis": 1613726202655,
"running_time_in_nanos": 11466239140,
"cancellable": true,
"headers": {}
}
}
}
}
}
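When many tasks are running, it helps to filter a response like the one above down to the long-running search tasks that can be cancelled. A minimal sketch over the parsed JSON; the 10-second cutoff is an arbitrary assumption, and the second task in the sample is hypothetical.

```python
def long_running_searches(tasks_response, min_seconds=10):
    """Return (task id, seconds running) for cancellable search tasks, longest first."""
    found = []
    for node in tasks_response["nodes"].values():
        for task_id, task in node["tasks"].items():
            seconds = task["running_time_in_nanos"] / 1e9
            if (task["action"].startswith("indices:data/read/search")
                    and task.get("cancellable")
                    and seconds >= min_seconds):
                found.append((task_id, round(seconds, 1)))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Trimmed to the fields used here. The first task is the one shown above;
# the second is a hypothetical short-lived search added for contrast.
response = {"nodes": {"-PPLeiJfSp-JMbh-_ONsHA": {"tasks": {
    "-PPLeiJfSp-JMbh-_ONsHA:675047607": {
        "action": "indices:data/read/search",
        "running_time_in_nanos": 11466239140,
        "cancellable": True,
    },
    "-PPLeiJfSp-JMbh-_ONsHA:675047999": {
        "action": "indices:data/read/search",
        "running_time_in_nanos": 250000000,
        "cancellable": True,
    },
}}}}

slow = long_running_searches(response)
print(slow)
```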
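The description field returned by the task management API identifies the specific query that is running, and the running_time_in_nanos field shows how long it has been running. To bring CPU usage down, you can cancel the search queries that are consuming the most CPU. The task management API supports a _cancel call on tasks whose cancellable field is true. You cancel a task by specifying its task ID, which in the example above is "-PPLeiJfSp-JMbh-_ONsHA:675047607".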
# curl -X POST "http://192.168.1.12:9200/_tasks/-PPLeiJfSp-JMbh-_ONsHA:675047607/_cancel?pretty"
{
"nodes" : {
"-PPLeiJfSp-JMbh-_ONsHA" : {
"name" : "iz2zedw788ifnqbcj4wygzz",
"transport_address" : "192.168.1.12:9300",
"host" : "192.168.1.12",
"ip" : "192.168.1.12:9300",
"roles" : [
"master",
"data",
"ingest"
],
"attributes" : {
"ml.machine_memory" : "50476195840",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
},
"tasks" : {
"-PPLeiJfSp-JMbh-_ONsHA:675047607" : {
"node" : "-PPLeiJfSp-JMbh-_ONsHA",
"id" : 675047607,
"type" : "transport",
"action" : "indices:data/read/search",
"start_time_in_millis" : 1613726202655,
"running_time_in_nanos" : 40438340371,
"cancellable" : true,
"headers" : { }
}
}
}
}
}
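The same cancel call can be issued from a script. The sketch below only builds the request with Python's standard library and does not send it; `cancel_request` is an illustrative helper, and the base URL is the example address used throughout this article.

```python
from urllib.parse import quote
from urllib.request import Request


def cancel_request(base_url, task_id):
    """Build (but do not send) the POST request that cancels a task."""
    # Keep the ":" between node ID and task number; escape anything unusual.
    path = "/_tasks/" + quote(task_id, safe=":") + "/_cancel"
    return Request(base_url.rstrip("/") + path, method="POST")


req = cancel_request("http://192.168.1.12:9200", "-PPLeiJfSp-JMbh-_ONsHA:675047607")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The response has the same shape as the output above and lists the tasks that were asked to stop.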
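Elasticsearch consumes CPU, memory, and I/O, so once the data volume reaches a certain scale, all kinds of problems start to appear. Some are caused by the queries themselves and others by resource pressure. When you hit a problem, pin down the cause first, and you can then work through it step by step.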
Original article: https://blog.51cto.com/leejia/2631971