Implementing an elasticsearch gateway compatible with different ES versions for rolling upgrades: development and functional verification

Date: 2021-03-28 17:14:13

Continued from the previous post:

https://www.cnblogs.com/zihunqingxin/p/14563640.html

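Project verification goals

The initial goal is to make elasticsearch 7.10.2 and elasticsearch 6.8.14 work side by side.

Using remote cluster, the two ES versions run in parallel, with the higher version as primary and the lower version as secondary. Data in the lower version is gradually retired, and the deployment transitions to elasticsearch 7.10.2.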

Project repository

https://github.com/cclient/elasticsearch-multi-cluster-compat-proxy

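Criteria for telling new indices from old

New indices are written to elasticsearch 7.10.2, and old indices are written to elasticsearch 6.8.14.

The mapping is configured in a MySQL database.

Each original index name maps to exactly one dest index, which points to the corresponding ES version.
If MySQL has no record for an index, the decision falls back to the normalized index name.

For example, if index names contain a date, as in filebeat_202101_log, agree on a cutoff date and use it to classify an index as new or old.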
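The lookup order described above can be sketched roughly as follows. This is only an illustration, not the project's code: the IndexRouter class, the Target enum, the in-memory map standing in for the MySQL table, the yyyyMM naming pattern and the "treat everything else as new" default are all assumptions.

```java
import java.time.YearMonth;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical router that decides which cluster an index belongs to. */
final class IndexRouter {
    enum Target { ES7, ES6 }

    // Assumed naming convention: a yyyyMM segment between underscores, e.g. filebeat_202101_log.
    private static final Pattern YEAR_MONTH = Pattern.compile("(?:^|_)(\\d{6})(?:_|$)");

    private final Map<String, Target> configured; // stands in for the MySQL table
    private final YearMonth cutoff;               // indices from this month onward count as new

    IndexRouter(Map<String, Target> configured, YearMonth cutoff) {
        this.configured = new HashMap<>(configured);
        this.cutoff = cutoff;
    }

    Target route(String index) {
        // 1. An explicit record always wins.
        Target explicit = configured.get(index);
        if (explicit != null) {
            return explicit;
        }
        // 2. Otherwise fall back to the date embedded in the index name.
        Matcher m = YEAR_MONTH.matcher(index);
        if (m.find()) {
            int year = Integer.parseInt(m.group(1).substring(0, 4));
            int month = Integer.parseInt(m.group(1).substring(4));
            if (month >= 1 && month <= 12) {
                return YearMonth.of(year, month).isBefore(cutoff) ? Target.ES6 : Target.ES7;
            }
        }
        // 3. Assumed default: undated, unconfigured indices are treated as new.
        return Target.ES7;
    }
}
```

With a cutoff of 2021-03, filebeat_202101_log would go to the 6.8.14 cluster and filebeat_202103_log to the 7.10.2 cluster, unless a record in the table says otherwise.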

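Development is finished and basic functional verification has passed (the features work; performance and stress testing have not been looked at yet).

Since this is only a first-phase feasibility check, the code is fairly rough.

A few things to note: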

1 spring-boot 2.4.4 provides APPLICATION_NDJSON_VALUE and 2.3.9.RELEASE does not, but this has no practical impact: MediaType.APPLICATION_JSON_VALUE works just as well.

    // Excerpt from org.springframework.http.MediaType
    package org.springframework.http;
    public static final String APPLICATION_NDJSON_VALUE = "application/x-ndjson";

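spring-boot-starter-webflux is used instead of spring-boot-starter-web

The code does not fully follow WebFlux conventions and is still written in the standard MVC style. WebFlux was chosen mainly because it runs on Netty underneath, so it should handle concurrency somewhat better than spring-web. The service also carries little business logic and does not need to integrate with other web components.

spring-boot-starter-data-elasticsearch was not used in the end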

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.4.4</version>
	</parent>	
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>
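In 2.4.4, spring-boot-starter-data-elasticsearch has a bug
In 2.3.9.RELEASE, spring-boot-starter-data-elasticsearch works normally

elasticsearch-rest-high-level-client was not used in the end

Going through elasticsearch-rest-high-level-client is fine for _search requests, but _bulk is tedious to implement, although not technically difficult.

The project keeps the code that calls ES directly through the SDK as a reference. Anyone interested can switch the implementation to the official SDK.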

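    // Reference sketch only: the requests below are hard-coded samples and `lines` is not parsed yet.
    // Turning each _bulk line into the matching request object is the tedious part.
    public String doPostBulk(EsHost[] esHosts, List<String> lines, String auth, boolean isSSL) throws IOException {
        RestHighLevelClient client = getClient(esHosts, auth, isSSL);
        BulkRequest bulkRequest = new BulkRequest();
        // index: create or overwrite a whole document
        IndexRequest indexRequest = new IndexRequest("index", "type", "id");
        Map<String, Object> indexSource = new HashMap<>();
        indexSource.put("field1", "value1");
        indexRequest.source(indexSource);
        // delete: carries no document body
        DeleteRequest deleteRequest = new DeleteRequest("index", "type", "id");
        // update: partial document merged into the existing one
        UpdateRequest updateRequest = new UpdateRequest("index", "type", "id");
        Map<String, Object> updateDoc = new HashMap<>();
        updateDoc.put("field2", "value2");
        updateRequest.doc(updateDoc);
        bulkRequest.add(indexRequest);
        bulkRequest.add(deleteRequest);
        bulkRequest.add(updateRequest);
        BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        // Serialize the response back to JSON so it can be returned to the caller
        return Strings.toString(bulkResponse.toXContent(JsonXContent.contentBuilder(), ToXContent.EMPTY_PARAMS).humanReadable(true));
    }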

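With the pure HTTP approach, the lines can be submitted directly as a string (one pitfall here is the handling of \n: the _bulk body is newline-delimited, and every line, including the last one, has to end with \n).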
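The \n pitfall can be handled in one place when the body is assembled. A minimal sketch, assuming the lines arrive as a List<String> without their line terminators; the BulkBody class and its join method are hypothetical names, not part of the project:

```java
import java.util.List;

/** Hypothetical helper that assembles the _bulk request body. */
final class BulkBody {
    /** Joins bulk lines into one NDJSON payload; every line, including the last, ends with '\n'. */
    static String join(List<String> lines) {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            // A raw line break inside one line would split it into two bulk lines.
            // Inside valid JSON, line breaks in strings are escaped, so this should never trigger.
            if (line.indexOf('\n') >= 0 || line.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("raw line break inside a bulk line");
            }
            body.append(line).append('\n');
        }
        return body.toString();
    }
}
```

The resulting string can then be posted as the request body as-is.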

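No load balancing across client nodes

An elasticsearch deployment may itself have several client nodes.

okhttp only talks to a single address, so multiple ES client nodes are not supported for now.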
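One simple way to spread requests over several client nodes while staying on plain HTTP is a round-robin pick of the base URL before each request. This is only a sketch of that idea, not something the project implements; the RoundRobinHosts class is a hypothetical name, and health checks and retries are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector over ES client node addresses. */
final class RoundRobinHosts {
    private final List<String> baseUrls;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinHosts(List<String> baseUrls) {
        if (baseUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.baseUrls = new ArrayList<>(baseUrls);
    }

    /** Returns the next base URL; safe to call from several threads. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), baseUrls.size());
        return baseUrls.get(i);
    }
}
```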

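elasticsearch-rest-high-level-client was originally tried in order to support multiple client nodes, but it was dropped because _bulk took too much effort to implement.

https support is implemented, mainly by skipping certificate verification, but it has not been tested.

A single node has limited load capacity, so multiple nodes are needed to go distributed; no distributed design has been done yet.

Parsing _bulk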

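At first I naively assumed that even lines (counting from 0) carry the index information {_index,_type,_id,_opera} and need to be parsed and rewritten, while odd lines need no processing.

The operations are index, create, update and delete. A delete has no content line after it, so the odd/even rule misclassifies lines.

In my own experience, delete operations against ES should be restricted. For one thing, physical deletion can make troubleshooting harder in some cases. For another, big-data pipelines often write to ES and also dual-write or multi-write to other stores, and delete operations are hard to keep in sync across them.

My suggestion is to turn every physical delete into a logical delete: add the fields isDeleted and deleteDate, filter on isDeleted at query time, and have a background job periodically purge documents whose deleteDate has expired. This can also be aligned with the ES ILM cycle, running the physical deletion before the ILM merge.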
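At the gateway, this suggestion could be applied by rewriting a delete action into an update that sets the two fields. The sketch below is an assumption about how that might look, not the project's code: the SoftDelete class is a hypothetical name, the action line is matched with a regex rather than a JSON parser, and deleteDate is written as an ISO-8601 timestamp.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rewrite of a bulk delete into a logical delete. */
final class SoftDelete {
    // Matches the action name at the start of a bulk action line, e.g. {"delete":{...}}
    private static final Pattern DELETE_ACTION = Pattern.compile("^\\s*\\{\\s*\"delete\"\\s*:");

    /** Returns the line unchanged unless it is a delete action, which becomes update + doc. */
    static List<String> rewrite(String actionLine, Instant now) {
        Matcher m = DELETE_ACTION.matcher(actionLine);
        if (!m.find()) {
            return Collections.singletonList(actionLine);
        }
        // Keep the original metadata (_index, _id, ...) and only swap the action name.
        String updateAction = "{\"update\":" + actionLine.substring(m.end());
        String doc = "{\"doc\":{\"isDeleted\":true,\"deleteDate\":\"" + now + "\"}}";
        return Arrays.asList(updateAction, doc);
    }
}
```

Because the delete turns into two lines, the rewritten request once again follows a strict action/content alternation.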

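Once ES deletes are blocked and physical deletion is replaced by logical deletion, the odd/even uncertainty disappears: only even lines need JSON parsing, and odd lines stay as they are. Still, to remain fully compatible with ES, the implementation assumes that delete may occur.

For now the lines are processed sequentially according to this pattern, and in practice not every line needs to be parsed.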
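The sequential pattern can be sketched like this: read an action line, look only at its action name, and skip the following content line unless the action is delete. The BulkActionScanner class is a hypothetical name and the regex-based detection is an assumption; the project may well locate action lines differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner that finds the action lines of a _bulk body. */
final class BulkActionScanner {
    private static final Pattern ACTION =
            Pattern.compile("^\\s*\\{\\s*\"(index|create|update|delete)\"\\s*:");

    /** Returns the 0-based positions of action lines; all other lines are content and stay untouched. */
    static List<Integer> actionLineIndexes(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            Matcher m = ACTION.matcher(lines.get(i));
            if (!m.find()) {
                throw new IllegalArgumentException("expected an action at line " + i);
            }
            result.add(i);
            // delete is the only action with no content line after it.
            i += "delete".equals(m.group(1)) ? 1 : 2;
        }
        return result;
    }
}
```

Only the lines at the returned positions need JSON parsing and index rewriting; the content lines in between can be forwarded byte for byte.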