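When optimizing database retrieval efficiency, the first thing to look at is usually indexing; only then, as requirements dictate, should you consider heavier measures such as load balancing, read/write splitting, and distributed horizontal/vertical sharding of databases and tables.
An index improves retrieval efficiency by storing redundant information. It trades space for time and slows down writes, so choosing which fields to index matters a great deal.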
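To make the space-for-time trade-off concrete, here is a minimal sketch of an inverted index kept next to the primary data. The class and method names are made up for illustration and have nothing to do with Neo4j or Lucene internals; it also assumes each id is written only once.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: a tiny inverted index stored alongside the data it indexes.
class InvertedIndexSketch {
    // Primary storage: document id -> text
    private final Map<Integer, String> documents = new HashMap<>();
    // Redundant copy of the same information, organized by token:
    // token -> ids of the documents that contain it
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Every write pays twice: once for the data, once per token for the index.
    // Assumption: an id is written only once, so stale index entries never need cleanup.
    void put(int id, String text) {
        documents.put(id, text);
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // A read becomes a single map lookup instead of a scan over every document.
    Set<Integer> search(String token) {
        return index.getOrDefault(token, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch store = new InvertedIndexSketch();
        store.put(1, "苏州 教育 公司");
        store.put(2, "苏州 科技 公司");
        System.out.println(store.search("教育"));
        System.out.println(store.search("公司"));
    }
}
```

The extra map is the "space", the loop in `put` is the write penalty, and `search` is where the time is won back.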
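This article uses the widely used IK Analyzer tokenizer as an example to show how to create a full-text index on a field in Neo4j and use it for fuzzy queries.
IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java.
IKAnalyzer 3.0 features: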
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
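<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here -->
<entry key="ext_dict">/ext.dic;</entry>
<!-- Users can configure their own extension stop-word dictionary here -->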
<entry key="ext_stopwords">/stopword.dic</entry>
</properties>
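The configuration above uses the standard Java properties DTD, so it can be inspected with `java.util.Properties.loadFromXML`. The sketch below does that for an inline copy of the file. The class name `IkConfigSketch` and the helper `dictionaryPaths` are hypothetical, and treating the value as a semicolon-separated list is an assumption based on the trailing `;` in `ext_dict`; this is not how IK Analyzer itself is guaranteed to load the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for checking which dictionary files a config refers to.
class IkConfigSketch {
    // Inline copy of the configuration shown in the article.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "<comment>IK Analyzer extension configuration</comment>\n"
            + "<entry key=\"ext_dict\">/ext.dic;</entry>\n"
            + "<entry key=\"ext_stopwords\">/stopword.dic</entry>\n"
            + "</properties>\n";

    // Parse the XML and return the non-empty entries of one key.
    // Assumption: multiple paths are separated by ';'.
    static List<String> dictionaryPaths(String xml, String key) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> paths = new ArrayList<>();
        for (String part : props.getProperty(key, "").split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(dictionaryPaths(SAMPLE, "ext_dict"));
        System.out.println(dictionaryPaths(SAMPLE, "ext_stopwords"));
    }
}
```

A check like this is handy when custom words do not seem to take effect: it confirms the keys are spelled as expected and the paths are what you intended before you start debugging the analyzer.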
Specify IKAnalyzer as the Lucene analyzer, then create a full-text index on the chosen property of every node:
@Override
public void createAddressNodeFullTextIndex() {
    try (Transaction tx = graphDBService.beginTx()) {
        IndexManager index = graphDBService.index();
        // Create (or fetch) a Lucene-backed index that tokenizes with IKAnalyzer
        Index<Node> addressNodeFullTextIndex = index.forNodes("addressNodeFullTextIndex",
                MapUtil.stringMap(IndexManager.PROVIDER, "lucene", "analyzer", IKAnalyzer.class.getName()));
        ResourceIterator<Node> nodes = graphDBService.findNodes(DynamicLabel.label("AddressNode"));
        while (nodes.hasNext()) {
            Node node = nodes.next();
            // Add the text property to the full-text index;
            // skip nodes that lack it, since a null value cannot be indexed
            Object text = node.getProperty("text", null);
            if (text != null) {
                addressNodeFullTextIndex.add(node, "text", text);
            }
        }
        tx.success();
    }
}
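Both single-keyword queries (e.g. '有限公司') and multi-keyword fuzzy queries (e.g. '苏州 教育 公司') work out of the box, and the results come back already sorted by relevance.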
package uadb.tr.neodao.test;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.helpers.collection.MapUtil;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.wltea.analyzer.lucene.IKAnalyzer;
import com.lt.uadb.tr.entity.adtree.AddressNode;
import com.lt.util.serialize.JsonUtil;

/**
 * AddressNodeNeoDaoTest
 *
 * @author geosmart
 */
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:app.neo4j.cfg.xml" })
public class AddressNodeNeoDaoTest {
    @Autowired
    GraphDatabaseService graphDBService;

    @Test
    public void test_selectAddressNodeByFullTextIndex() {
        try (Transaction tx = graphDBService.beginTx()) {
            IndexManager index = graphDBService.index();
            Index<Node> addressNodeFullTextIndex = index.forNodes("addressNodeFullTextIndex",
                    MapUtil.stringMap(IndexManager.PROVIDER, "lucene", "analyzer", IKAnalyzer.class.getName()));
            // Multi-keyword query against the indexed text property
            IndexHits<Node> foundNodes = addressNodeFullTextIndex.query("text", "苏州 教育 公司");
            for (Node node : foundNodes) {
                AddressNode entity = JsonUtil.ConvertMap2POJO(node.getAllProperties(), AddressNode.class, false, true);
                System.out.println(entity.getAll地址实全称());
            }
            tx.success();
        }
    }
}
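The test above passes the keywords separated by spaces. Assuming the index hands the string to Lucene's classic query parser with its default OR operator, a hit only needs to match some of the keywords. If you want every keyword to be required, one option is to join them with `AND` before querying. The helper below is a hypothetical sketch of that; it does no escaping of Lucene special characters, so it only suits plain keywords.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turn user input into a Lucene query string that requires all keywords.
class KeywordQuerySketch {
    // "苏州 教育 公司" -> "苏州 AND 教育 AND 公司"
    // Note: AND must be upper case for the Lucene query parser to treat it as an operator.
    static String allKeywords(String input) {
        List<String> terms = new ArrayList<>();
        for (String term : input.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return String.join(" AND ", terms);
    }

    public static void main(String[] args) {
        System.out.println(allKeywords("苏州 教育 公司"));
    }
}
```

In the test it would be used as `addressNodeFullTextIndex.query("text", KeywordQuerySketch.allKeywords("苏州 教育 公司"))`. How each keyword is then split into tokens still depends on the analyzer.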
profile
match (a:AddressNode{ruleabbr:'TOW',text:'唯亭镇'})<-[r:BELONGTO]-(b:AddressNode{ruleabbr:'STR'})
where b.text =~ '金陵.*'
return a,b
profile
START b=node:addressNodeFullTextIndex("text:金陵*")
match (a:AddressNode{ruleabbr:'TOW',text:'唯亭镇'})<-[r:BELONGTO]-(b:AddressNode)
where b.ruleabbr='STR'
return a,b
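The two profiled queries above ask for the same thing: one filters with the regex `金陵.*`, the other starts from the full-text index with the prefix query `金陵*`. The sketch below is a simplified model of why the second form can do less work. It uses a `TreeSet` as a stand-in for a sorted term dictionary; it is an illustration only, not what Neo4j or Lucene actually execute, and the street names are invented sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical comparison of a regex scan with a prefix lookup in a sorted structure.
class PrefixLookupSketch {
    // Regex filter: every value has to be tested against the pattern.
    static List<String> scan(List<String> values, String prefix) {
        Pattern pattern = Pattern.compile(Pattern.quote(prefix) + ".*");
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (pattern.matcher(value).matches()) {
                hits.add(value);
            }
        }
        return hits;
    }

    // Sorted lookup: all values sharing a prefix sit next to each other,
    // so we seek to the prefix and stop at the first value that no longer matches.
    static List<String> seek(NavigableSet<String> sorted, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String value : sorted.tailSet(prefix, true)) {
            if (!value.startsWith(prefix)) {
                break;
            }
            hits.add(value);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented sample data
        List<String> streets = List.of("金陵东路", "星湖街", "金陵西路", "金鸡湖大道");
        System.out.println(scan(streets, "金陵"));
        System.out.println(seek(new TreeSet<>(streets), "金陵"));
    }
}
```

Both methods return the same streets; the difference is that `scan` touches every value while `seek` only touches the matching range plus one. Comparing the `profile` output of the two Cypher queries is the way to see the real cost on your own data.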
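For nodes labeled AddressNode, build a different type of index on the text property depending on the category given by the ruleabbr property: addressnode_fulltext_index for the levels province -> city -> district/county -> town/subdistrict -> street/lane or residential community, and addressnode_exact_index for the levels door number -> building number -> unit number -> floor number -> room number.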
profile
START a=node:addressnode_fulltext_index("text:商业街"),b=node:addressnode_exact_index("text:二期19")
match (a:AddressNode{ruleabbr:'STR'})-[r:BELONGTO]-(b:AddressNode{ruleabbr:'TAB'})
return a,b limit 10
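When populating the two indexes, something has to decide which index a node belongs to based on its ruleabbr. A rough sketch of such a routing step follows. Only the codes that appear in the queries in this article (`TOW`, `STR`, `TAB`) are mapped; the full code list is not given here, so the sets are deliberately partial and unknown codes are left undecided. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical routing of a node to one of the two indexes by its ruleabbr code.
class IndexRoutingSketch {
    static final String FULLTEXT_INDEX = "addressnode_fulltext_index";
    static final String EXACT_INDEX = "addressnode_exact_index";

    // Partial lists: only codes seen in this article's queries.
    // TOW (town) and STR (street) are queried as full-text levels, TAB via the exact index.
    static final Set<String> FULLTEXT_CODES = Set.of("TOW", "STR");
    static final Set<String> EXACT_CODES = Set.of("TAB");

    // Returns the index name for a code, or empty if the code is not in either list.
    static Optional<String> indexFor(String ruleabbr) {
        if (FULLTEXT_CODES.contains(ruleabbr)) {
            return Optional.of(FULLTEXT_INDEX);
        }
        if (EXACT_CODES.contains(ruleabbr)) {
            return Optional.of(EXACT_INDEX);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(indexFor("STR"));
        System.out.println(indexFor("TAB"));
    }
}
```

Returning an empty `Optional` for unmapped codes forces the caller to handle them explicitly instead of silently putting a node into the wrong index.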
Original article: https://www.cnblogs.com/jpfss/p/11411128.html