如何使用字符表示图关系？

时间：2020-10-24 23:39:35 阅读：75 评论：0 收藏：0 [点我收藏+]

　　知识图谱听起来很高大上，而且也应用广泛。而图数据库，你可以到网上搜搜，基本就是像 neo4j, janusgraph, HugeGraph...
　　如果想让做个类似的图谱的东西，你会怎么办呢？一来就上真的图谱真的好吗？也许前期就三两个关系链，也许只是业务试水，你就去搞个真的图数据库过来？是不是太浪费了。
　　是的，实际上前期我们最好自己实现一些简单的关系链维护即可。
　　那么，为了能够适应稍微的关系变化，也许我们还是需要效仿下图数据库的概念。那么，现在的第一个问题就是：如何使用文字表述一个图关系链？

1. 如何定义规范？

　　图数据库三大要素: 实体, 关系, 客体。
　　实际上要解决这个问题倒也不难，只要自己定一种表示方法，自己能看懂就行，不去管其他人。比如用 ‘1,2,3‘ 代表先1后2再3... 但实际上，想要表示稍微复杂点的结构，也许并不是特别容易呢。而且，如果想要考虑后续可能的切真正的图数据库，为何不参考下别人的标准呢？
　　比如现在通用些的，cypher, gremlin... 大家可以网上搜索下资料，参考下来，好像cypher更形象化些，尤其是各种箭头的使用比较方便。
　　比如要表示A与的B的关系可以是: (:A)-[:关系]->(:B)
　　而对于多个复杂关系，则可以用多个类似的关系关联起来就可以了。
　　嗯，看起来不错。表示的方式定好了，那么我们如何具体处理关系呢？

2. 如何表示一个现实的图关系？

　　如下图所示，我们有如下关系，应该如何定义字符表达方法，以达到配置的目的？

技术分享图片

　　按照第1节中我们定义的规范，我们可以用如下字符串表示。

    (:PEOPLE)-[:养宠物]->(:CAT)-[:吃]->(:RICE)
    ,(:PEOPLE)-[:吃]->(:RICE)
    ,(:PEOPLE)-[:养宠物]->(:DOG)
    ,(:PEOPLE)-[:拥有]->(:HOUSE)
    ,(:PEOPLE)-[:干活]->(:JOB)
    ,(:CAT)-[:朋友]->(:DOG)
    ,(:DOG)-[:吃]->(:RICE)
    ,(:JOB)-[:产出]->(:BRICK)
    ,(:HOUSE)<-[:构件]-(:BRICK)
    ,(:HOUSE)<-[:构件]-(:GLASS)

　　应该说还是比较直观的，基本上我们只要按照图所示的关系，描述出出入边和关系就可以了。而且还有相应的cypher官方规范支持，也不用写文档，大家就可以很方便的接受了。

3. 如何解析图关系？

　　如上，我们已经用字符串表示出了关系了。但单是字符串，是并不能被应用理解的。我们需要解析为具体的数据结构，然后才可以根据关系推导出具体的血缘依赖。这是本文的重点。

　　实际也不复杂，我们仅仅使用到了cypher中非常少的几个元素表示法，所以也仅需解析出该几个字符，然后在内存中构建出相应的关系即可。

　　具体代码实现如下：

3.1. 解析框架

　　所谓框架就是整体流程管控代码，它会让你明白整个系统是如何work的。

import com.my.mvc.app.common.helper.graph.GraphNodeEntityTree;
import com.my.mvc.app.common.helper.graph.NodeDiscoveryDirection;
import com.my.mvc.app.common.helper.graph.VertexEdgeSchemaDescriptor;
import com.my.mvc.app.common.helper.graph.VertexOrEdgeType;
import com.my.mvc.app.common.util.CommonUtil;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * 功能描述: 简单图语法解析器（类 cypher 语法）
 *
 *      请参考网上 cypher 资料
 *
 */
public class SimpleGraphSchemaSyntaxParser {

    /**
     * 解析配置图谱关系配置为树结构
     *
     * @param cypherGraphSchema 类cypher语法的 关系表示语句
     * @return 解析好的树结构
     */
    public static GraphNodeEntityTree parseGraphSchemaAsTree(String cypherGraphSchema) {
        List<VertexEdgeSchemaDescriptor> flatNodeList = tokenize(cypherGraphSchema);
        return buildGraphAstTree(flatNodeList);
    }

    /**
     * 构建图关系抽象语法树
     *
     * @param flatNodeList 平展的图节点列表
     * @return 构建好的实例
     */
    private static GraphNodeEntityTree buildGraphAstTree(
                        List<VertexEdgeSchemaDescriptor> flatNodeList) {
        Map<String, GraphNodeEntityTree>
                uniqVertexContainer = new HashMap<>();
        GraphNodeEntityTree root = new GraphNodeEntityTree(flatNodeList.get(0));
        uniqVertexContainer.put(flatNodeList.get(0).getVertexLabelType(), root);
        GraphNodeEntityTree parent;
        GraphNodeEntityTree afterNode;
        for ( int i = 1; i < flatNodeList.size(); i++ ) {
            VertexEdgeSchemaDescriptor vertexOrEdge1 = flatNodeList.get(i);
            if(vertexOrEdge1.getNodeType() == VertexOrEdgeType.EDGE) {
                // 存在重复节点，需重建关系
                VertexEdgeSchemaDescriptor vertexPrev = flatNodeList.get(i - 1);
                if(vertexPrev.getNodeType() != VertexOrEdgeType.VERTEX) {
                    continue;
                }
                if(++i >= flatNodeList.size()) {
                    throw new RuntimeException("缺少客体关系配置, near 边["
                            + vertexOrEdge1.getRawWord() + "]");
                }
                VertexEdgeSchemaDescriptor relation = vertexOrEdge1;
                VertexEdgeSchemaDescriptor vertexAfter = flatNodeList.get(i);
                parent = uniqVertexContainer.get(vertexPrev.getVertexLabelType());
                afterNode = uniqVertexContainer.get(vertexAfter.getVertexLabelType());
                if(parent == null) {
                    parent = root;
                    uniqVertexContainer.putIfAbsent(vertexAfter.getVertexLabelType(),
                            parent);
                }
                if(afterNode == null) {
                    afterNode = new GraphNodeEntityTree(vertexAfter);
                    uniqVertexContainer.put(vertexAfter.getVertexLabelType(), afterNode);
                }
                if(relation.getDirection() == NodeDiscoveryDirection.OUT) {
                    parent.addOutVertex(afterNode, relation);
                }
                else {
                    parent.addInVertex(afterNode, relation);
                }
            }
        }
        root.setUniqVertexTypes(uniqVertexContainer);
        return root;
    }

    /**
     * 拆分图关系schema为 可理解的边和点
     *
     * @param cypherGraphSchema 建关系语句,如 (:BASE_LABEL)-[:被组合引用]->(:COMPOSE_LABEL)
     * @return 拆解后的token列表
     */
    private static List<VertexEdgeSchemaDescriptor> tokenize(String cypherGraphSchema) {
        String[] relationArr = cypherGraphSchema.split(",");
        List<VertexEdgeSchemaDescriptor> flatNodeList = new ArrayList<>();
        for (String relation1 : relationArr) {
            char[] src = relation1.trim().toCharArray();
            for (int i = 0; i < src.length; i++) {
                char ch = src[i];
                // 顶点
                if(ch == ‘(‘) {
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if(nextCh == ‘:‘) {
                            String vertexLabel = CommonUtil.readSplitWord(
                                    src, i, ‘:‘, ‘)‘, false);
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newVertex(
                                    specNameBuilder.toString() + ":" + vertexLabel,
                                    vertexLabel));
                            i += vertexLabel.length() + 2;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                    continue;
                }
                // 边关系, (:SRC)-[:RELATION]->(:DST)
                if(ch == ‘-‘ &&
                        i + 1 < src.length && src[i + 1] == ‘[‘) {
                    ++i;
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if(nextCh == ‘:‘) {
                            String edgeLabel = CommonUtil.readSplitWord(
                                    src, i, ‘:‘, ‘]‘, false);
                            int nextVertexStart = i + edgeLabel.length() + 2;
                            if(nextVertexStart + 2 >= src.length) {
                                throw new RuntimeException("血缘图谱配置错误: 缺少客体" +
                                        ", near ‘" + new String(src, nextVertexStart,
                                        src.length - nextVertexStart));
                            }
                            if(src[++nextVertexStart] != ‘-‘
                                    || src[++nextVertexStart] != ‘>‘) {
                                throw new RuntimeException("血缘图谱配置错误: 主体后面需紧跟关系 ->" +
                                        ", near ‘" + new String(src, nextVertexStart,
                                            src.length - nextVertexStart));
                            }
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newEdge(
                                    specNameBuilder.toString() + ":" + edgeLabel,
                                    edgeLabel, NodeDiscoveryDirection.OUT));
                            i = nextVertexStart;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                    continue;
                }
                // 边关系, (:SRC)<-[:RELATION]-(:DST)
                if(ch == ‘<‘) {
                    if(i + 2 > src.length) {
                        throw new RuntimeException("血缘配置错误: 长度不匹配, near ‘"
                                + new String(src, i, src.length - i));
                    }
                    if(src[++i] != ‘-‘ || src[++i] != ‘[‘) {
                        throw new RuntimeException("血缘配置错误: 边关系配置错误, near ‘"
                                + new String(src, i, src.length - i));
                    }
                    StringBuilder specNameBuilder = new StringBuilder();
                    while (i + 1 < src.length) {
                        char nextCh = src[i + 1];
                        if(nextCh == ‘:‘) {
                            String edgeLabel = CommonUtil.readSplitWord(
                                    src, i, ‘:‘, ‘]‘, false);
                            int nextVertexStart = i + edgeLabel.length() + 2;
                            if(nextVertexStart + 2 >= src.length) {
                                throw new RuntimeException("血缘图谱配置错误: 缺少客体" +
                                        ", near ‘" + new String(src, nextVertexStart,
                                        src.length - nextVertexStart));
                            }
                            if(src[++nextVertexStart] != ‘-‘
                                    || src[nextVertexStart + 1] != ‘(‘) {
                                throw new RuntimeException("血缘图谱配置错误: 主体后面需紧跟关系 -> " +
                                        ", near ‘" + new String(src, nextVertexStart,
                                        src.length - nextVertexStart));
                            }
                            flatNodeList.add(VertexEdgeSchemaDescriptor.newEdge(
                                    specNameBuilder.toString() + ":" + edgeLabel,
                                    edgeLabel, NodeDiscoveryDirection.IN));
                            i = nextVertexStart;
                            break;
                        }
                        specNameBuilder.append(nextCh);
                        ++i;
                    }
                }
            }
        }
        return flatNodeList;
    }


}

　　怎么样，不复杂吧。就是两个步骤：1. 解析每个单个元素信息； 2. 根据单元素信息，构建出上下级关系；

　　使用 IN 代表入方向关系，用 OUT 代表出方向关系，每两个顶点之间都有一条边相连。大体就是这样了。但是明显，还有许多细节需要我们去考虑，比如边关系放在哪里？如何添加相关节点？这些东西是需要特定的数据结构支持的。看我细细道来：

3.2. 单节点数结构

　　所谓单节点，即是站在任意关系点上来看整体图的结构，如果整个图是连通的，那么理论上，通过这个节点所以探索到任意其他节点。所以，其实它非常重要。

package com.my.mvc.app.common.helper.graph;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * 功能描述: 简单图结构树描述类
 *
 */
public class GraphNodeEntityTree {
    /**
     * 当前顶点描述
     */
    private VertexEdgeSchemaDescriptor vertex;

    /**
     * 关系边容器
     */
    private Map<NodeDiscoveryDirection, List<RelationWithVertexDescriptor>>
                    relations = new HashMap<>();
    /**
     * 入射方向节点
     */
    private List<GraphNodeEntityTree> in = new ArrayList<>();

    /**
     * 出射方向节点
     */
    private List<GraphNodeEntityTree> out = new ArrayList<>();

    /**
     * 所有顶点实例容器
     */
    private Map<String, GraphNodeEntityTree> uniqVertexTypes;
    
    public GraphNodeEntityTree(VertexEdgeSchemaDescriptor vertex) {
        this.vertex = vertex;
        uniqVertexTypes = new HashMap<>();
    }

    public void setUniqVertexTypes(Map<String, GraphNodeEntityTree> uniqVertexTypes) {
        this.uniqVertexTypes = uniqVertexTypes;
    }

    public void addRelation(VertexEdgeSchemaDescriptor srcEntity,
                            VertexEdgeSchemaDescriptor relation,
                            VertexEdgeSchemaDescriptor dstEntity) {
        List<RelationWithVertexDescriptor> list = relations.computeIfAbsent(
                                relation.getDirection(), k -> new ArrayList<>());
        list.add(new RelationWithVertexDescriptor(srcEntity, relation, dstEntity));
    }

    public GraphNodeEntityTree addInVertex(GraphNodeEntityTree embeddedEntity,
                                           VertexEdgeSchemaDescriptor relation) {
        embeddedEntity.addOutVertexInner(this, relation.reverseDirection());
        addInVertexInner(embeddedEntity, relation);
        return embeddedEntity;
    }

    private GraphNodeEntityTree addInVertexInner(GraphNodeEntityTree embeddedEntity,
                                                VertexEdgeSchemaDescriptor relation) {
        in.add(embeddedEntity);
        addRelation(vertex, relation, embeddedEntity.getVertex());
        return embeddedEntity;
    }

    public GraphNodeEntityTree addOutVertex(GraphNodeEntityTree embeddedEntity,
                                            VertexEdgeSchemaDescriptor relation) {
        embeddedEntity.addInVertexInner(this, relation.reverseDirection());
        addOutVertexInner(embeddedEntity, relation);
        return embeddedEntity;
    }

    private GraphNodeEntityTree addOutVertexInner(GraphNodeEntityTree embeddedEntity,
                                                  VertexEdgeSchemaDescriptor relation) {
        out.add(embeddedEntity);
        addRelation(this.getVertex(), relation, embeddedEntity.getVertex());
        return embeddedEntity;
    }

    public VertexEdgeSchemaDescriptor getVertex() {
        return vertex;
    }

    /**
     * 获取关系名称
     *
     * @param nodeIndex 节点序号
     * @param direction 方向
     * @return 关系名称描述
     */
    public String getRelationName(int nodeIndex, NodeDiscoveryDirection direction) {
        List<RelationWithVertexDescriptor> list = relations.get(direction);
        if(list == null || list.isEmpty()) {
            return null;
        }
        return list.get(nodeIndex).getRelationName();
    }

    public List<GraphNodeEntityTree> getIn() {
        return in;
    }

    public List<GraphNodeEntityTree> getOut() {
        return out;
    }

    /**
     * 快速获取图节点根（根据顶点label）
     *
     * @param vertexLabel 顶点标识
     * @return 节点所在实例, 找不到对应节点则返回 null
     */
    public GraphNodeEntityTree getNodeEntityTreeByVertexLabel(
            String vertexLabel) {
        return uniqVertexTypes.get(vertexLabel);
    }
}

　　可以说后续的操作入口都是在这里的，所以重点关注。

3.3. 图顶点和边描述类

　　最开始有一个token化的过程，那么token化之后，如何定义也比较重要，我们统一使用一个描述类来定义：

package com.my.mvc.app.common.helper.graph;


/**
 * 功能描述: 图顶点和边描述类
 *
 */
public class VertexEdgeSchemaDescriptor {
    private String rawWord;
    private VertexOrEdgeType nodeType;
    private String vertexLabelType;
    private String relationName;
    private NodeDiscoveryDirection direction;

    private VertexEdgeSchemaDescriptor(String rawWord,
                                       VertexOrEdgeType nodeType,
                                       String vertexLabelType,
                                       String relationName,
                                       NodeDiscoveryDirection direction) {
        this.rawWord = rawWord;
        this.nodeType = nodeType;
        this.vertexLabelType = vertexLabelType;
        this.relationName = relationName;
        this.direction = direction;
    }

    /**
     * 新建顶点实例
     *
     * @param rawWord 原始字符描述
     * @param vertexLabelType 解析后的顶点类型（枚举完成所有点类型）
     * @return 顶点实例
     */
    public static VertexEdgeSchemaDescriptor newVertex(String rawWord, String vertexLabelType) {
        return new VertexEdgeSchemaDescriptor(rawWord, VertexOrEdgeType.VERTEX,
                vertexLabelType, null, null);
    }

    /**
     * 新建边实例
     *
     * @param rawWord 原始字符描述
     * @param relationName 关系名称（当id使用）
     * @param direction 关系方向（ -> 出方向OUT, <- 入方向IN ）
     * @return 边实例
     */
    public static VertexEdgeSchemaDescriptor newEdge(String rawWord,
                                                     String relationName,
                                                     NodeDiscoveryDirection direction) {
        return new VertexEdgeSchemaDescriptor(rawWord, VertexOrEdgeType.EDGE,
                null, relationName, direction);
    }

    public String getRawWord() {
        return rawWord;
    }

    public VertexOrEdgeType getNodeType() {
        return nodeType;
    }

    public String getVertexLabelType() {
        return vertexLabelType;
    }

    public String getRelationName() {
        return relationName;
    }

    public NodeDiscoveryDirection getDirection() {
        return direction;
    }

    public VertexEdgeSchemaDescriptor reverseDirection() {
        return new VertexEdgeSchemaDescriptor(rawWord, nodeType,
                vertexLabelType, "-" + relationName,
                direction.reverse());
    }

    @Override
    public String toString() {
        // 点描述
        if(nodeType == VertexOrEdgeType.VERTEX) {
            return nodeType + "{" +
                    "rawWord=‘" + rawWord + ‘\‘‘ +
                    ", vertexLabelType=" + vertexLabelType +
                    ‘}‘;
        }
        // 边描述
        return nodeType + "{" +
                "rawWord=‘" + rawWord + ‘\‘‘ +
                ", relationName=‘" + relationName + ‘\‘‘ +
                ", direction=" + direction +
                ‘}‘;
    }
}

　　主要就是原始字符串，定义边、定义点。类似与单词的聚合吧。

3.4. 节点关系描述

　　我们需要清楚地知道各个点与各个点间的关系，所以需要一个关系描述类，来展示这东西。（实际上核心并未使用该关系）

package com.my.mvc.app.common.helper.graph;

/**
 * 功能描述: 关系实例, 实体 -> 关系 -> 客体
 *
 */
public class RelationWithVertexDescriptor {
    /**
     * 源点、起点
     */
    private final VertexEdgeSchemaDescriptor srcVertex;

    /**
     * 目标点
     */
    private final VertexEdgeSchemaDescriptor dstVertex;

    /**
     * 关系（名称）
     */
    private final VertexEdgeSchemaDescriptor relation;

    public RelationWithVertexDescriptor(VertexEdgeSchemaDescriptor srcVertex,
                                        VertexEdgeSchemaDescriptor relation,
                                        VertexEdgeSchemaDescriptor dstVertex) {
        this.srcVertex = srcVertex;
        this.dstVertex = dstVertex;
        this.relation = relation;
    }

    public VertexEdgeSchemaDescriptor getSrcVertex() {
        return srcVertex;
    }

    public VertexEdgeSchemaDescriptor getDstVertex() {
        return dstVertex;
    }

    /**
     * 获取当前关系名称
     */
    public String getRelationName() {
        return relation.getRelationName();
    }

    @Override
    public String toString() {
        if(relation.getDirection() == NodeDiscoveryDirection.OUT) {
            return srcVertex.getRawWord() + "(" + srcVertex.getVertexLabelType() + ")" +
                    " -> " + relation.getRelationName() +
                    " -> " + dstVertex.getRawWord() + "(" + dstVertex.getVertexLabelType() + ")"
                    ;
        }
        return srcVertex.getRawWord() + "(" + srcVertex.getVertexLabelType() + ")" +
                " <- " + relation.getRelationName() +
                " <- " + dstVertex.getRawWord() + "(" + dstVertex.getVertexLabelType() + ")"
                ;

    }
}

　　虽实际用处不大，但是当你在debug的时候，这个描述类可以很方便地让你观察到解析是否正确。

3.5. 几个基础类型定义

　　1. 方向定义

package com.my.mvc.app.common.helper.graph;

/**
 * 功能描述: 探索方向定义
 *
 * @since 2020/10/12
 */
public enum NodeDiscoveryDirection {

    /**
     * 入方向, 上游
     */
    IN,

    /**
     * 出方向, 下游
     */
    OUT,
    ;

    public NodeDiscoveryDirection reverse() {
        if(this == OUT) {
            return IN;
        }
        return OUT;
    }
}

　　2. 边或点类型定义

package com.my.mvc.app.common.helper.graph;

/**
 * 功能描述: 边或点类型定义
 *
 */
public enum VertexOrEdgeType {
    VERTEX,
    EDGE,
    ;
}

　　如此，整个解析模块就完成了。你可以完整的将如上字符解析为实体关系了。

4. 单元测试