Graph query languages typically feature graph pattern matching. This is the process of finding all matches for a given graph pattern. Graph pattern matching can (partly) be used for schema-instance consistency checking or validation as schema recognition (schema view). Assume a set of graph patterns, together forming the schema, then an instance conforms to a graph schema if all its subgraphs match with at least one graph pattern.

Graph pattern matching is typically defined in terms of the subgraph isomorphism problem which is known to be is NP-complete. It is the most strict semantic for pattern matching and preserves the topology of patterns in its matches. This is shown in figure 1b. Various relaxations based on similarity measures or simulation have been proposed to lower the complexity of pattern matching. Simulation can be computed in quadratic time (Henzinger et al. [1995]).

Figure 1a: Pattern
Figure 1b: Subgraph Isomorphism

The problem with non-isomorphic pattern semantics is that they do not preserve the topology of graphs. They may produce inexact matches that are significantly different in structure. For example, using simulation-based semantics a pattern node may map to several match nodes. As result a match may be a disconnected subgraph while the pattern is connected. This is shown in figure 1c.

Figure 1c: Graph simulation

Simulation

A graph $G = (V, E, \eta, \lambda, \upsilon)$ matches a pattern $Q[\bar{x}] = (V_Q, E_Q, \lambda_Q, \mu_Q)$ by simulation if there exists a binary relation $\asymp \in V_Q \times V$ such that:

  • for each $v_Q \asymp v$, $v_Q$ and $v$ have the same label, i.e. $\lambda_Q(v_Q) = \lambda(v)$, and
  • for each node $v_Q \in V_Q$, there exists a node $v \in V$ such that: $v_Q \asymp v$, and for each edge $(v_Q, v_Q') \in E_Q$, there exists an edge $(v, v') \in E$ such that $v_Q' \asymp v'$.

Simulation can be computed in quadratic time (Henzinger et al. [1995]). A distributed algorithm has been proposed by Fard et al. [2013].

As well as an incremental algorithm that computes changes to matches in response to updates, hereby minimizing unnecessary recomputations. The incremental algorithm is optimal as its time is bounded by the size of the changes in the input and output (Fan et al. [2013]).

Variations

A variation on simulation is bounded simulation by Fan et al. [2010]. In bounded simulation each edge is simulated by a path bounded by a maximum length. This is shown in figure 1d. Bounded simulation can be computed in cubic time. The authors provide an incremental algorithm that for acyclic graphs only computes the areas affected by updates, hereby minimizing unnecessary computations.

Figure 1d: Bounded simulation

Ma et al. [2011] propose another variation on simulation, strong simulation, which preserves many of the topological characteristics of graphs by enforcing duality and locality. Enforcement of duality preserves both the child and parent relationships, it requires that two matching nodes have the same edges and avoids simulation of a connected graph with a disconnected graph. This is called dual simulation and is shown in figure 2b. Locality eliminates excessive matches by constraining matches to be within a certain radius equal to the diameter of the pattern. Dual simulation and strong simulation can be computed in cubic-time.

Figure 2a: Pattern
Figure 2b: Dual simulation
Figure 2c: Strong simulation

Fard et al. [2013] propose distributed algorithms for simulation, dual simulation, strong simulation, and their own variation, strict simulation. Strict simulation performs better than strong simulation in terms of scalability while preserving its properties. They add an extra step to improve the enforcement of locality by computing a tighter radius based on the result of dual simulation (Ma
et al. [2011]).

Conclusion

Non-isomorphic graph pattern matching semantics do not preserve the topology of graphs and may produce inexact matches that are significantly different in structure. They offer lower computational complexities at the cost of their ability to capture the graph's topology.

Non-isomorphic graph pattern matching semantics can be used to balance computational cost and the amount of similarity in topology.


Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Yinghui Wu, and Yunpeng Wu. Graph pattern matching: from intractable to polynomial time. Proceedings of the VLDB Endowment, 3(1-2):264–275, 2010.

Wenfei Fan, Xin Wang, and Yinghui Wu. Incremental graph pattern matching. ACM Transactions on Database Systems (TODS), 38(3):18, 2013.

A. Fard, M. U. Nisar, L. Ramaswamy, J. A. Miller, and M. Saltz. A distributed vertex-centric approach for pattern matching in massive graphs. In 2013 IEEE International Conference on Big Data, pages 403–411, Oct 2013. doi: 10.1109/BigData.2013.6691601.

M. R. Henzinger, T. A. Henzinger, and P. W. Kopke. Computing simulations on finite and infinite graphs. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, FOCS ’95, pages 453–, Washington, DC, USA, 1995. IEEE Computer Society. ISBN 0-8186-7183-1. URL http://dl.acm.org/citation.cfm?id=795662.796255.

Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, and Tianyu Wo. Capturing topology in graph pattern matching. Proceedings of the VLDB Endowment, 5(4):310–321, 2011.