Graph databases are typically used where the interconnectivity or topology of data is of importance (Abiteboul [1996], Angles and Gutierrez [2008]). Data follows the graph data model where: data is represented by graphs or by data structures generalizing the notion of graph; data manipulations are expressed by graph transformations or by operations whose main primitives are on graph features; and where appropriate integrity constraints can be defined over the graph structure (Angles and Gutierrez [2008]). This can be applied in social networks, information networks representing information flows, or biological networks.

Data is typically semi-structured in nature which is often explained as schemaless or self-describing (Abiteboul et al. [2000]). It is not raw like images or audio, but structured following an implicit and often-partial schema. Such schemas are not strictly defined or enforced as is the case with relational databases. With semi-structured data the schema may be defined after the database is populated, leading to the possibility of schema extraction (Abiteboul et al. [2000]). The graph schema is difficult to define, as the semi-structured nature of graph data makes it complex to derive an accurate schema.

Flexibility leads to higher levels of heterogeneity which results in issues of conceptually understanding what a data set represents, what it contains, how its structured, and how it evolves. This leads to increasing amounts of data and domain dependency as no schema is defined or enforced, while the structure may mutate over time.

There is need for a high-level description that captures the structural properties of the graph and the types and distributions of data. One approach is to define a notion of a graph schema.

Motivations for adding structure to graph data are: to optimize query evaluation; to facilitate data integration; to improve data partitioning and replication; to construct indices; and to forbid certain updates (Abiteboul et al. [2000]). As well as to support communication and understanding, analysis, and synthetic synthesis.

The focus is on the property graph model as it provides a good balance between expressiveness and complexity (Bonifati et al. [2018]). We believe the property graph model will be the next de facto standard for graphs as it enables efficient storage of data while allowing fast querying and traversals across connected data.

We can identify the following basic problems:

  • Defining a property graph schema formalism.
  • Given a property graph and schema, validate whether the graph conforms to the schema.
  • Given a query and property graph schema, validate whether the query is valid according to the schema.
  • Given a property graph schema, check for schema emptiness, i.e. whether the set of all conforming property graphs is empty.
  • Given a property graph schema, generate a conforming property graph.
  • Given a property graph, extract a conforming property graph schema.
  • Given two property graph schemas, check whether they subsume each other or whether they are equivalent.

S. Abiteboul. Querying semi-structured data. Technical Report 1996-19, Stanford InfoLab, 1996. URL http://ilpubs.stanford.edu:8090/144/.

Serge Abiteboul, P Buneman, and Dan Suciu. Data on the Web: From Relations to Semistructured Data and XML. 01 2000. ISBN 1-55860-622-X.

Renzo Angles and Claudio Gutierrez. Survey of graph database models. ACM Comput. Surv., 40(1):1:1–1:39, February 2008. ISSN 0360-0300. doi: 10.1145/1322432.1322433. URL http://doi.acm.org/10.1145/1322432.1322433.

Angela Bonifati, George Fletcher, Hannes Voigt, and Nikolay Yakovets. Querying graphs. Synthesis Lectures on Data Management, 10(3):1–184, 2018.