SmartGraphs

SmartGraphs are only available in the Enterprise Edition, including ArangoDB Oasis.

This chapter describes the smart-graph module, which enables you to manage graphs at scale. It can give a significant performance benefit for graphs sharded in an ArangoDB Cluster. On a single server, this feature brings no benefit, which is why it is only available in cluster mode.

In terms of querying, there is no difference between SmartGraphs and General Graphs: a SmartGraph is a transparent replacement for a General Graph. For graph querying, refer to the AQL Graph Operations and General Graph Functions sections. The query optimizer automatically detects whether a graph is a SmartGraph.

The only difference lies in graph management, that is, in creating and modifying the underlying collections of the graph. For a detailed API reference, refer to SmartGraph Management.

Do the hands-on ArangoDB SmartGraphs Tutorial to learn more.

What makes a graph smart?

Most graphs have one feature that divides the entire graph into several smaller subgraphs. These subgraphs have a large number of edges that only connect vertices within the same subgraph, and only few edges that connect them to vertices in other subgraphs.

Examples of such graphs are:

  • Social Networks
    The feature here is typically the region/country users live in. Users typically have more contacts in their own region/country than in other regions/countries.

  • Transport Systems
    Here, too, the feature is the region/country. There are many local connections but only few across countries.

  • E-Commerce
    Here, the product category is probably a good feature, because products of the same category are often bought together.

If this feature is known, SmartGraphs can make use of it.
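
To judge whether an attribute is a good candidate for this feature, one rough heuristic is to measure what share of edges connect vertices that have the same value for it. The following plain JavaScript sketch (runnable in Node.js, not part of the smart-graph module) illustrates that idea; the function localEdgeRatio, the attribute region, and the sample users/knows data are hypothetical.

```javascript
// Estimate how well a candidate attribute splits a graph into subgraphs.
// Hypothetical helper for illustration; plain Node.js, no ArangoDB needed.

// Returns the share of edges whose endpoints have the same value for `attr`
// (1 means every edge stays inside a single subgraph).
function localEdgeRatio(vertices, edges, attr) {
  const byId = new Map(vertices.map((v) => [v._id, v]));
  if (edges.length === 0) return 1;
  let local = 0;
  for (const e of edges) {
    const from = byId.get(e._from);
    const to = byId.get(e._to);
    if (from[attr] === to[attr]) local++;
  }
  return local / edges.length;
}

// Hypothetical sample data: four users in two regions.
const users = [
  { _id: "users/a", region: "eu" },
  { _id: "users/b", region: "eu" },
  { _id: "users/c", region: "us" },
  { _id: "users/d", region: "us" },
];
const knows = [
  { _from: "users/a", _to: "users/b" }, // eu -> eu
  { _from: "users/c", _to: "users/d" }, // us -> us
  { _from: "users/d", _to: "users/c" }, // us -> us
  { _from: "users/a", _to: "users/c" }, // eu -> us (crosses subgraphs)
];

console.log(localEdgeRatio(users, knows, "region")); // 0.75
```

A ratio close to 1 suggests the attribute separates the graph well; a low ratio means many queries would cross subgraphs.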

When creating a SmartGraph, you have to define a smartGraphAttribute, which is the name of an attribute stored in every vertex. The graph is then automatically sharded in such a way that all vertices with the same value for this attribute are stored on the same physical machine, and all edges connecting vertices with identical smartGraphAttribute values are stored on this machine as well. At query time, the query optimizer and the query executor both know exactly where every document is stored and can thereby minimize network overhead. Everything that can be computed locally is computed locally.
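
The placement rule can be modeled in a few lines of plain JavaScript. This is a simplified sketch, not ArangoDB's implementation: the FNV-1a-style hashString function is a hypothetical stand-in for the server's internal hashing, and shardForVertex and isLocalEdge are illustrative names. It only shows why vertices with equal smartGraphAttribute values always end up on the same shard.

```javascript
// Simplified model of SmartGraph placement (illustration only).
// hashString() is a hypothetical stand-in, NOT ArangoDB's internal hash.

// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// All vertices with the same smartGraphAttribute value map to the same shard.
// (Different values may also collide on one shard; that is harmless here.)
function shardForVertex(vertex, smartGraphAttribute, numberOfShards) {
  return hashString(String(vertex[smartGraphAttribute])) % numberOfShards;
}

// In this model, an edge can be processed locally if both endpoints share a shard.
function isLocalEdge(fromVertex, toVertex, smartGraphAttribute, numberOfShards) {
  return (
    shardForVertex(fromVertex, smartGraphAttribute, numberOfShards) ===
    shardForVertex(toVertex, smartGraphAttribute, numberOfShards)
  );
}

const alice = { name: "alice", region: "eu" };
const bob = { name: "bob", region: "eu" };
console.log(isLocalEdge(alice, bob, "region", 9)); // true
```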

Benefits of SmartGraphs

Because of this guaranteed sharding, queries that only cover one subgraph perform almost as well as a purely local computation. Queries that cover more than one subgraph incur some network overhead: the more subgraphs are touched, the higher the network cost. However, the overall performance is never worse than that of the same query on a General Graph.
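
As a rough illustration of this cost model, the sketch below counts how many subgraphs a traversal path touches and how often it moves from one subgraph into another. Treating crossings as a proxy for network overhead is a simplifying assumption; pathCost is a hypothetical helper, not something the query optimizer exposes.

```javascript
// Rough proxy for the network cost of a traversal path in a SmartGraph:
// count distinct smartGraphAttribute values (subgraphs) on the path and how
// often consecutive vertices belong to different subgraphs.
// Illustrative model only; this is not how ArangoDB costs queries.
function pathCost(pathVertices, smartGraphAttribute) {
  const subgraphs = new Set();
  let crossings = 0;
  for (let i = 0; i < pathVertices.length; i++) {
    const value = pathVertices[i][smartGraphAttribute];
    subgraphs.add(value);
    if (i > 0 && value !== pathVertices[i - 1][smartGraphAttribute]) {
      crossings++;
    }
  }
  return { subgraphs: subgraphs.size, crossings };
}

const localPath = [{ region: "eu" }, { region: "eu" }, { region: "eu" }];
const mixedPath = [{ region: "eu" }, { region: "us" }, { region: "eu" }];
console.log(pathCost(localPath, "region")); // { subgraphs: 1, crossings: 0 }
console.log(pathCost(mixedPath, "region")); // { subgraphs: 2, crossings: 2 }
```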

Benefits of Hybrid SmartGraphs

Hybrid SmartGraphs can use SatelliteCollections within their graph definition, so you can create edge definitions between SmartCollections and SatelliteCollections. Because SatelliteCollections (and the edge collections between SmartGraph collections and SatelliteCollections) are replicated to every participating DB-Server, (weighted) graph traversals and (k-)shortest path(s) queries can partially be executed locally on each DB-Server. This means that a larger part of the query can be executed fully locally whenever it needs data from the SatelliteCollections.
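
The following sketch models why a hop from a SmartGraph vertex into a SatelliteCollection can stay local. It is an illustration under stated assumptions: the server names, the serverForValue mapping, and the collection names are hypothetical, and real shard placement is decided by the cluster.

```javascript
// Illustrative model of data placement in a Hybrid SmartGraph.
// Server names, serverForValue(), and collection names are hypothetical.
const servers = ["DBServer1", "DBServer2", "DBServer3"];
const satellites = new Set(["shop"]); // replicated to every DB-Server

// Hypothetical stand-in for sharding by smartGraphAttribute value.
function serverForValue(value) {
  let sum = 0;
  for (const ch of String(value)) sum += ch.codePointAt(0);
  return servers[sum % servers.length];
}

// Returns the servers that hold a copy of the document.
function serversHolding(collection, doc, smartGraphAttribute) {
  if (satellites.has(collection)) return servers.slice();
  return [serverForValue(doc[smartGraphAttribute])];
}

// A hop between two documents can run locally if some server holds both.
function hopIsLocal(a, b, smartGraphAttribute) {
  const left = serversHolding(a.collection, a.doc, smartGraphAttribute);
  const right = serversHolding(b.collection, b.doc, smartGraphAttribute);
  return left.some((s) => right.includes(s));
}

const customer = { collection: "customer", doc: { region: "eu" } };
const shop = { collection: "shop", doc: { name: "corner shop" } };
console.log(hopIsLocal(customer, shop, "region")); // true
```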

Benefits of Disjoint SmartGraphs

Disjoint SmartGraphs are a specialized type of SmartGraphs.

In addition to the guaranteed sharding in SmartGraphs, a Disjoint SmartGraph prohibits edges between vertices with different smartGraphAttribute values.

This ensures that graph traversals, shortest path, and k-shortest-paths queries can be executed locally on a DB-Server, improving performance for these types of queries.
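
The Disjoint rule itself is simple enough to express as a predicate. The sketch below mirrors it in plain JavaScript for illustration; ArangoDB enforces the rule on the server side, and allowedInDisjointGraph is a hypothetical helper name.

```javascript
// Mirrors the Disjoint SmartGraph rule for illustration: an edge is allowed
// only if both endpoints have the same smartGraphAttribute value.
// Treating a missing attribute as invalid is an assumption of this sketch.
function allowedInDisjointGraph(fromVertex, toVertex, smartGraphAttribute) {
  const a = fromVertex[smartGraphAttribute];
  const b = toVertex[smartGraphAttribute];
  return a !== undefined && a === b;
}

console.log(allowedInDisjointGraph({ region: "eu" }, { region: "eu" }, "region")); // true
console.log(allowedInDisjointGraph({ region: "eu" }, { region: "us" }, "region")); // false
```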

Benefits of Hybrid Disjoint SmartGraphs

Hybrid Disjoint SmartGraphs are like Hybrid SmartGraphs but also prohibit edges between vertices with different smartGraphAttribute values. This restriction makes it unnecessary to replicate the edge collections between SmartGraph collections and SatelliteCollections to all DB-Servers for local execution. Instead, these edge collections are sharded like the SmartGraph collections (distributeShardsLike).
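
A sketch of how edges could be assigned in a Hybrid Disjoint SmartGraph, assuming the two rules described above: an edge between two SmartGraph vertices must share the smartGraphAttribute value, and an edge that touches a SatelliteCollection vertex is stored with the SmartGraph-side vertex. edgeShardKey and the sample documents are hypothetical, and edges between two SatelliteCollections are deliberately left out of scope.

```javascript
// Sketch of edge placement in a Hybrid Disjoint SmartGraph (illustrative).
// Returns the smartGraphAttribute value whose shard stores the edge,
// or throws if the edge violates the Disjoint rule.
function edgeShardKey(from, to, smartGraphAttribute, satelliteCollections) {
  const fromSmart = !satelliteCollections.has(from.collection);
  const toSmart = !satelliteCollections.has(to.collection);
  if (fromSmart && toSmart) {
    const a = from.doc[smartGraphAttribute];
    const b = to.doc[smartGraphAttribute];
    if (a !== b) throw new Error(`edge crosses smart values ${a} and ${b}`);
    return a;
  }
  // Edge touches a satellite vertex: store it with the SmartGraph-side vertex.
  if (fromSmart) return from.doc[smartGraphAttribute];
  if (toSmart) return to.doc[smartGraphAttribute];
  throw new Error("satellite-to-satellite edges are out of scope for this sketch");
}

const satelliteCollections = new Set(["shop"]);
const shopDoc = { collection: "shop", doc: { name: "corner shop" } };
const euCustomer = { collection: "customer", doc: { region: "eu" } };
const usCustomer = { collection: "customer", doc: { region: "us" } };
console.log(edgeShardKey(shopDoc, euCustomer, "region", satelliteCollections)); // eu
```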

Getting started

SmartGraphs cannot use existing collections. When switching an existing dataset to a SmartGraph, you have to import the data into a fresh SmartGraph. You can do this with arangodump and arangorestore; the only change to this pipeline is that you create the new collections with the SmartGraph module before running arangorestore.

Create a SmartGraph

In contrast to General Graphs, we have to add more options when creating the graph. The two options smartGraphAttribute and numberOfShards are required and cannot be modified later.

arangosh> var graph_module = require("@arangodb/smart-graph");
arangosh> var graph = graph_module._create("myGraph", [], [], {smartGraphAttribute: "region", numberOfShards: 9});
arangosh> graph_module._graph("myGraph");
{[SmartGraph] 
}

Create a Disjoint SmartGraph

In contrast to regular SmartGraphs, we have to add one option when creating the graph. The boolean option isDisjoint is required, must be set to true, and cannot be modified later.

arangosh> var graph_module = require("@arangodb/smart-graph");
arangosh> var graph = graph_module._create("myGraph", [], [], {smartGraphAttribute: "region", numberOfShards: 9, isDisjoint: true});
arangosh> graph_module._graph("myGraph");
{[SmartGraph] 
}

Add vertex collections

Adding vertex collections works analogously to General Graphs. Unlike with General Graphs, however, the collections must not exist when creating the SmartGraph. The SmartGraph module creates them automatically so that the sharding of all these collections is set up correctly. If you create collections via the SmartGraph module and later remove them from the graph definition, you can re-add them without trouble, as they still have the correct sharding.

arangosh> graph._addVertexCollection("shop");
arangosh> graph._addVertexCollection("customer");
arangosh> graph._addVertexCollection("pet");
arangosh> graph_module._graph("myGraph");
{[SmartGraph] 
  "customer" : [ArangoCollection 10423, "customer" (type document, status loaded)], 
  "pet" : [ArangoCollection 10433, "pet" (type document, status loaded)], 
  "shop" : [ArangoCollection 10413, "shop" (type document, status loaded)] 
}

Define relations on the Graph

Adding edge collections works the same as with General Graphs, but again, the collections are created by the SmartGraph module to set up sharding correctly, so they must not exist when creating the SmartGraph (unless they already have the correct sharding).

arangosh> var rel = graph_module._relation("isCustomer", ["shop"], ["customer"]);
arangosh> graph._extendEdgeDefinitions(rel);
arangosh> graph_module._graph("myGraph");
{[SmartGraph] 
  "isCustomer" : [ArangoCollection 10481, "isCustomer" (type edge, status loaded)], 
  "shop" : [ArangoCollection 10450, "shop" (type document, status loaded)], 
  "customer" : [ArangoCollection 10460, "customer" (type document, status loaded)], 
  "pet" : [ArangoCollection 10470, "pet" (type document, status loaded)] 
}

Create a Hybrid SmartGraph

In addition to the attributes you would set to create a SmartGraph, you need to set the additional attribute satellites. It must be an array of one or more collection names. These names can be used in edge definitions (relations), and the corresponding collections are created as SatelliteCollections. In this example, both vertex collections are created as SatelliteCollections:

arangosh> var graph_module = require("@arangodb/smart-graph");
arangosh> var rel = graph_module._relation("isCustomer", "shop", "customer");
arangosh> var graph = graph_module._create("myGraph", [rel], [], {satellites: ["shop", "customer"], smartGraphAttribute: "region", numberOfShards: 9});
arangosh> graph_module._graph("myGraph");
{[SmartGraph] 
  "isCustomer" : [ArangoCollection 10060, "isCustomer" (type edge, status loaded)], 
  "shop" : [ArangoCollection 10058, "shop" (type document, status loaded)], 
  "customer" : [ArangoCollection 10059, "customer" (type document, status loaded)] 
}

Create a Hybrid Disjoint SmartGraph

The option isDisjoint needs to be set to true in addition to the other options for a Hybrid SmartGraph. Only the shop vertex collection is created as a SatelliteCollection in this example:

arangosh> var graph_module = require("@arangodb/smart-graph");
arangosh> var rel = graph_module._relation("isCustomer", "shop", "customer");
arangosh> var graph = graph_module._create("myGraph", [rel], [], {satellites: ["shop"], smartGraphAttribute: "region", isDisjoint: true, numberOfShards: 9});
arangosh> graph_module._graph("myGraph");
{[SmartGraph] 
  "isCustomer" : [ArangoCollection 10079, "isCustomer" (type edge, status loaded)], 
  "shop" : [ArangoCollection 10078, "shop" (type document, status loaded)], 
  "customer" : [ArangoCollection 10068, "customer" (type document, status loaded)] 
}