Pregel HTTP API

See Distributed Iterative Graph Processing (Pregel) for details.

Start Pregel job execution

Start the execution of a Pregel algorithm

POST /_api/control_pregel

Request Body

  • algorithm (string, required): Name of the algorithm. One of:
    • "pagerank" - Page Rank
    • "sssp" - Single-Source Shortest Path
    • "connectedcomponents" - Connected Components
    • "wcc" - Weakly Connected Components
    • "scc" - Strongly Connected Components
    • "hits" - Hyperlink-Induced Topic Search
    • "effectivecloseness" - Effective Closeness
    • "linerank" - LineRank
    • "labelpropagation" - Label Propagation
    • "slpa" - Speaker-Listener Label Propagation
  • graphName (string, optional): Name of a graph. Either this or the parameters vertexCollections and edgeCollections are required. Please note that there are special sharding requirements for graphs in order to be used with Pregel.

  • vertexCollections (array of strings, optional): List of vertex collection names. Please note that there are special sharding requirements for collections in order to be used with Pregel.

  • edgeCollections (array of strings, optional): List of edge collection names. Please note that there are special sharding requirements for collections in order to be used with Pregel.

  • params (object, optional): General as well as algorithm-specific options.

    The most important general option is “store”, which controls whether the results computed by the Pregel job are written back into the source collections or not.

    Another important general option is “parallelism”, which controls the maximum number of parallel threads that work on the Pregel job. If “parallelism” is not specified, a default value may be used. In addition, the value of “parallelism” may be effectively capped at some server-specific value.

    The option “useMemoryMaps” controls whether disk-based files are used to store temporary results. This might make the computation disk-bound, but allows you to run computations that would not fit into main memory. It is recommended to set this flag for larger datasets.

    The attribute “shardKeyAttribute” specifies the shard key that the edge collections are sharded by (default: "vertex").

To start an execution, specify the algorithm name and a named graph (a SmartGraph in a cluster). Alternatively, you can specify the vertex and edge collections. You can also pass custom parameters, which vary for each algorithm; see Pregel - Available Algorithms.
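The rule that a request needs either a graph name or both collection lists can be sketched as a small helper that assembles the request body. This is a minimal illustration, not part of the API: the function name `build_pregel_body` and the collection names `users` and `follows` are hypothetical, and the helper simply prefers `graphName` when it is given.

```python
import json


def build_pregel_body(algorithm, graph_name=None, vertex_collections=None,
                      edge_collections=None, params=None):
    """Assemble a JSON-serializable body for POST /_api/control_pregel.

    Hypothetical helper: it only enforces the rule stated above, namely
    that either graphName or both collection lists must be present.
    """
    if not algorithm:
        raise ValueError("algorithm is required")
    has_graph = graph_name is not None
    has_collections = bool(vertex_collections) and bool(edge_collections)
    if not has_graph and not has_collections:
        raise ValueError(
            "specify graphName, or both vertexCollections and edgeCollections"
        )
    body = {"algorithm": algorithm}
    if has_graph:
        body["graphName"] = graph_name
    else:
        body["vertexCollections"] = list(vertex_collections)
        body["edgeCollections"] = list(edge_collections)
    if params:
        body["params"] = dict(params)
    return body


# Variant 1: a named graph with algorithm-specific parameters.
graph_body = build_pregel_body(
    "wcc",
    graph_name="connectedComponentsGraph",
    params={"maxGSS": 36, "resultField": "component"},
)

# Variant 2: explicit collections (names are made up) with general options.
collection_body = build_pregel_body(
    "pagerank",
    vertex_collections=["users"],
    edge_collections=["follows"],
    params={"store": False, "parallelism": 2},
)

print(json.dumps(graph_body, indent=2))
print(json.dumps(collection_body, indent=2))
```

Validating on the client only catches a missing graph or collection list early; the server still decides whether the sharding requirements are met.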

Responses

HTTP 200: HTTP 200 is returned if the Pregel job was successfully created. The reply body is a string containing the id of the job, which you can use to query the status or to cancel the execution.

HTTP 400: An HTTP 400 error is returned if the set of collections for the Pregel job includes a system collection, or if the collections do not conform to the sharding requirements for Pregel jobs.

HTTP 403: An HTTP 403 error is returned if there are not sufficient privileges to access the collections specified for the Pregel job.

HTTP 404: An HTTP 404 error is returned if the specified “algorithm” is not found, or the graph specified in “graphName” is not found, or at least one of the collections specified in “vertexCollections” or “edgeCollections” is not found.

Examples

Run the Weakly Connected Components (WCC) algorithm against a graph and store the results in the vertices as attribute component:

shell> curl -X POST --header 'accept: application/json' --data-binary @- --dump - http://localhost:8529/_api/control_pregel <<EOF
{ 
  "algorithm" : "wcc", 
  "graphName" : "connectedComponentsGraph", 
  "params" : { 
    "maxGSS" : 36, 
    "resultField" : "component" 
  } 
}
EOF

HTTP/1.1 200 OK
content-type: application/json
connection: Keep-Alive
content-length: 7
server: ArangoDB
x-arango-queue-time-seconds: 0.000000
x-content-type-options: nosniff

"68640"

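For readers who prefer a script over curl, the same request could be prepared with Python's standard library. This is a sketch under assumptions: the helper names `make_start_request` and `parse_job_id` are hypothetical, authentication is left out as in the curl example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8529"  # same endpoint as the curl example


def make_start_request(body, base_url=BASE_URL):
    """Build (but do not send) the POST request that starts a Pregel job."""
    return urllib.request.Request(
        url=f"{base_url}/_api/control_pregel",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Accept": "application/json"},
    )


def parse_job_id(response_bytes):
    """Decode the reply body, which is a JSON string holding the job id."""
    return json.loads(response_bytes.decode("utf-8"))


request = make_start_request({
    "algorithm": "wcc",
    "graphName": "connectedComponentsGraph",
    "params": {"maxGSS": 36, "resultField": "component"},
})

# Decoding the reply body shown in the example response above.
job_id = parse_job_id(b'"68640"')
print(request.get_method(), request.full_url, job_id)
```

To actually submit the job against a running server, the request could be passed to `urllib.request.urlopen(request)` and the bytes from `read()` handed to `parse_job_id`.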
Get Pregel job execution status

Get the status of a Pregel execution

GET /_api/control_pregel/{id}

Path Parameters

  • id (number, required): Pregel execution identifier.

Returns the current state of the execution, the current global superstep, the runtime, the global aggregator values as well as the number of sent and received messages.

Responses

HTTP 200: HTTP 200 is returned if the job execution id is valid. The response body contains the state of the execution:

  • id (string): The id of the Pregel job, as a string.

  • algorithm (string): The algorithm used by the job.

  • created (string): The date and time when the job was created.

  • expires (string): A date and time when the job results expire. The expiration date is only meaningful for jobs that were completed, canceled or resulted in an error. Such jobs are cleaned up by the garbage collection when they reach their expiration date/time.

  • ttl (number): A TTL (time to live) value for the job results, specified in seconds. The TTL is used to calculate the expiration date for the job’s results.

  • state (string): State of the execution. The following values can be returned:
    • "running": Algorithm is executing normally.
    • "storing": The algorithm finished, but the results are still being written back into the collections. Occurs only if the store parameter is set to true.
    • "done": The execution is done. In version 3.7.1 and later, this means that storing is also done. In earlier versions, the results may not be written back into the collections yet. This event is announced in the server log (requires at least info log level for the pregel log topic).
    • "canceled": The execution was permanently canceled, either by the user or by an error.
    • "fatal error": The execution has failed and cannot recover.
    • "in error" (currently unused): The execution is in an error state. This can be caused by DB-Servers being unreachable or unresponsive. The execution might recover later, or switch to "canceled" if it cannot recover successfully.
    • "recovering" (currently unused): The execution is actively recovering and switches back to "running" if the recovery is successful.
  • gss (number): The number of global supersteps executed.

  • totalRuntime (number): Total runtime of the execution up to now (if the execution is still ongoing).

  • startupTime (number): Startup runtime of the execution. The startup time includes the data loading time and can be substantial. The startup time will be reported as 0 if the startup is still ongoing.

  • computationTime (number): Algorithm execution time. The computation time will be reported as 0 if the computation is still ongoing.

  • storageTime (number): Time for storing the results if the job includes results storage. The storage time will be reported as 0 if storing the results is still ongoing.

  • reports (object): Statistics about the Pregel execution. The value will only be populated once the algorithm has finished.

    • vertexCount (integer): Total number of vertices processed.

    • edgeCount (integer): Total number of edges processed.

HTTP 404: An HTTP 404 error is returned if no Pregel job with the specified execution number is found or the execution number is invalid.
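A client typically polls this endpoint until the job reaches a final state. The sketch below illustrates that loop. It assumes that "done", "canceled", and "fatal error" are the states to stop on, based on the descriptions above. The name `wait_for_job` and the injected `fetch_status` callable are hypothetical; in real use, `fetch_status` would issue the GET request and return the decoded JSON object.

```python
import time

# Assumption: these states are final, per the state descriptions above.
TERMINAL_STATES = {"done", "canceled", "fatal error"}


def wait_for_job(fetch_status, job_id, poll_interval=1.0, max_polls=60,
                 sleep=time.sleep):
    """Poll fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is any callable returning the status object as a dict.
    Raises TimeoutError if no terminal state is seen within max_polls.
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status["state"] in TERMINAL_STATES:
            return status
        sleep(poll_interval)
    raise TimeoutError(f"Pregel job {job_id} did not finish in time")


# Demonstration with canned status objects instead of a live server.
_canned = iter([
    {"state": "running", "gss": 3},
    {"state": "storing", "gss": 12},
    {"state": "done", "gss": 12},
])
final = wait_for_job(lambda job_id: next(_canned), "68792",
                     sleep=lambda seconds: None)
print(final["state"], final["gss"])
```

Injecting the fetcher and the sleep function keeps the loop testable without a server; a fixed poll interval is a simplification, and a backoff may suit long-running jobs better.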

Examples

Get the execution status of a Pregel job:

shell> curl --header 'accept: application/json' --dump - http://localhost:8529/_api/control_pregel/68792

HTTP/1.1 200 OK
content-type: application/json
connection: Keep-Alive
content-length: 398
server: ArangoDB
x-arango-queue-time-seconds: 0.000000
x-content-type-options: nosniff

Get currently running Pregel jobs

Get the overview of currently running Pregel jobs

GET /_api/control_pregel

Returns a list of currently running and recently finished Pregel jobs without retrieving their results. The returned object is a JSON array of Pregel job descriptions. Each job description is a JSON object with the following attributes:

  • id: the id of the Pregel job, as a string.

  • algorithm: the algorithm used by the job.

  • created: the date and time when the job was created.

  • expires: a date and time when the job results expire. The expiration date is only meaningful for jobs that were completed, canceled or resulted in an error. Such jobs are cleaned up by the garbage collection when they reach their expiration date/time.

  • ttl: a TTL (time to live) value for job results, specified in seconds. The TTL is used to calculate the expiration date for the job’s results.

  • state: the state of the execution, as a string.

  • gss: the number of global supersteps executed.

  • totalRuntime: the total runtime of the execution up to now (if the execution is still ongoing).

  • startupTime: the startup runtime of the execution. The startup time includes the data loading time and can be substantial. The startup time is reported as 0 if the startup is still ongoing.

  • computationTime: the algorithm execution time. The computation time is reported as 0 if the computation is still ongoing.

  • storageTime: the time for storing the results if the job includes result storage. The storage time is reported as 0 if storing the results is still ongoing.

  • reports: optional statistics about the Pregel execution. The value is only populated when the algorithm has finished.

Responses

HTTP 200: Is returned when the list of jobs can be retrieved successfully.
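Because the list mixes running and recently finished jobs, a client usually filters it by state. The following sketch shows one way to do that on an already decoded response. The function name `summarize_jobs` is hypothetical, and the sample job descriptions are made up for illustration and show only a subset of the attributes listed above.

```python
from collections import Counter


def summarize_jobs(jobs):
    """Count jobs per state and collect the ids of jobs still in progress."""
    counts = Counter(job["state"] for job in jobs)
    active = [job["id"] for job in jobs
              if job["state"] in ("running", "storing")]
    return counts, active


# Illustrative data only; a real response carries more attributes per job.
jobs = [
    {"id": "68640", "algorithm": "wcc", "state": "done"},
    {"id": "68792", "algorithm": "pagerank", "state": "running"},
    {"id": "68493", "algorithm": "sssp", "state": "storing"},
]

counts, active = summarize_jobs(jobs)
print(dict(counts), active)
```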

Cancel Pregel job execution

Cancel an ongoing Pregel execution

DELETE /_api/control_pregel/{id}

Path Parameters

  • id (number, required): Pregel execution identifier.

Cancel an execution which is still running and discard any intermediate results. This immediately frees all memory taken up by the execution, so all intermediate data is lost.

You might get inconsistent results if you requested to store the results and then cancel an execution when it is already in its "storing" state (or "done" state in versions prior to 3.7.1). The data is written multi-threaded into all collection shards at once. This means there are multiple transactions simultaneously. A transaction might already be committed when you cancel the execution job. Therefore, you might see some updated documents, while other documents have no or stale results from a previous execution.
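The condition described above can be expressed as a small check that a client might run before canceling. This is only a sketch of the rule as stated in the text: the function name and its parameters are hypothetical, and the caller is assumed to know whether the job was started with "store" enabled and which server version it runs on.

```python
def cancel_may_leave_partial_results(state, store_requested,
                                     server_before_3_7_1=False):
    """Return True if canceling now could leave partially written results.

    Follows the rule above: the risk exists when storing was requested
    and the job is in the "storing" state (or in the "done" state on
    servers older than 3.7.1).
    """
    if not store_requested:
        return False
    risky_states = {"storing"}
    if server_before_3_7_1:
        risky_states.add("done")
    return state in risky_states


print(cancel_may_leave_partial_results("storing", True))
print(cancel_may_leave_partial_results("running", True))
```

A client could use such a check to warn the user or to wait for the "done" state instead of canceling.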

Responses

HTTP 200: HTTP 200 is returned if the job execution id is valid.

HTTP 404: An HTTP 404 error is returned if no Pregel job with the specified execution number is found or the execution number is invalid.
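As a rough illustration, the cancel request could be prepared with Python's standard library as follows. The helper name `make_cancel_request` is hypothetical, authentication is omitted, and the request is only built here rather than sent.

```python
import urllib.request


def make_cancel_request(job_id, base_url="http://localhost:8529"):
    """Build (but do not send) the DELETE request that cancels a Pregel job."""
    return urllib.request.Request(
        url=f"{base_url}/_api/control_pregel/{job_id}",
        method="DELETE",
        headers={"Accept": "application/json"},
    )


cancel_request = make_cancel_request("68493")
print(cancel_request.get_method(), cancel_request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; an HTTP 404 reply then surfaces as `urllib.error.HTTPError`, which a caller may want to treat as "job not found".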

Examples

Cancel a Pregel job to stop the execution or to free up the results if it was started with "store": false and is in the done state:

shell> curl -X DELETE --header 'accept: application/json' --dump - http://localhost:8529/_api/control_pregel/68493

HTTP/1.1 200 OK
content-type: application/json
connection: Keep-Alive
content-length: 2
server: ArangoDB
x-arango-queue-time-seconds: 0.000000
x-content-type-options: nosniff

""