Executing queries from arangosh

Within the ArangoDB shell, the _query and _createStatement methods of the db object can be used to execute AQL queries. This chapter also describes how to use bind parameters, counting, statistics and cursors.

With db._query

One can execute queries with the _query method of the db object. This will run the specified query in the context of the currently selected database and return the query results in a cursor. The results of the cursor can be printed using its toArray method:

arangosh> db._create("mycollection")
arangosh> db.mycollection.save({ _key: "testKey", Hello : "World" })
arangosh> db._query('FOR my IN mycollection RETURN my._key').toArray()
[ArangoCollection 221, "mycollection" (type document, status loaded)]
{ 
  "_id" : "mycollection/testKey", 
  "_key" : "testKey", 
  "_rev" : "_eLYV2yG--_" 
}
[ 
  "testKey" 
]

db._query Bind parameters

To pass bind parameters into a query, they can be specified as second argument to the _query method:

arangosh> db._query(
........> 'FOR c IN @@collection FILTER c._key == @key RETURN c._key', {
........>   '@collection': 'mycollection', 
........>   'key': 'testKey'
........> }).toArray();
[ 
  "testKey" 
]

ES6 template strings

It is also possible to use ES6 template strings for generating AQL queries. There is a template string generator function named aql. The first example calls it on its own to show the object it produces; the second passes its result directly to db._query:

arangosh> var key = 'testKey';
arangosh> aql`FOR c IN mycollection FILTER c._key == ${key} RETURN c._key`;
{ 
  "query" : "FOR c IN mycollection FILTER c._key == @value0 RETURN c._key", 
  "bindVars" : { 
    "value0" : "testKey" 
  } 
}
arangosh> var key = 'testKey';
arangosh> db._query(
........> aql`FOR c IN mycollection FILTER c._key == ${key} RETURN c._key`
........> ).toArray();
[ 
  "testKey" 
]
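
To make the mechanism more concrete, here is a minimal sketch of how a tagged template function can turn interpolated values into bind parameters. The function miniAql is a hypothetical stand-in written for illustration only; it is not ArangoDB's aql implementation, which also handles collection objects and other cases.

```javascript
// Hypothetical stand-in for illustration; NOT the real aql generator.
// A tag function receives the literal string parts and the interpolated values.
function miniAql(strings, ...values) {
  const bindVars = {};
  let query = strings[0];
  values.forEach((value, i) => {
    const name = `value${i}`;             // bind parameter name, e.g. value0
    bindVars[name] = value;               // the value travels separately from the query text
    query += `@${name}` + strings[i + 1]; // the query text only contains the placeholder
  });
  return { query, bindVars };
}

const key = 'testKey';
const generated = miniAql`FOR c IN mycollection FILTER c._key == ${key} RETURN c._key`;
// generated.query    is the query string with @value0 in place of ${key}
// generated.bindVars is { value0: 'testKey' }
```

Because the value is never concatenated into the query text, this style avoids quoting problems and AQL injection in the same way as passing bind parameters explicitly.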

Arbitrary JavaScript expressions can be used in queries that are generated with the aql template string generator. Collection objects are handled automatically:

arangosh> var key = 'testKey';
arangosh> db._query(aql`FOR doc IN ${ db.mycollection } RETURN doc`
........> ).toArray();
[ 
  { 
    "_key" : "testKey", 
    "_id" : "mycollection/testKey", 
    "_rev" : "_eLYV2yG--_", 
    "Hello" : "World" 
  } 
]

Note: data-modification AQL queries normally do not return a result (unless the AQL query contains an extra RETURN statement). When not using a RETURN statement in the query, the toArray method will return an empty array.

Statistics and extra Information

It is always possible to retrieve statistics for a query with the getExtra method:

arangosh> db._query(`FOR i IN 1..100
........>             INSERT { _key: CONCAT('test', TO_STRING(i)) }
........>                INTO mycollection`
........> ).getExtra();
{ 
  "warnings" : [ ], 
  "stats" : { 
    "writesExecuted" : 100, 
    "writesIgnored" : 0, 
    "scannedFull" : 0, 
    "scannedIndex" : 0, 
    "filtered" : 0, 
    "httpRequests" : 0, 
    "executionTime" : 0.0008782510003584321, 
    "peakMemoryUsage" : 32768 
  } 
}

The meaning of the statistics values is described in Execution statistics. Any warnings raised by the query are returned here as well, so be sure to check the warnings attribute when designing queries in the shell.
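
As a small illustration of inspecting the warnings, the helper below collects warning messages from the object returned by getExtra. The name collectWarnings is hypothetical, and the sketch assumes that each warning entry is an object with code and message attributes. The stub cursor and its sample warning only stand in for a real cursor returned by db._query.

```javascript
// Hypothetical helper: summarize the warnings reported for a query.
// Assumes each entry in extra.warnings has "code" and "message" attributes.
function collectWarnings(cursor) {
  const extra = cursor.getExtra() || {};
  const warnings = extra.warnings || [];
  return warnings.map((w) => `${w.code}: ${w.message}`);
}

// Stand-in cursor with a sample warning, for illustration only.
// In arangosh, pass the result of db._query(...) instead.
const stubCursor = {
  getExtra: () => ({
    warnings: [{ code: 1562, message: 'division by zero' }],
    stats: { writesExecuted: 0 }
  })
};
const messages = collectWarnings(stubCursor);
```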

Setting a memory limit

To set a memory limit for the query, pass options to the _query method. The memory limit specifies the maximum number of bytes that the query is allowed to use. When a single AQL query reaches the specified limit value, the query will be aborted with a resource limit exceeded exception. In a cluster, the memory accounting is done per shard, so the limit value is effectively a memory limit per query per shard.

arangosh> db._query(
........> 'FOR i IN 1..100000 SORT i RETURN i', {}, {
........>   memoryLimit: 100000
........> }).toArray();
[ArangoError 32: AQL: query would use more memory than allowed (while executing)]

If no memory limit is specified, then the server default value (controlled by the startup option --query.memory-limit) will be used for restricting the maximum amount of memory the query can use. A memory limit value of 0 means that the maximum amount of memory for the query is not restricted.
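
A script that sets a memory limit will usually want to tell the resource limit error apart from other failures. The following sketch shows one way to do that. The helper name tryWithMemoryLimit is hypothetical, and the code assumes that the thrown ArangoError exposes its error number in an errorNum attribute; the number 32 is taken from the error output of the example.

```javascript
// Sketch: run a query with a memory limit and report whether the limit was hit.
// `db` is the arangosh db object (or any object with a compatible _query method).
// Assumption: the thrown ArangoError carries the error number in `errorNum`.
const RESOURCE_LIMIT_ERROR = 32; // error number from the example output

function tryWithMemoryLimit(db, query, bindVars, limitBytes) {
  try {
    const rows = db._query(query, bindVars, { memoryLimit: limitBytes }).toArray();
    return { ok: true, rows: rows };
  } catch (err) {
    if (err && err.errorNum === RESOURCE_LIMIT_ERROR) {
      return { ok: false, rows: [] }; // query was aborted because of the limit
    }
    throw err; // unrelated errors are not swallowed
  }
}
```

In arangosh this could be called as tryWithMemoryLimit(db, 'FOR i IN 1..100000 SORT i RETURN i', {}, 100000).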

Setting options

There are further options that can be passed in the options attribute of the _query method:

  • fullCount: if set to true and the query contains a LIMIT clause, then the result will have an extra attribute with the sub-attributes stats and fullCount, { ... , "extra": { "stats": { "fullCount": 123 } } }. The fullCount attribute will contain the number of documents in the result before the last top-level LIMIT in the query was applied. It can be used to count the number of documents that match certain filter criteria, but only return a subset of them, in one go. It is thus similar to MySQL’s SQL_CALC_FOUND_ROWS hint. Setting the option disables a few LIMIT optimizations and may lead to more documents being processed, and thus make queries run longer. The fullCount attribute may only be present in the result if the query has a top-level LIMIT clause and the LIMIT clause is actually used in the query.

  • failOnWarning: when set to true, this will make the query throw an exception and abort in case a warning occurs. This option should be used in development to catch errors early. If set to false, warnings will not be propagated to exceptions and will be returned with the query results. There is also a server configuration option --query.fail-on-warning for setting the default value for failOnWarning so it does not need to be set on a per-query level.

  • cache: if set to true, this will put the query result into the query result cache if the query result is eligible for caching and the query cache is running in demand mode. If set to false, the query result will not be inserted into the query result cache. Note that query results will never be inserted into the query result cache if the query result cache is disabled, and that they will be automatically inserted into the query result cache when it is active in non-demand mode.

  • fillBlockCache: if set to true or not specified, this will make the query store the data it reads via the RocksDB storage engine in the RocksDB block cache. This is usually the desired behavior. The option can be set to false for queries that are known to either read a lot of data that would thrash the block cache, or for queries that read data known to be outside of the hot set. By setting the option to false, data read by the query will not make it into the RocksDB block cache if it is not already in there, thus leaving more room for the actual hot set.

  • profile: if set to true or 1, returns extra timing information for the query. The timing information is accessible via the getExtra method of the query result. If set to 2, the query will additionally include execution statistics per query plan node in the sub-attribute stats.nodes of the extra return attribute, and the query plan is returned in the sub-attribute extra.plan.

  • maxWarningCount: limits the number of warnings that are returned by the query if failOnWarning is not set to true. The default value is 10.

  • maxNumberOfPlans: limits the number of query execution plans the optimizer will create at most. Reducing the number of query execution plans may speed up query plan creation and optimization for complex queries, but normally there is no need to adjust this value.

  • optimizer: Options related to the query optimizer.

    • rules: A list of to-be-included or to-be-excluded optimizer rules can be put into this attribute, telling the optimizer to include or exclude specific rules. To disable a rule, prefix its name with a -; to enable a rule, prefix it with a +. There is also a pseudo-rule all, which matches all optimizer rules. -all disables all rules.

  • stream: Specify true and the query will be executed in a streaming fashion. The query result is not stored on the server, but calculated on the fly. Beware: long-running queries will need to hold the collection locks for as long as the query cursor exists. It is advisable to only use this option on short-running queries or without exclusive locks. When set to false, the query will be executed right away in its entirety. In that case, query results are either returned right away (if the result set is small enough), or stored on the arangod instance and accessible via the cursor API.

    Please note that the query options cache, count and fullCount will not work on streaming queries. Additionally, query statistics, warnings and profiling data will only be available after the query is finished. The default value is false.

  • maxRuntime: The query has to be executed within the given runtime or it will be killed. The value is specified in seconds. The default value is 0.0 (no timeout).

  • maxNodesPerCallstack: The number of execution nodes in the query plan after which stack splitting is performed to avoid a potential stack overflow. Defaults to the configured value of the startup option --query.max-nodes-per-callstack.

    This option is only useful for testing and debugging and normally does not need any adjustment.

  • maxTransactionSize: transaction size limit in bytes

  • intermediateCommitSize: maximum total size of operations after which an intermediate commit is performed automatically

  • intermediateCommitCount: maximum number of operations after which an intermediate commit is performed automatically

In the ArangoDB Enterprise Edition there are additional parameters:

  • skipInaccessibleCollections: AQL queries (especially graph traversals) will treat collections to which a user has no access rights as if these collections were empty. Instead of returning a forbidden access error, your queries will execute normally. This is intended to help with certain use cases: a graph contains several collections and different users execute AQL queries on that graph. You can then naturally limit the accessible results by changing the access rights of users on collections.

  • satelliteSyncWait: This Enterprise Edition parameter configures how long a DB-Server will wait for the SatelliteCollections involved in the query to get in sync. The default value is 60.0 (seconds). When the maximum time has been reached, the query will be stopped.
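
To show how several of these options can be combined, the following sketch fetches one page of document keys together with the total number of matching documents by using fullCount. The helper name fetchPage, the chosen option values and the sort order are assumptions made for illustration; mycollection is the collection from the earlier examples.

```javascript
// Sketch: fetch one page of keys plus the total match count in a single query.
// `db` is the arangosh db object (or any object with a compatible _query method).
function fetchPage(db, offset, count) {
  const cursor = db._query(
    'FOR doc IN mycollection SORT doc._key LIMIT @offset, @count RETURN doc._key',
    { offset: offset, count: count },
    {
      fullCount: true,     // report the number of documents before the LIMIT
      failOnWarning: true, // turn warnings into exceptions during development
      maxRuntime: 5.0      // kill the query if it runs longer than 5 seconds
    }
  );
  const rows = cursor.toArray();
  const stats = cursor.getExtra().stats || {};
  return { rows: rows, total: stats.fullCount };
}
```

In arangosh, fetchPage(db, 0, 10) would return the first ten keys in rows and the overall number of documents in total, at the cost of the disabled LIMIT optimizations mentioned for fullCount.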

With _createStatement (ArangoStatement)

The _query method is a shorthand for creating an ArangoStatement object, executing it and iterating over the resulting cursor. If more control over the result set iteration is needed, it is recommended to first create an ArangoStatement object as follows:

arangosh> stmt = db._createStatement( {
........> "query": "FOR i IN [ 1, 2 ] RETURN i * 2" } );
[object ArangoStatement]

To execute the query, use the execute method of the statement:

arangosh> c = stmt.execute();
[ 
  2, 
  4 
]
[object ArangoQueryCursor, count: 2, cached: false, hasMore: false]

Cursors

Once the query has been executed, the query results are available in a cursor. The cursor can return all its results at once using the toArray method. This is a shortcut that you can use if you want to access the full result set without iterating over it yourself.

arangosh> c.toArray();
[ 
  2, 
  4 
]

Cursors can also be used to iterate over the result set document-by-document. To do so, use the hasNext and next methods of the cursor:

arangosh> while (c.hasNext()) { require("@arangodb").print(c.next()); }
2
4

Please note that you can iterate over the results of a cursor only once, and that the cursor will be empty when you have fully iterated over it. To iterate over the results again, the query needs to be re-executed.

Additionally, the iteration can be done in a forward-only fashion. There is no backwards iteration or random access to elements in a cursor.
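
The forward-only, single-pass behavior can be sketched with a small helper that relies only on hasNext and next. The names drain and makeStubCursor are hypothetical; the stub mimics the shape of a query cursor so that the snippet is self-contained, and in arangosh the cursor returned by stmt.execute() would be passed instead.

```javascript
// Sketch: consume a cursor item by item using only hasNext() and next().
// Returns the number of items that were handed to the handler.
function drain(cursor, handler) {
  let seen = 0;
  while (cursor.hasNext()) {
    handler(cursor.next());
    seen += 1;
  }
  return seen;
}

// Stand-in with the same hasNext/next shape as a query cursor (illustration only).
function makeStubCursor(items) {
  let pos = 0;
  return {
    hasNext: () => pos < items.length,
    next: () => items[pos++]
  };
}

const collected = [];
const stub = makeStubCursor([2, 4]);
const firstPass = drain(stub, (value) => collected.push(value));
// A second pass finds the cursor exhausted and processes nothing.
const secondPass = drain(stub, (value) => collected.push(value));
```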

ArangoStatement parameters binding

To execute an AQL query using bind parameters, you need to create a statement first and then bind the parameters to it before execution:

arangosh> var stmt = db._createStatement( {
........> "query": "FOR i IN [ @one, @two ] RETURN i * 2" } );
arangosh> stmt.bind("one", 1);
arangosh> stmt.bind("two", 2);
arangosh> c = stmt.execute();
[ 
  2, 
  4 
]
[object ArangoQueryCursor, count: 2, cached: false, hasMore: false]

The cursor results can then be dumped or iterated over as usual, e.g.:

arangosh> c.toArray();
[ 
  2, 
  4 
]

or

arangosh> while (c.hasNext()) { require("@arangodb").print(c.next()); }
2
4

Please note that bind parameters can also be passed into the _createStatement method directly, making it a bit more convenient:

arangosh> stmt = db._createStatement( { 
........>  "query": "FOR i IN [ @one, @two ] RETURN i * 2", 
........>  "bindVars": { 
........>    "one": 1, 
........>    "two": 2 
........>  } 
........> } );
[object ArangoStatement]

Counting with a cursor

Cursors also optionally provide the total number of results. By default, they do not. To make the server return the total number of results, you may set the count attribute to true when creating a statement:

arangosh> stmt = db._createStatement( {
........> "query": "FOR i IN [ 1, 2, 3, 4 ] RETURN i",
........> "count": true } );
[object ArangoStatement]

After executing this query, you can use the count method of the cursor to get the number of total results from the result set:

arangosh> var c = stmt.execute();
arangosh> c.count();
4

Please note that the count method returns nothing if you did not specify the count attribute when creating the query.

This is intentional so that the server may apply optimizations when executing the query and construct the result set incrementally. Incremental creation of the result sets is not possible if all of the results need to be shipped to the client anyway. Therefore, the client has the choice to specify count and retrieve the total number of results for a query (and disable potential incremental result set creation on the server), or to not retrieve the total number of results and allow the server to apply optimizations.

Please note that at the moment the server will always create the full result set for each query so specifying or omitting the count attribute currently does not have any impact on query execution. This may change in the future. Future versions of ArangoDB may create result sets incrementally on the server-side and may be able to apply optimizations if a result set is not fully fetched by a client.

Using cursors to obtain additional information on internal timings

Cursors can also optionally provide statistics of the internal execution phases. By default, they do not. To get to know how long parsing, optimization, instantiation and execution took, make the server return that by setting the profile attribute to true when creating a statement:

arangosh> stmt = db._createStatement( {
........> "query": "FOR i IN [ 1, 2, 3, 4 ] RETURN i",
........> options: {"profile": true}} );
[object ArangoStatement]

After executing this query, you can use the getExtra() method of the cursor to get the produced statistics:

arangosh> var c = stmt.execute();
arangosh> c.getExtra();
{ 
  "warnings" : [ ], 
  "stats" : { 
    "writesExecuted" : 0, 
    "writesIgnored" : 0, 
    "scannedFull" : 0, 
    "scannedIndex" : 0, 
    "filtered" : 0, 
    "httpRequests" : 0, 
    "executionTime" : 0.00004038099996250821, 
    "peakMemoryUsage" : 0 
  }, 
  "profile" : { 
    "initializing" : 2.9600050766021013e-7, 
    "parsing" : 0.000007650999577890616, 
    "optimizing ast" : 8.100005288724788e-7, 
    "loading collections" : 3.8700000004610047e-7, 
    "instantiating plan" : 0.0000036920000638929196, 
    "optimizing plan" : 0.000016897999557841104, 
    "executing" : 0.000008513000466336962, 
    "finalizing" : 0.000002881999535020441 
  } 
}
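
When comparing phases it can help to pick out the most expensive one programmatically. The sketch below does this for the profile attribute of a getExtra result. The helper name slowestPhase is hypothetical, and the sample object simply repeats the timings from the output above.

```javascript
// Sketch: find the execution phase with the largest timing in extra.profile.
// Returns null if no profile information is present.
function slowestPhase(extra) {
  const profile = (extra && extra.profile) || {};
  let slowest = null;
  for (const [phase, seconds] of Object.entries(profile)) {
    if (slowest === null || seconds > slowest.seconds) {
      slowest = { phase: phase, seconds: seconds };
    }
  }
  return slowest;
}

// Timings copied from the example output; in arangosh, pass c.getExtra() instead.
const sampleExtra = {
  profile: {
    'initializing': 2.9600050766021013e-7,
    'parsing': 0.000007650999577890616,
    'optimizing ast': 8.100005288724788e-7,
    'loading collections': 3.8700000004610047e-7,
    'instantiating plan': 0.0000036920000638929196,
    'optimizing plan': 0.000016897999557841104,
    'executing': 0.000008513000466336962,
    'finalizing': 0.000002881999535020441
  }
};
const slowest = slowestPhase(sampleExtra);
```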

Query validation

The _parse method of the db object can be used to parse and validate a query syntactically, without actually executing it.

arangosh> db._parse( "FOR i IN [ 1, 2 ] RETURN i" );
{ 
  "code" : 200, 
  "parsed" : true, 
  "collections" : [ ], 
  "bindVars" : [ ], 
  "ast" : [ 
    { 
      "type" : "root", 
      "subNodes" : [ 
        { 
          "type" : "for", 
          "subNodes" : [ 
            { 
              "type" : "variable", 
              "name" : "i", 
              "id" : 0 
            }, 
            { 
              "type" : "array", 
              "subNodes" : [ 
                { 
                  "type" : "value", 
                  "value" : 1 
                }, 
                { 
                  "type" : "value", 
                  "value" : 2 
                } 
              ] 
            }, 
            { 
              "type" : "no-op" 
            } 
          ] 
        }, 
        { 
          "type" : "return", 
          "subNodes" : [ 
            { 
              "type" : "reference", 
              "name" : "i", 
              "id" : 0 
            } 
          ] 
        } 
      ] 
    } 
  ] 
}
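
In scripts, the validation step can be wrapped so that a broken query yields a result object instead of an exception. The following is a minimal sketch under the assumptions that db._parse throws an error for queries it cannot parse and that the error carries its text in errorMessage (falling back to message); the helper name checkSyntax is hypothetical.

```javascript
// Sketch: check a query's syntax without executing it.
// `db` is the arangosh db object (or any object with a compatible _parse method).
// Assumption: _parse throws when the query cannot be parsed.
function checkSyntax(db, query) {
  try {
    const parsed = db._parse(query);
    return {
      valid: true,
      collections: parsed.collections, // collection names used by the query
      bindVars: parsed.bindVars        // bind parameter names found in the query
    };
  } catch (err) {
    return {
      valid: false,
      error: String((err && (err.errorMessage || err.message)) || err)
    };
  }
}
```

In arangosh, checkSyntax(db, "FOR i IN [ 1, 2 ] RETURN i") would report the query from the example as valid with empty collections and bindVars lists.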