Profiling and Hand-Optimizing AQL queries

To give you more insight into your query ArangoDB allows to execute your query with special instrumentation code enabled. This will then print a query plan with detailed execution statistics.

To use this in an interactive fashion on the shell you can use db._profileQuery(..) in arangosh. Alternatively, there is a button Profile in the Query tab of the web interface.

The printed execution plan then contains three additional columns:

  • Call: The number of times this query stage was executed
  • Items: The number of temporary result rows at this stage
  • Runtime: The total time spent in this stage

Below the execution plan there are additional sections for the overall runtime statistics and the query profile.

Example: Simple AQL query

Assuming we got a collection named acollection and insert 10000 documents via for (let i=0; i < 10000;i++) db.acollection.insert({value:i}). Then a simple query filtering for value < 10 will return 10 results:

arangosh> db._profileQuery(`
........> FOR doc IN acollection
........>  FILTER doc.value < 10
........>  RETURN doc`, {}, {colors: false}
........> );
Show execution results
Hide execution results
Query String (58 chars, cacheable: false):
 FOR doc IN acollection
  FILTER doc.value < 10
  RETURN doc

Execution plan:
 Id   NodeType                  Calls   Items   Runtime [s]   Comment
  1   SingletonNode                 1       1       0.00000   * ROOT
  2   EnumerateCollectionNode       1      10       0.00262     - FOR doc IN acollection   /* full collection scan  */   FILTER (doc.`value` < 10)   /* early pruning */
  5   ReturnNode                    1      10       0.00000       - RETURN doc

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-filters-into-enumerate

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Peak Mem [b]   Exec Time [s]
           0            0       10000            0       9990              0         0.00280

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00000
 parsing                    0.00003
 optimizing ast             0.00000
 loading collections        0.00000
 instantiating plan         0.00001
 optimizing plan            0.00012
 executing                  0.00263
 finalizing                 0.00001

An AQL query is essentially executed in a pipeline that chains together different functional execution blocks. Each block gets the input rows from the parent above it, does some processing and then outputs a certain number of output rows.

Without any detailed insight into the query execution it is impossible to tell how many results each pipeline-block had to work on and how long this took. By executing the query with the query profiler (db._profileQuery() or via the Profile button in the web interface) you can check exactly how much work each stage had to do.

Without any indexes this query should have to perform the following operations:

  1. Perfom a full collection scan via a EnumerateCollectionNode and outputting a row containing the document in doc.
  2. Calculate the boolean expression LET #1 = doc.value < 10 from all inputs via a CalculationNode
  3. Filter out all input rows where #1 is false via the FilterNode
  4. Put the doc variable of the remaining rows into the result set via the ResultNode

The EnumerateCollectionNode processed and returned all 10k rows (documents), as did the CalculationNode. Because the AQL execution engine also uses an internal batch size of 1000 these blocks were also called 100 times each. The FilterNode as well as the ReturnNode however only ever returned 10 rows and only had to be called once, because the result size fits within a single batch.

Let us add a persistent index on value to speed up the query:

db.acollection.ensureIndex({type:"persistent", fields:["value"]});
arangosh> db._profileQuery(`
........> FOR doc IN acollection
........>  FILTER doc.value < 10
........>  RETURN doc`, {}, {colors: false}
........> );
Show execution results
Hide execution results
Query String (58 chars, cacheable: false):
 FOR doc IN acollection
  FILTER doc.value < 10
  RETURN doc

Execution plan:
 Id   NodeType        Calls   Items   Runtime [s]   Comment
  1   SingletonNode       1       1       0.00000   * ROOT
  6   IndexNode           1      10       0.00004     - FOR doc IN acollection   /* persistent index scan */    
  5   ReturnNode          1      10       0.00000       - RETURN doc

Indexes used:
 By   Name                      Type         Collection    Unique   Sparse   Selectivity   Fields        Ranges
  6   idx_1733157424083238912   persistent   acollection   false    false       100.00 %   [ `value` ]   (doc.`value` < 10)

Optimization rules applied:
 Id   RuleName
  1   use-indexes
  2   remove-filter-covered-by-index
  3   remove-unnecessary-calculations-2

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Peak Mem [b]   Exec Time [s]
           0            0           0           10          0              0         0.00021

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00001
 parsing                    0.00002
 optimizing ast             0.00000
 loading collections        0.00000
 instantiating plan         0.00001
 optimizing plan            0.00011
 executing                  0.00006
 finalizing                 0.00001

This results in replacing the collection scan and filter block with an IndexNode. The execution pipeleine of the AQL query has become much shorter. Also the number of rows processed by each pipeline block is only 10, because we no longer need to look at all documents.

Example: AQL with Subquery

Let us consider a query containing a subquery:

arangosh> db._profileQuery(`
........> LET list = (FOR doc in acollection FILTER doc.value > 90 RETURN doc)
........> FOR a IN list 
........>  FILTER a.value < 91 
........>  RETURN a`, {}, {colors: false, optimizer:{rules:["-all"]}}
........> );
Show execution results
Hide execution results
Query String (116 chars, cacheable: false):
 LET list = (FOR doc in acollection FILTER doc.value > 90 RETURN doc)
 FOR a IN list 
  FILTER a.value < 91 
  RETURN a

Execution plan:
 Id   NodeType                  Calls   Items   Runtime [s]   Comment
  1   SingletonNode                 1       1       0.00000   * ROOT
 12   SubqueryStartNode             1       2       0.00001     - LET list = ( /* subquery begin */
  3   EnumerateCollectionNode      11   10001       0.00225       - FOR doc IN acollection   /* full collection scan  */
  4   CalculationNode              11   10001       0.00171         - LET #5 = (doc.`value` > 90)   /* simple expression */   /* collections used: doc : acollection */
  5   FilterNode                   10    9910       0.00137         - FILTER #5
 13   SubqueryEndNode               1       1       0.00106         - RETURN  doc ) /* subquery end */
  8   EnumerateListNode            10    9909       0.00110     - FOR a IN list   /* list iteration */
  9   CalculationNode              10    9909       0.00167       - LET #7 = (a.`value` < 91)   /* simple expression */
 10   FilterNode                    1       0       0.00027       - FILTER #7
 11   ReturnNode                    1       0       0.00000       - RETURN a

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   splice-subqueries

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Peak Mem [b]   Exec Time [s]
           0            0       10000            0      10000         720896         0.00962

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00000
 parsing                    0.00003
 optimizing ast             0.00000
 loading collections        0.00000
 instantiating plan         0.00002
 optimizing plan            0.00011
 executing                  0.00945
 finalizing                 0.00002

The resulting query profile contains a SubqueryNode which has the runtime of all its children combined.

Actually, we cheated a little. The optimizer would have completely removed the subquery if it had not been deactivated (rules:["-all"]). The optimimized version would take longer in the “optimizing plan” stage, but should perform better with a lot of results.

Example: AQL with Aggregation

Let us try a more advanced query, using a COLLECT statement. Assume we have a user collection with each document having a city, a username and an age attribute.

The following query gets us all age groups in buckets (0-9, 10-19, 20-29, …):

arangosh> db._profileQuery(`
........> FOR u IN myusers
........>  COLLECT ageGroup = FLOOR(u.age / 10) * 10
........>  AGGREGATE minAge = MIN(u.age), maxAge = MAX(u.age), len = LENGTH(u)
........>  RETURN {
........>    ageGroup, 
........>    minAge, 
........>    maxAge,
........>    len
........>  }`, {}, {colors: false}
........> );
Show execution results
Hide execution results
Query String (186 chars, cacheable: false):
 FOR u IN myusers
  COLLECT ageGroup = FLOOR(u.age / 10) * 10
  AGGREGATE minAge = MIN(u.age), maxAge = MAX(u.age), len = LENGTH(u)
  RETURN {
    ageGroup, 
    minAge, 
    maxAge,
    len
  }

Execution plan:
 Id   NodeType                  Calls   Items   Runtime [s]   Comment
  1   SingletonNode                 1       1       0.00000   * ROOT
  2   EnumerateCollectionNode       1      20       0.00002     - FOR u IN myusers   /* full collection scan, projections: `age`  */
  3   CalculationNode               1      20       0.00002       - LET #5 = (FLOOR((u.`age` / 10)) * 10)   /* simple expression */   /* collections used: u : myusers */
  4   CalculationNode               1      20       0.00000       - LET #7 = u.`age`   /* attribute expression */   /* collections used: u : myusers */
  6   CollectNode                   1       8       0.00001       - COLLECT ageGroup = #5 AGGREGATE minAge = MIN(#7), maxAge = MAX(#7), len = LENGTH()   /* hash */
  9   SortNode                      1       8       0.00001       - SORT ageGroup ASC   /* sorting strategy: standard */
  7   CalculationNode               1       8       0.00001       - LET #11 = { "ageGroup" : ageGroup, "minAge" : minAge, "maxAge" : maxAge, "len" : len }   /* simple expression */
  8   ReturnNode                    1       8       0.00000       - RETURN #11

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   remove-redundant-calculations
  3   remove-unnecessary-calculations
  4   move-calculations-up-2
  5   move-calculations-down
  6   reduce-extraction-to-projection

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Peak Mem [b]   Exec Time [s]
           0            0          20            0          0          32768         0.00036

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00000
 parsing                    0.00005
 optimizing ast             0.00001
 loading collections        0.00000
 instantiating plan         0.00002
 optimizing plan            0.00021
 executing                  0.00008
 finalizing                 0.00001

Without any indexes this query should have to perform the following operations:

  1. Perfom a full collection scan via a EnumerateCollectionNode and outputing a row containg the document in doc.
  2. Compute the expression LET #1 = FLOOR(u.age / 10) * 10 for all inputs via a CalculationNode
  3. Perform the aggregations via the CollectNode
  4. Sort the resulting aggregated rows via a SortNode
  5. Build a result value via another CalculationNode
  6. Put the result variable into the result set via the ResultNode

Like within the example above, you can see that after the CalculationNode stage, from the originally 20 rows only a handful remained.

Typical AQL Performance Mistakes

With the new query profiler you should be able to spot typical performance mistakes that we see quite often:

  • Not employing indexes to speed up queries with common filter expressions
  • Not using shard keys in filter statements, when it is known (only a cluster problem)
  • Using subqueries to calculate an intermediary result, but only using a few results

Bad example:

LET vertices = (
  FOR v IN 1..2 ANY @startVertex GRAPH 'my_graph'
    // <-- add a LIMIT 1 here
    RETURN v
)
FOR doc IN collection
  FILTER doc.value == vertices[0].value
  RETURN doc

Adding a LIMIT 1 into the subquery should result in better performance, because the traversal can be stopped after the first result instead of computing all paths.

Another mistake is to start a graph traversal from the wrong side (if both ends are known).

Assume we have two vertex collections users and products as well as an edge collection purchased. The graph model looks like this: (users) <--[purchased]--> (products), i.e. every user is connected with an edge in pruchased to zero or more products.

If we want to know all users that have purchased the product playstation as well as produts of type legwarmer we could use this query:

FOR prod IN products
  FILTER prod.type == 'legwarmer'
  FOR v,e,p IN 2..2 OUTBOUND prod purchased
    FILTER v._key == 'playstation' // <-- last vertex of the path
    RETURN p.vertices[1] // <-- the user

This query first finds all legwarmer products and then performs a traversal for each of them. But we could also inverse the traversal by starting of with the known playstation product. This way we only need a single traversal to achieve the same result:

FOR v,e,p IN 2..2 OUTBOUND 'product/playstation' purchased
  FILTER v.type == 'legwarmer' // <-- last vertex of the path
  RETURN p.vertices[1] // <-- the user