Wildcard Search with ArangoSearch
Search for strings with placeholders that stand for one or many arbitrary characters
You can use the LIKE()
function for this search technique to find strings
that start with, contain or end with a certain substring, but it can do more
than that. You can place the special characters _
and %
as wildcards for
single or zero-or-more characters in the search string to match multiple
partial strings.
The ArangoSearch LIKE()
function
is backed by View indexes. In contrast, the
String LIKE()
function cannot utilize any
sort of index. Another difference is that the ArangoSearch variant does not
accept a third argument to make matching case-insensitive. You can control this
via Analyzers instead, also see
Case-insensitive Search with ArangoSearch.
Which of the two equally named functions is used is determined by the context.
It is the ArangoSearch variant in SEARCH
operations and the String variant
everywhere else.
Wildcard Syntax
_
: A single arbitrary character%
: Zero, one or many arbitrary characters\\_
: A literal underscore\\%
: A literal percent sign
Literal backlashes require different amounts of escaping depending on the context:
\
in bind variables (Table view mode) in the Web UI (automatically escaped to\\
unless the value is wrapped in double quotes and already escaped properly)\\
in bind variables (JSON view mode) and queries in the Web UI\\
in bind variables in arangosh\\\\
in queries in arangosh- Double the amount compared to arangosh in shells that use backslashes for
escaping (
\\\\
in bind variables and\\\\\\\\
in queries)
Wildcard Search Examples
Dataset: IMDB movie dataset
View definition:
{
"links": {
"imdb_vertices": {
"fields": {
"title": {
"analyzers": [
"identity"
]
}
}
}
}
}
AQL queries:
Match all titles that starts with The Matr
using LIKE()
,
where _
stands for a single wildcard character and %
for an arbitrary amount:
FOR doc IN imdb
SEARCH ANALYZER(LIKE(doc.title, "The Matr%"), "identity")
RETURN doc.title
Result |
---|
The Matrix Revisited |
The Matrix |
The Matrix Reloaded |
The Matrix Revolutions |
The Matrix Trilogy |
You can achieve the same with the STARTS_WITH()
function:
FOR doc IN imdb
SEARCH ANALYZER(STARTS_WITH(doc.title, "The Matr"), "identity")
RETURN doc.title
Match all titles that contain Mat
using LIKE()
:
FOR doc IN imdb
SEARCH ANALYZER(LIKE(doc.title, "%Mat%"), "identity")
RETURN doc.title
Result |
---|
The Matrix Revisited |
Gray Matters |
Show: A Night In The Life of Matchbox Twenty |
The Mating Habits of the Earthbound Human |
Dark Matter |
Dave Matthews & Tim Reynolds: Live at Radio City |
Once Upon A Mattress |
Tarzan and His Mate |
Donald in Mathmagic Land |
Das Geheimnis der Materie |
… |
Match all titles that end with rix
using LIKE()
:
FOR doc IN imdb
SEARCH ANALYZER(LIKE(doc.title, "%rix"), "identity")
RETURN doc.title
Result |
---|
Ben 10: Secret of the Omnitrix |
Pinchcliffe Grand Prix |
Hendrix |
The Matrix |
The Animatrix |
Les Douze travaux d’Astérix |
Vercingétorix |
Match all titles that have an H
as first letter, followed by two arbitrary
characters, followed by ry
and any amount of characters after that. It will
match titles starting with Harry
and Henry
:
FOR doc IN imdb
SEARCH ANALYZER(LIKE(doc.title, "H__ry%"), "identity")
RETURN doc.title
Result |
---|
Henry & June |
Henry Rollins: Live in the Conversation Pit |
Henry Rollins: Uncut from NYC |
Harry Potter and the Sorcerer’s Stone |
Harry Potter and the Chamber Of Secrets |
… |
Use a bind parameter as input, but escape the characters with special meaning and perform a contains-style search by prepending and appending a percent sign:
FOR doc IN imdb
SEARCH ANALYZER(LIKE(doc.title, CONCAT("%", SUBSTITUTE(@term, ["_", "%"], ["\\_", "\\%"]), "%")), "identity")
RETURN doc.title
Bind parameters:
{ "term": "y_" }
The query constructs the wildcard string %y\\_%
and will match Cry_Wolf
.