--- title: Lunr tagline: full index search date: 2020-11-08 12:00:00 description: > Learn the search query language used by Lunr. The DSL used by Lunr is quite close to Solr to support a powerful search experience for better result. Using search expressions with wildcards, conditions like and|or and fuzzy search algorithms are supported. tags: [ Roundtrip, Introduction ] categories: [ Search ] toc: true scrollbar: false permalink: /pages/public/learn/roundtrip/lunr/ regenerate: true resources: [ lunr, rouge ] resource_options: - attic: padding_top: 400 padding_bottom: 50 opacity: 0.5 slides: - url: /assets/images/modules/attics/banner/lunr-banner-1280x800.jpg alt: Lunr --- // Page Initializer // ============================================================================= // Enable the Liquid Preprocessor :page-liquid: // Set page (local) attributes here // ----------------------------------------------------------------------------- // :page--attr: // Load Liquid procedures // ----------------------------------------------------------------------------- {% capture load_attributes %}themes/{{site.template.name}}/procedures/global/attributes_loader.proc{%endcapture%} // Load page attributes // ----------------------------------------------------------------------------- {% include {{load_attributes}} scope="all" %} // Page content // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ == Searching After you have built an index of your documents, the next step is to perform a search. The simplest way to start is to pass the text on which you want to search into the search method: [source, javascript] ---- idx.search('foo') ---- The above will return details of all documents that match the term “foo”. Although it looks like a string, the `search` method parses the string into a search query. This supports special syntax for defining more complex queries. Searches for multiple terms are also supported. If a document matches _at least_ one of the search terms, it will show in the results. The search terms are combined with OR. [source, javascript] ---- idx.search('foo bar') ---- The above example will match documents that contain either “foo” or “bar”. Documents that contain _both_ will score more highly and will be returned first. === Scoring The score (also known as relevance) of a document is calculated by the https://en.wikipedia.org/wiki/Okapi_BM25[BM25] algorithm, along with other factors such as link:#Boosts[boosts]. You don’t need to worry too much about the details of how BM25 works; to summarize, the more a search term occurs in a single document, the more that term will increase that document’s score, but the more a search term occurs in the overall _collection_ of documents, the less that term will increase a document’s score. For example, let’s say you’re indexing a collection of documents about JavaScript testing libraries. The terms “JavaScript”, “library”, and “test” may occur very frequently throughout the entire collection, so finding a document that mentions one of these terms isn’t very significant. However, if you’re searching for “integration test”, only three documents in the collection have the term “integration” in them, and _one_ of them mentions “integration” many times, that will bring the score for that one document higher up. === Wildcards Lunr supports wildcards when performing searches. A wildcard is represented as an asterisk (`*`) and can appear anywhere in a search term. For example, the following will match all documents with words beginning with “foo”: [source, javascript] ---- idx.search('foo*') ---- This will match all documents that end with ‘oo’: [source, javascript] ---- idx.search('*oo') ---- Leading wildcards, as in the above example, should be used sparingly. They can have a negative impact on the performance of a search, especially in large indexes. Finally, a wildcard can be in the middle of a term. The following will match any documents that contain a term that begins with “f” and ends in “o”: [source, javascript] ---- idx.search('f*o') ---- It is also worth noting that, when a search term contains a wildcard, no stemming is performed on the search term. === Fields By default, Lunr will search all fields in a document for the query term, and it is possible to restrict a term to a specific field. The following example searches for the term “foo” in the field title: [source, javascript] ---- idx.search('title:foo') ---- The search term is prefixed with the name of the field, followed by a colon (`:`). The field _must_ be one of the fields defined when building the index. Unrecognised fields will lead to an error. Field-based searches can be combined with all other term modifiers and wildcards, as well as other terms. For example, to search for words beginning with “foo” in the title or with “bar” in any field the following query can be used: [source, javascript] ---- idx.search('title:foo* bar') ---- === Boosts In multi-term searches, a single term may be important than others. For these cases Lunr supports term level boosts. Any document that matches a boosted term will get a higher relevance score, and appear higher up in the results. A boost is applied by appending a caret (`^`) and then a positive integer to a term. [source, javascript] ---- idx.search('foo^10 bar') ---- The above example weights the term “foo” 10 times higher than the term “bar”. The boost value can be any positive integer, and different terms can have different boosts: [source, javascript] ---- idx.search('foo^10 bar^5 baz') ---- === Fuzzy Matches Lunr supports fuzzy matching search terms in documents, which can be helpful if the spelling of a term is unclear, or to increase the number of search results that are returned. The amount of fuzziness to allow when searching can also be controlled. Fuzziness is applied by appending a tilde (`~`) and then a positive integer to a term. The following search matches all documents that have a word within 1 edit distance of “foo”: [source, javascript] ---- idx.search('foo~1') ---- An edit distance of 1 allows words to match if either adding, removing, changing or transposing a character in the word would lead to a match. For example “boo” requires a single edit (replacing “f” with “b”) and would match, but “boot” would not as it also requires an additional “t” at the end. === Term Presence By default, Lunr combines multiple terms together in a search with a logical OR. That is, a search for “foo bar” will match documents that contain “foo” or contain “bar” or contain both. This behaviour is controllable at the term level, i.e. the presence of each term in matching documents can be specified. By default each term is optional in a matching document, though a document must have at least one matching term. It is possible to specify that a term must be present in matching documents, or that it must be absent in matching documents. To indicate that a term must be present in matching documents the term should be prefixed with a plus (`+`) and to indicate that a term must be absent the term should be prefixed with a minus (`-`). Without either prefix the term’s presence in matching documents is optional. The below example searches for documents that must contain “foo”, might contain “bar” and must not contain “baz”: [source, javascript] ---- idx.search("+foo bar -baz") ---- To simulate a logical AND search of “foo AND bar” mark both terms as required: [source, javascript] ---- idx.search("+foo +bar") ----