---
title: QuickSearch
tagline: full index search based on lunr
date: 2020-11-08 00:00:00 +100
description: >
  QuickSearch is based on the search engine Lunr, fully integrated with
  the J1 Template. Lunr is designed to be lightweight yet full-featured to
  provide a great search experience. No need for complex external,
  server-sided search engines or commercial services on the Internet
  like Google.

categories: [ Roundtrip ]
tags: [ Introduction, Module, Lunr, QuickSearch ]

toc: true
scrollbar: false
fam_menu_id: page_ctrl_simple

permalink: /pages/public/learn/roundtrip/quicksearch/
regenerate: false

resources: [ lunr, rouge, lightbox, clipboard ]
resource_options:
  - toccer:
      collapseDepth: 3
  - attic:
      padding_top: 400
      padding_bottom: 50
      opacity: 0.5
      slides:
        - url: /assets/images/modules/attics/banner/lunr-banner-1280x800.jpg
          alt: Lunr
---

// Page Initializer
// =============================================================================
// Enable the Liquid Preprocessor
:page-liquid:

// Set (local) page attributes here
// -----------------------------------------------------------------------------
// :page--attr:

// Load Liquid procedures
// -----------------------------------------------------------------------------
{% capture load_attributes %}themes/{{site.template.name}}/procedures/global/attributes_loader.proc{%endcapture%}

// Load page attributes
// -----------------------------------------------------------------------------
{% include {{load_attributes}} scope="all" %}

// Page content
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

// Include sub-documents
// -----------------------------------------------------------------------------

QuickSearch is based on the search engine Lunr, fully integrated with the
J1 Template. Lunr is designed to be lightweight yet full-featured to provide
a great search experience. No need for complex external, server-sided search
engines or commercial services on the Internet like Google.

Searching a website using QuickSearch is different from using search engines
like Google or Microsoft Bing. Those search platforms use complex algorithms
to provide a simple interface to the public, and they rely heavily on
artificial intelligence (AI) methods to make sense of the handful of words
given for a search.

Nevertheless, QuickSearch, the J1 implementation of Lunr, is as simple to use
as a Google search but offers additional features to search more
specifically, if wanted. In any case, QuickSearch provides an easy-to-use
query language for better results.

== Core concepts

Understanding some of the concepts and terminology that QuickSearch (Lunr)
uses helps users make full use of the search functionality and get more
relevant search results.

=== Indexing documents

QuickSearch searches *all* documents of the website generated by J1, but only
the documents of this site. An advantage: searches need no internet access.
Searches are based on a pre-built local *full-text* index of the site that is
loaded by the browser on a page request.

The index for a site is generated by the (Jekyll) plugin `lunr_index.rb`
located in the `_plugins` folder. The full-text index is always generated by
Jekyll at build-time:

.Index creation at build-time
----
Startup the site ..
Configuration file: ...
Incremental build: enabled
Generating...
J1 QuickSearch: creating search index ...
J1 QuickSearch: finished, index ready.
....
----

If you're running a website in development mode, the index is refreshed
whenever files are added or modified.

.Index creation if files are added or modified
----
site: Regenerating: n file(s) changed at ...
site: ...
site: J1 QuickSearch: creating search index ...
site: J1 QuickSearch: finished, index ready.
...
----

=== Documents

The searchable data in an index is organized as documents containing the text
and the words (terms) you want to search on.
A document is a data set (JSON object) with fields that are processed to
create the result list for a search. A document data set might look like this:

[source, json, role="noclip"]
----
{
  "title": "Web in a Day",
  "tagline": "meet & greet jekyll",
  "url": "/pages/public/learn/kickstarter/web_in_a_day/meet_and_greet/",
  "date": "2018-05-01 00:00:00 +0000",
  "tags": [
    "Introduction"
  ],
  "categories": [
    "Jekyll",
    "Knowledge",
    "Tutorial"
  ],
  "description": "Web in a Day is the first in a series of tutorials ..."
}
----

This document has several fields, like `title`, `tagline`, or `description`,
that can be used for *full-text* searches. Additional fields, like `tags` or
`categories`, can be used for more specific searches based on `identifiers`.

NOTE: The document *content* is collected by the (intrinsic) field `body`.
To limit the index data loaded by the browser, the `body` field is removed
from a document. The `body` field is not available as an *explicit* field for
searches, but the *content* is still fully searchable.

For simple full-text searches as well as more specific searches, the
QuickSearch core engine Lunr offers a query language, a DSL (domain-specific
language). Find more about *QuickSearch|Lunr DSL* queries in the section
*Searching*.

=== Scoring

The relevance (the `score`) is calculated based on an algorithm called *BM25*,
along with other factors. You don’t need to worry too much about the details
of how this technique works. To summarize: the more often a search term occurs
in a single document, the more that term will increase that document’s score,
but the more often a search term occurs in the *overall* collection of
documents, the less that term will increase a document’s score. In other
words, rare words count more and increase the score.

Scoring information generated by the BM25 algorithm is added to the (local)
search index and allows a very fast calculation of the relevance of documents
for queries.

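
To make the summary above more concrete, the following is a minimal sketch of
a BM25-style score in plain JavaScript. It is an illustration only and not
Lunr's internal implementation: the sample documents, the helper names `idf`
and `bm25`, and the constants `K1` and `B` (commonly used default values) are
assumptions made for this example.

.Simplified BM25 scoring (sketch)
[source, javascript]
----
// Simplified BM25 scoring sketch (illustration only, not Lunr's own code).
// K1 and B are commonly used defaults and are assumptions here.
const K1 = 1.2;
const B = 0.75;

// Hypothetical mini collection: each document is a list of lower-cased terms.
const docs = [
  ['jekyll', 'is', 'a', 'static', 'site', 'generator'],
  ['jekyll', 'themes', 'and', 'jekyll', 'plugins'],
  ['jekyll', 'collections', 'tutorial'],
];

// Inverse document frequency: terms found in few documents weigh more.
function idf(term) {
  const n = docs.filter((doc) => doc.includes(term)).length;
  return Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
}

// Sum the weighted, length-normalized term frequency of each query term.
function bm25(queryTerms, doc) {
  const avgLength = docs.reduce((sum, d) => sum + d.length, 0) / docs.length;
  return queryTerms.reduce((score, term) => {
    const tf = doc.filter((t) => t === term).length;
    const norm = tf + K1 * (1 - B + B * (doc.length / avgLength));
    return score + idf(term) * ((tf * (K1 + 1)) / norm);
  }, 0);
}

// Score every document for the query and sort by descending relevance.
const ranked = docs
  .map((doc, id) => ({ id, score: bm25(['jekyll', 'generator'], doc) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);
----

With this sample data, the first document ranks highest because it is the only
one that contains the rare term `generator`. The term `jekyll` occurs in every
document and therefore adds little to any score.
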
Imagine your website contains documents about Jekyll. The term `Jekyll` may
occur very frequently throughout the entire website because it is used so
often in the content. So finding a document that mentions the term Jekyll
isn’t very significant for a search.

However, if you’re searching for `Jekyll Generator`, only some documents of
the website have the word `Generator` in them. That raises the score
(relevance) of documents containing both words and moves them higher up in
the search results.

Matching and scoring are used by all search engines, J1 QuickSearch included.
QuickSearch *sorts* search results in a way you already know from commercial
internet search engines like Google: the top results are the more relevant
ones.

== Searching

To access QuickSearch, a magnifier button is available in the `Quicklinks`
area in the menu bar at the top-right of every page.

.Search button (magnifier) in the quick access area
lightbox::quicksearch-icon[ 800, {data-quicksearch-icon} ]

A mouse-click on the magnifier button opens the search input and disables all
other navigation to focus on what you intend to do: searching.

.Input bar for a QuickSearch
lightbox::quicksearch-input[ 800, {data-quicksearch-input} ]

Search queries look like simple text. But the search `engine` under the hood
of QuickSearch always transforms the given search string (text) into a search
query. Search queries support a special syntax, the DSL, for defining more
complex queries for better (scored) results.

As always: start simple!

=== Simple searches

The simplest way to run a search is to pass the text (words, terms) that you
want to search for:

[source, text]
----
jekyll
----

The above will return all documents that match the term `jekyll`. Searches for
*multiple* terms (words) are also supported. If a document matches *at least*
one of the search terms, it will show in the results. The search terms are
combined by a logical `OR`.


[source, text]
----
jekyll tutorial
----

The above example will match documents that contain either `jekyll` *OR*
`tutorial`. Documents that contain _both_ score higher and are returned first.

NOTE: Compared to a Google search, where terms are combined by a logical
`AND`, QuickSearch combines the terms by an `OR`.

To combine search terms in a QuickSearch query by a logical *AND*, prefix each
term with a plus sign (`+`) to mark it as *required* in the QuickSearch query
(DSL):

[source, text]
----
+jekyll +tutorial
----

=== Wildcards

QuickSearch supports wildcards when performing searches. A wildcard is
represented as an asterisk (`*`) and can appear anywhere in a search term.
For example, the following will match all documents with words beginning
with `Jek`:

[source, text]
----
jek*
----

NOTE: Language grammar rules are not relevant for searches. For
simplification, all words (terms) are transformed to lower case. As a result,
the word `Jekyll` is the same as `jekyll` from a search engine's perspective.
Language variations like `Jekyll's` or plurals like `Generators` are reduced
to their base form. For searches, don't worry about grammar rules, only about
the spelling. If you're unsure about the spelling of a word, use wildcards.

=== Fields

By default, Lunr will search *all fields* in a document for the given query
terms, but it is possible to restrict a term to a specific *field*. The
following example searches for the term `jekyll` in the field `title`:

[source, text]
----
title:jekyll
----

The search term is prefixed with the field's name, followed by a colon (`:`).
The field _must_ be one of the fields defined when building the index.
Unrecognized fields will lead to an error.

Search queries based on fields can be combined with all other term modifiers
like wildcards.
For example, to search for words beginning with `jek` in the title *AND*
words matching the wildcard `coll*` anywhere in a document, the following
query can be used:

[source, text]
----
+title:jek* +coll*
----

==== Available fields

Besides the document *body*, an intrinsic field used to create the full-text
index out of the document *content*, some more specific fields are available
for searches.

.Available fields (all documents)
[cols="3a,3a,6a", options="header", width="100%", role="rtable mt-3"]
|===============================================================================
|Name |Value |Description\|Example\|s

|`title`
|`string`
|The headline of a document (article, post)

Example\|s: QuickSearch

[source, text]
----
title:QuickSearch
----

|`tagline`
|`string`
|The subtitle of a document (article, post)

Example\|s: full index search

|`tags`
|`string`
|Tags describe the content of a document.

Example\|s: Roundtrip, QuickSearch

|`categories`
|`string`
|Categories describe the group of documents a document belongs to.

Example\|s: Search

|`description`
|`string`
|The description is given by the author for a document. It gives a brief
summary of what the document is about.

Example\|s: QuickSearch is based on the search engine Lunr, fully
integrated with J1 Template ...

|===============================================================================

////
=== Boosts

In multi-term searches, a single term may be more important than others. For
these cases Lunr supports term level boosts. Any document that matches a
boosted term will get a higher relevance score, and appear higher up in the
results. A boost is applied by appending a caret (`^`) and then a positive
integer to a term.

[source, javascript]
----
idx.search('foo^10 bar')
----

The above example weights the term “foo” 10 times higher than the term “bar”.
The boost value can be any positive integer, and different terms can have
different boosts:

[source, javascript]
----
idx.search('foo^10 bar^5 baz')
----

=== Fuzzy Matches

Lunr supports fuzzy matching of search terms in documents, which can be
helpful if the spelling of a term is unclear, or to increase the number of
search results that are returned. The amount of fuzziness to allow when
searching can also be controlled. Fuzziness is applied by appending a tilde
(`~`) and then a positive integer to a term. The following search matches all
documents that have a word within 1 edit distance of “foo”:

[source, javascript]
----
idx.search('foo~1')
----

An edit distance of 1 allows words to match if adding, removing, changing, or
transposing a single character in the word would lead to a match. For example
“boo” requires a single edit (replacing “f” with “b”) and would match, but
“boot” would not as it also requires an additional “t” at the end.
////

=== Term presence

By default, Lunr combines multiple terms in a search with a logical OR. That
is, a search for `jekyll collections` will match documents that contain
`jekyll` or contain `collections` or contain both. This behavior is
controllable at the term level, i.e., the presence of each term in matching
documents can be specified.

By default, each term is optional in a matching document, though a document
must have at least one matching term. It is possible to specify that a term
must be present in matching documents or that it must be absent in matching
documents.

To indicate that a term must be *present* (required) in matching documents,
prefix the term with a plus sign (`+`). To indicate that a term must be
*absent* (not wanted), prefix the term with a minus sign (`-`).

The example below searches for documents that *must* contain `jekyll` and
must *not* contain the word `collection`:

[source, text]
----
+jekyll -collection
----

To simulate a logical *AND* search for documents that contain the word
`jekyll` *AND* the word `collection`, mark both terms as required:

[source, text]
----
+jekyll +collection
----

== What next

You've explored some of the possibilities J1 offers for websites. But J1 can
do much, much more for your project. This was the last page of the roundtrip.

More details on the most common elements of Bootstrap can be found in the
previewer for a theme. Have a look at the
link:{url-previewer--theme}[Theme previewer].

To make things real for your new site, go for *Web in a day*. This tutorial
guides you through all the steps of building a website: your site, using
Jekyll and the template system J1. It's a pleasant journey to learn what
modern static websites can offer today.

Start your journey from here:
link:{url-j1-kickstarter--web-in-a-day}[Web in a day, {browser-window--new}].