Search Services

Contents

1. Overview
2. Enable/Disable a search service
3. Build a custom search service
3.1 Search service specifications/requirements
3.1.1 Using SearchService base class
3.1.2 Using ListLinksService class
3.1.3 Using KnowledgeBaseService class
3.2 Good practices

1. Overview

Search services display information that is contextual to a search query in a very specialized way: they can search, retrieve and display data beyond the traditional concept of records. A typical search service could, for example:

  • Spell-check user queries by calling an external spellchecking library, and offering "Did you mean ...?" options.
  • Parse user input and display an author profile when searching for a well-defined author.
  • Search for submission names matching the user input.
  • Retrieve phone numbers from the institutional LDAP.
  • Etc.

Search services are displayed in addition to, and just before, the results returned by the standard Invenio search engine. Each service is queried with the context of the search, and returns:

  • A score indicating the relevance of the response, given the context.
  • An HTML-formatted response.

The relevance returned by the service is typically a score between 0 and 100 (see websearch_services.CFG_WEBSEARCH_SERVICE_MAX_SERVICE_ANSWER_RELEVANCE for details), which indicates how well the service thinks it can address the "question" and how well it thinks it was able to answer it. Since services are deliberately simplistic (non-scientific), it is extremely important to choose the score very carefully and to compare it with the scores of existing services when designing a new one. Failing to do so might hide more relevant answers provided by other services behind irrelevant information.

Services are designed to provide unobtrusive and useful information to the user. To that end, some measures are taken to limit the verbosity that services could introduce:

  • At most 2 services (the most relevant ones) are displayed for each query. Defined in websearch_services.CFG_WEBSEARCH_SERVICE_MAX_NB_SERVICE_DISPLAY.
  • Services with too low a relevance (<21) are not displayed. Defined in websearch_services.CFG_WEBSEARCH_SERVICE_MIN_RELEVANCE_TO_DISPLAY.
  • When the difference between the relevance of two services is too great (30), only the more relevant of the two is displayed. Defined in websearch_services.CFG_WEBSEARCH_SERVICE_MAX_RELEVANCE_DIFFERENCE.
  • When a service replies with too many answers, only 4 of them are displayed. More can be displayed by clicking "more".
  • A visually light theme is used.

See also section 3.2 Good practices.
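
The display rules above can be sketched as a small stand-alone filter. This is only an illustration, not the code Invenio runs: the function name select_answers and the local constants are hypothetical, the thresholds simply mirror the values quoted above, and treating a gap of strictly more than 30 as "too great" is an assumption.

```python
# Hypothetical local constants mirroring the values quoted in the text;
# the real settings are defined in invenio.websearch_services.
MAX_NB_SERVICE_DISPLAY = 2
MIN_RELEVANCE_TO_DISPLAY = 21
MAX_RELEVANCE_DIFFERENCE = 30


def select_answers(answers):
    """Return the (relevance, html) answers that would be displayed.

    `answers` is a list of (relevance, html) tuples, one per service.
    """
    # Drop answers whose relevance is too low, then rank the rest.
    candidates = sorted(
        (a for a in answers if a[0] >= MIN_RELEVANCE_TO_DISPLAY),
        key=lambda a: a[0],
        reverse=True,
    )
    selected = []
    for relevance, html in candidates[:MAX_NB_SERVICE_DISPLAY]:
        # Assumption: a gap strictly greater than the limit hides the
        # less relevant answer.
        if selected and selected[0][0] - relevance > MAX_RELEVANCE_DIFFERENCE:
            break
        selected.append((relevance, html))
    return selected
```

With answers scored 90, 50 and 20, only the first one is kept: 20 is below the minimum relevance, and 50 is more than 30 points behind 90.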

2. Enable/Disable a search service

To enable a service, drop its file into the following directory:

/opt/invenio/lib/python/invenio/search_services/

To disable a service, remove its file from the above directory.

3. Build a custom search service

Services use the Invenio plugin_utils infrastructure and are self-contained in single Python files that comply with the specifications defined in section 3.1 Search service specifications/requirements.

3.1 Search service specifications/requirements

A search service is a Python file stored in /opt/invenio/lib/python/invenio/search_services/, whose name corresponds to a class defined in the file.

In order to be valid, your service should inherit from the base SearchService class and implement some functions (see section 3.1.1 Using SearchService base class). Other, more specialized helper classes exist to help you build services that respond with lists of links (section 3.1.2 Using ListLinksService class) or that answer based on a BibKnowledge knowledge base (section 3.1.3 Using KnowledgeBaseService class).

3.1.1 Using SearchService base class

Start implementing your service by defining a class that inherits from the SearchService base class. Choose a class name that matches the name of your service file.

For example, a spellchecker service could live in /opt/invenio/lib/python/invenio/search_services/SpellCheckerService.py, with the following content:

from invenio.config import CFG_SITE_LANG
from invenio.websearch_services import SearchService

__plugin_version__ = "Search Service Plugin API 1.0"

class SpellCheckerService(SearchService):

    def get_description(self, ln=CFG_SITE_LANG):
        "Return service description"
        return "Spell check user input"


    def answer(self, req, user_info, of, cc, colls_to_search, p, f, search_units, ln):
        """
        Answer question given by context.

        Return (relevance, html_string) where relevance is integer
        from 0 to 100 indicating how relevant to the question the
        answer is (see C{CFG_WEBSEARCH_SERVICE_MAX_SERVICE_ANSWER_RELEVANCE} for details),
        and html_string being a formatted answer.
        """

        [...]

        return (score, html_answer)

The bare minimum for a search service to be valid is to inherit from the abstract base class "SearchService". The service must:

  • Override get_description(..) to return a short description of the service.
  • Override answer(..) to return an answer (int, html_string).
  • Define __plugin_version__ for the Search Service plugin version this service is compatible with.

If your service must pre-process some data and cache it, it can override the following helper methods:

  • prepare_data_cache(..) to prepare some cache needed to answer. The returned structure is cached for you and can be retrieved later via the self.get_data_cache() function. This method is called only when the service is queried and the cache has not yet been prepared or must be refreshed. The Invenio DataCacher infrastructure is used to store the cache in memory.
  • timestamp_verifier(..) to indicate whether the cache must be refreshed, by returning the latest modification timestamp of your data.
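
The interplay between these two methods can be illustrated with a stand-alone sketch. It does not use the Invenio DataCacher: the CachingServiceSketch and PhoneBookSketch classes, the "source" dictionary and the timestamp format are all hypothetical, and the refresh rule (rebuild when the reported timestamp is newer than the one seen at the last build) is an assumption about the general pattern rather than a description of the actual implementation.

```python
class CachingServiceSketch(object):
    """Stand-in for the caching behaviour described above (not Invenio code)."""

    def __init__(self):
        self._cache = None
        self._cache_timestamp = None

    def prepare_data_cache(self):
        raise NotImplementedError

    def timestamp_verifier(self):
        raise NotImplementedError

    def get_data_cache(self):
        # Rebuild only if nothing is cached yet or the data changed since.
        latest = self.timestamp_verifier()
        if self._cache is None or latest > self._cache_timestamp:
            self._cache = self.prepare_data_cache()
            self._cache_timestamp = latest
        return self._cache


class PhoneBookSketch(CachingServiceSketch):
    """Hypothetical service caching a name -> phone number lookup table."""

    def __init__(self, source):
        CachingServiceSketch.__init__(self)
        self.source = source  # hypothetical data source (a plain dict)
        self.rebuilds = 0     # counts how often the cache was prepared

    def prepare_data_cache(self):
        self.rebuilds += 1
        return dict((name.lower(), phone)
                    for name, phone in self.source["entries"])

    def timestamp_verifier(self):
        # Timestamps in this format compare correctly as strings.
        return self.source["modified"]
```

Querying PhoneBookSketch.get_data_cache() repeatedly prepares the table once; it is prepared again only after the "modified" value of the source moves forward.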

3.1.2 Using ListLinksService class

If your service is designed to display a list of responses, you can inherit from the ListLinksService class instead of SearchService, in order to benefit from additional helper functions (get_label(..) and display_answer_helper(..), described after the following example):

from invenio.config import CFG_SITE_LANG
from invenio.websearch_services import ListLinksService
from invenio.messages import gettext_set_language

__plugin_version__ = "Search Service Plugin API 1.0"

class CollectionNameSearchService(ListLinksService):

    [...]

    def get_description(self, ln=CFG_SITE_LANG):
        "Return service description"
        return "A foo that answers with a list of bars"


    def get_label(self, ln=CFG_SITE_LANG):
        """
        Return label displayed next to the service responses.

        @rtype: string
        """

        _ = gettext_set_language(ln)
        return _("You might be interested in")


    def answer(self, req, user_info, of, cc, colls_to_search, p, f, search_units, ln):
        """
        Answer question given by context.

        Return (relevance, html_string) where relevance is integer
        from 0 to 100 indicating how relevant to the question the
        answer is (see C{CFG_WEBSEARCH_SERVICE_MAX_SERVICE_ANSWER_RELEVANCE} for details),
        and html_string being a formatted answer.
        """

        [...]

        return (score, self.display_answer_helper(list_of_tuples_labels_urls))

The additional helper functions are:

  • get_label(..) to return the label displayed to the user next to the list of responses.
  • display_answer_helper(..) to turn a list of (label, url) tuples into a nicely formatted list of items, to be returned from the answer(..) function.
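
To give an idea of what such a formatting helper does, here is a rough stand-alone approximation. It is not the ListLinksService implementation: the function name format_links, the markup it produces and the way hidden answers are summarized are all hypothetical; only the (label, url) input shape and the limit of 4 displayed answers come from this document. The sketch targets Python 3.

```python
from html import escape

MAX_ANSWERS_SHOWN = 4  # value quoted in the Overview section


def format_links(label, links):
    """Render a list of (text, url) tuples as a short HTML list.

    Hypothetical markup, for illustration only.
    """
    shown = links[:MAX_ANSWERS_SHOWN]
    hidden = len(links) - len(shown)
    # Escape both the visible text and the URL to keep the HTML valid.
    items = "".join(
        '<li><a href="%s">%s</a></li>' % (escape(url, quote=True), escape(text))
        for text, url in shown
    )
    more = '<span class="more">%d more</span>' % hidden if hidden > 0 else ""
    return "<strong>%s</strong><ul>%s</ul>%s" % (escape(label), items, more)
```

A service would build its list of (label, url) tuples in answer(..) and hand it to the real helper in the same way this sketch receives its links argument.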

The requirements from the base class SearchService are still valid when using ListLinksService:

  • Override get_description(..) to return a short description of the service.
  • Override answer(..) to return an answer (int, html_string).
  • Define __plugin_version__ for the Search Service plugin version this service is compatible with.

3.1.3 Using KnowledgeBaseService class

If you would like to build a service that answers automatically based on the values stored in a BibKnowledge knowledge base, use the KnowledgeBaseService. In this case you do not need to define the answer(..) function, which is already implemented for you:

from invenio.config import CFG_SITE_LANG
from invenio.websearch_services import KnowledgeBaseService
from invenio.messages import gettext_set_language

__plugin_version__ = "Search Service Plugin API 1.0"

class MyKBService(KnowledgeBaseService):

    [...]

    def get_description(self, ln=CFG_SITE_LANG):
        "Return service description"
        return "A foo that answers with a list of bars, based on myKB"


    def get_label(self, ln=CFG_SITE_LANG):
        """
        Return label displayed next to the service responses.

        @rtype: string
        """
        _ = gettext_set_language(ln)
        return _("You might be interested in")


    def get_kbname(self):
        "Load knowledge base with this name"
        return "myKB"

With the above code, the knowledge base "myKB" will be loaded and used to reply to the user. Each mapping of the knowledge base is expected to contain a list of whitespace-separated keywords as key, and a label and a URL (separated with a "|") as value. For example:


help submit thesis   ->     How to submit a thesis|http://localhost/help/how-to-submit-thesis
Atlantis Times       ->     Atlantis Times journal|http://localhost/journal/AtlantisTimes

When filling in the knowledge base keys, carefully select the keywords so that they focus on a very specific subject, and avoid overly generic terms that would unnecessarily match too many user queries. Also consider possible clashes with other services: the first rule of the above KB example might, for instance, hide responses from the "SubmissionNameSearchService" service.
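
The mapping layout described above can be illustrated with a short stand-alone sketch that parses such mappings and looks up a query. The function names parse_mapping and lookup are hypothetical, and the matching rule used here (a mapping matches when every keyword of its key occurs in the query) is an assumption; the rule actually applied by KnowledgeBaseService may differ.

```python
# The two example mappings from this section, as (key, value) pairs.
MAPPINGS = [
    ("help submit thesis",
     "How to submit a thesis|http://localhost/help/how-to-submit-thesis"),
    ("Atlantis Times",
     "Atlantis Times journal|http://localhost/journal/AtlantisTimes"),
]


def parse_mapping(key, value):
    """Split a mapping into (keywords, label, url)."""
    keywords = key.lower().split()       # whitespace-separated keywords
    label, url = value.split("|", 1)     # label and URL separated by "|"
    return keywords, label.strip(), url.strip()


def lookup(query, mappings):
    """Return the (label, url) tuples whose keywords all occur in the query."""
    query_words = set(query.lower().split())
    hits = []
    for key, value in mappings:
        keywords, label, url = parse_mapping(key, value)
        # Assumed rule: every keyword of the key must be in the query.
        if keywords and all(word in query_words for word in keywords):
            hits.append((label, url))
    return hits
```

Under this assumed rule, a query such as "help me submit my thesis" would hit the first mapping, while "thesis" alone would not; a key made only of generic words would match far more queries, which is exactly what the advice above warns against.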

A default implementation of a KnowledgeBaseService is provided with Invenio: "FAQKBService". This service makes use of the demo knowledge base named "FAQ".

3.2 Good practices

  • Check step by step whether the service can answer the question, and exclude it as quickly as possible (avoid heavy computation if the service is not concerned by the question).
  • It often makes sense to exclude the service when the user is querying a specific search field (for example the author field).
  • Avoid noise and redundancy by providing only services that you think will be useful to your users, or make sure that the areas they cover are mutually exclusive.
  • Be cautious when returning a relevance score, to ensure that your service will not hide others that might be more relevant. Test and consider other services. Follow the guidelines regarding the score to return.
  • Avoid matching overly generic terms and stop-words, such as "document", "the", "to", etc. Be careful when using keywords such as "how", "search", etc.
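
The first, second and last points can be sketched as cheap pre-checks performed before any expensive work. This is a minimal illustration under stated assumptions: the helper names, the stop-word list and the use of (0, "") as a "no answer" reply are hypothetical; p and f stand for the search pattern and search field arguments that answer(..) receives.

```python
# Hypothetical stop-word list; tune it for your own service.
STOPWORDS = set(["document", "the", "to", "a", "of"])


def should_answer(p, f):
    """Cheap checks deciding whether the service is concerned at all."""
    if f:
        # The user is querying a specific search field: stay out of the way.
        return False
    meaningful = [w for w in p.lower().split() if w not in STOPWORDS]
    return bool(meaningful)


def answer_sketch(p, f, expensive_lookup):
    """Return (relevance, html), running the costly part only when needed."""
    if not should_answer(p, f):
        # Assumption: a relevance of 0 is below the display threshold,
        # so nothing is shown for this service.
        return (0, "")
    return expensive_lookup(p)
```

Here expensive_lookup stands for whatever costly computation the service performs (database queries, calls to external libraries, etc.); it is only reached once the cheap checks have passed.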

See also section 1. Overview.