Engines

To see a list of all resources (search engines) that are used by your server, open settings.html in your client. This shows a table of resources, about 25 if you connect to the University of Twente Searsia server. In the list of resources, one of the search engines is the University of Twente Searsia server. We call this resource the 'mother': it is the search engine that is fully trusted, like you would trust your mother. A client (or another server) copies all the resources from its mother. This way, configurations can be shared quickly, and a copy of, for instance, the University of Twente Searsia server can be instantiated quickly.

To enable updates, start the (mother) server with the --open option. (BEWARE: This will enable anyone to update your server. Searsia does not support authentication at the moment.) To add a resource, click on the "+Add" button at the bottom of the settings.html page.

Adding a Searsia resource

To add a Searsia resource (an engine that implements the Searsia protocol, returning results of mime-type application/searsia+json), enter a unique identifier (Id) and the resource's API template. The Id must be unique within your server. The API template is the URL of the resource, where the query is replaced by {q} or {q?} (use the question mark if the query is optional). After you click "Update", the server tests the resource. A pop-up window appears with the results of the test and a message explaining whether the resource was added.
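As an illustration of how an API template is turned into a request URL, here is a minimal Python sketch. It is not taken from the Searsia code base: the function name fill_template, the example.org URLs, and the choice to percent-encode the query are assumptions made for this example.

```python
from urllib.parse import quote


def fill_template(template: str, query: str) -> str:
    """Substitute the percent-encoded query for {q} or {q?} in a template.

    Assumption: the query is percent-encoded before substitution; the
    actual server may encode it differently.
    """
    encoded = quote(query, safe="")
    # Replace the optional form first so that "{q?}" is handled as a whole.
    return template.replace("{q?}", encoded).replace("{q}", encoded)


# Hypothetical templates, for illustration only.
required = fill_template("https://example.org/search?q={q}", "information retrieval")
optional = fill_template("https://example.org/search?q={q?}", "")
```

With an empty query, the optional form {q?} simply leaves the parameter value empty in this sketch.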

Adding an HTML resource

The server can get search results by scraping the HTML that search engines return for their end users. To add an HTML resource, enter text/html in the field Mime type. For the field API Template, take the resource's URL from your browser after querying the search engine, and replace the query in the URL by {q}. If the URL does not contain a query, the search engine probably uses a POST request. To find out what the POST request is, we recommend Live HTTP Headers for Firefox. Put the POST string, with the query replaced by {q}, in the field Post string.

Searsia uses XPath 1.0 to extract the search results from the web page. XPath is a query language for selecting elements of semi-structured data. If the search results are displayed as list elements on the page, they are encoded as <li> ... </li>, and they can be extracted from the page with the XPath query //li. To find the most likely XPath query, we recommend Search Result Finder for Firefox. Fill in the XPath query in the field Item XPath. To tell Searsia how to extract the components of the search result, add XPaths for the fields title, description, link, or any other field you like (the client also supports image).
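The two-step extraction (first an Item XPath, then one XPath per field relative to each item) can be sketched in Python as follows. This is an illustration under stated assumptions, not Searsia's implementation: the page is a made-up, well-formed fragment, and Python's xml.etree.ElementTree supports only a subset of XPath 1.0, so the attribute step that a real configuration would write as .//a/@href is emulated here with a separate attribute name.

```python
import xml.etree.ElementTree as ET

# Made-up result page; real HTML is often not well-formed XML and would
# need a tolerant HTML parser first.
PAGE = """<html><body><ul>
<li><a href="https://example.org/a">First result</a><p>Summary of the first page.</p></li>
<li><a href="https://example.org/b">Second result</a><p>Summary of the second page.</p></li>
</ul></body></html>"""


def extract_hits(page, item_xpath, field_xpaths):
    """Select items with item_xpath, then evaluate each field path per item.

    field_xpaths maps a field name to (path, attribute); attribute is None
    when the text content of the selected element is wanted.
    """
    root = ET.fromstring(page)
    hits = []
    for item in root.findall(item_xpath):
        hit = {}
        for field, (path, attribute) in field_xpaths.items():
            node = item.find(path)  # first match, relative to the item
            if node is None:
                continue
            if attribute:
                value = node.get(attribute)
            else:
                value = "".join(node.itertext()).strip()
            if value:
                hit[field] = value
        hits.append(hit)
    return hits


hits = extract_hits(
    PAGE,
    ".//li",
    {
        "title": (".//a", None),
        "url": (".//a", "href"),
        "description": (".//p", None),
    },
)
```

Each field path starts with a dot, so it is evaluated relative to the item that the Item XPath selected rather than against the whole page.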

Adding an XML or JSON resource

To add an XML resource, fill in application/xml in the field Mime type. Then proceed as above. The Firefox Search Result Finder cannot be used in this case.

To add a JSON resource, fill in application/json in the field Mime type. Searsia also uses XPath queries to interpret JSON output, by internally converting JSON to XML, where each JSON attribute name is converted to an XML element; JSON lists are converted to repeated XML elements with the JSON list's name.
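The conversion described above can be approximated with the following Python sketch. It only illustrates the stated rule (attribute names become elements, lists become repeated elements); the root element name "root", the handling of non-string values, and the sample data are assumptions, and Searsia's actual conversion may differ in such details.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, name="root"):
    """Return a list of XML elements named `name` that represent `value`.

    A JSON list yields one element per list item, all with the list's name.
    """
    if isinstance(value, list):
        elements = []
        for item in value:
            elements.extend(json_to_xml(item, name))
        return elements
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.extend(json_to_xml(child, key))
    elif isinstance(value, str):
        element.text = value
    elif value is not None:
        element.text = json.dumps(value)  # numbers and booleans as JSON text
    return [element]


# Hypothetical API output, for illustration only.
data = json.loads('{"results": [{"name": "A", "rank": 1}, {"name": "B", "rank": 2}]}')
root = json_to_xml(data)[0]
xml_text = ET.tostring(root, encoding="unicode")
names = [element.text for element in root.findall("./results/name")]
```

For this sample, an Item XPath such as //results would select the two converted list items, and a field XPath such as ./name would select the name within each item.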

Examples

Searsia supports many APIs by including API keys as secret parameters that will not be shared, as well as the possibility to add custom HTTP headers. Look at the University of Twente settings for examples of Searsia's resource configurations, including several examples that use HTML scrapers, and examples for accessing the APIs of Google, Twitter, Facebook, Flickr, Instagram, and more. If you believe that Searsia is unable to get search results from an existing resource that should be supported, please post your question under Searsia Server Issues. Please note that Searsia is not meant to scrape sites that do not want to be scraped, and therefore does not contain ways to circumvent, for instance, session cookies.
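To make the idea of secret parameters concrete, here is a minimal Python sketch of substituting them into an API template and into custom headers. The parameter name {apikey}, its value, the example.org URL, and the Authorization header are hypothetical; the sketch assumes plain textual replacement of each parameter name by its value and is not Searsia's implementation.

```python
def apply_parameters(text, parameters):
    """Replace every occurrence of each parameter name by its value."""
    for name, value in parameters.items():
        text = text.replace(name, value)
    return text


# Hypothetical secret parameter; it stays on the server and is not shared.
PARAMETERS = {"{apikey}": "s3cret"}

api_template = "https://api.example.org/search?q={q}&key={apikey}"
headers = {"Authorization": "Bearer {apikey}"}

# The query placeholder {q} is left alone; it is filled in per request.
resolved_template = apply_parameters(api_template, PARAMETERS)
resolved_headers = {
    field: apply_parameters(value, PARAMETERS) for field, value in headers.items()
}
```

Only the unresolved template and headers would be safe to share in this sketch; the resolved versions contain the secret.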

While the Searsia resource configurations provide a way to get the search results for a great variety of existing search engines, Searsia also provides a flexible way to structure the search results from these engines. The search results, i.e., the objects in the "hits" list, may contain any attribute that seems appropriate, for instance an attribute "phone_number" for a telephone directory or an attribute "nr_of_citations" for a search engine that searches scientific papers. The following attributes are reserved:

  • "title": The title of the search result, which can be clicked to go to the web page that was found. Usually, the title equals the title of the web page that was found. The title is the only attribute that is mandatory.
  • "url": The link to the web page that was found.
  • "description": A small summary describing the result. This might be a snippet from the web site containing the query, or some other summary.
  • "image": The url of a (thumbnail) image, to be displayed with the search result.
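The following Python sketch shows what a hit with reserved and custom attributes might look like, together with a check for the one mandatory attribute. The sample values and the helper names check_hit and custom_attributes are made up for this illustration.

```python
RESERVED = ("title", "url", "description", "image")


def check_hit(hit):
    """Return a list of problems; an empty list means the hit is acceptable."""
    problems = []
    title = hit.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("missing mandatory attribute: title")
    return problems


def custom_attributes(hit):
    """Return the attributes of a hit that are not reserved."""
    return {name: value for name, value in hit.items() if name not in RESERVED}


# Made-up hit for a search engine that searches scientific papers.
hit = {
    "title": "An example paper",
    "url": "https://example.org/papers/1",
    "description": "A short summary of the paper.",
    "nr_of_citations": 12,
}

problems = check_hit(hit)
extras = custom_attributes(hit)
```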

The Results demo mockup below shows 7 ways to present the search results from Wikipedia's search suggestions; that is, the mockup shows the same search results 7 times using different configurations.

The 7 results presentations are achieved as follows:

  • wikididyoumean, whose name is "Did you mean:", returns a single search result that contains the title as well as "tags":"#suggestion", which tells the client to display the result as a query suggestion.
  • wikifull (Wikipedia Pages) returns title, description and URL. In this configuration, the domain of "urltemplate" (wikipedia.net) does not match the results' domains (wikipedia.org). Therefore, the client displays the URLs for each aggregated result. The "urltemplate" is the url that the user will use to search on the site, whereas the "apitemplate" will be used by the server.
  • wikiimage (Wikipedia Images) returns title and image and "tags":"#image", which tells the client to display the results as an image result. Note how the XPath functions concat() and substring-after() are used to create a custom image URL.
  • wikismall2, which is called Search Wikipedia for, does not return URLs, which makes the client infer the URL from the "title" and the "urltemplate", effectively creating a search engine that spawns a search on Wikipedia.
  • wikifull2 (Wikipedia Again) returns full search results with a thumbnail image, much like the Wikipedia Pages engine above.
  • wikismall (Wikipedia Small) returns search results with "tags":"#small", telling the client to display the results on a single line.
  • wikirelated is called "Related searches:". It returns only the titles and "tags":"#small". The header "Related searches:" cannot be clicked because the resource does not configure the "urltemplate" for the end user.

Note that each configuration uses the same "apitemplate": each of the 7 results effectively uses the exact same search engine. Please do not use this example in an actual server configuration. If the configuration of the mockup were used in an actual Searsia Server, it would send each query 7 times(!) to Wikipedia.

An overview of all fields

The table below contains a quick reference for all fields:

Parameter: Explanation
*Id: A unique string identifying this resource. Should be unique within the server.
Name: A short name for this resource, to be displayed to the user.
Icon: The url of the icon image, to be displayed to the user. Icons should have equal width and height. Icons are preferably png files, not smaller than 48x48 pixels.
User template: A url specified following the searsia url template syntax, to be used by users. The mime type of this url must be text/html or application/xhtml+xml.
*API template: A url specified following the searsia url template syntax, to be used by the server.
Post string: Only set if the API template HTTP method is POST, empty if the HTTP method is GET. Like the template, the post string may include parameters.
Mime type: The format returned by the API Template. Supported formats are: application/searsia+json, text/html, application/xml, application/json. If omitted, the mime type application/searsia+json is assumed.
Test query: A query for which a search gives a non-empty result. If not set, the system should give a non-empty result for the query searsia.
Rerank: Specifies a ranking algorithm or filter that is used to rerank or filter the search results. Currently, only the value "or" is supported: it removes returned results that do not match any of the query terms.
Prior: A value between 0 and 1 indicating the prior probability of a resource to be selected (prior to knowing the query). For instance, a value of 0.1 indicates that the resource gives relevant results for 1 in 10 queries. A value of 1 indicates that the resource always returns relevant results. In this case, the server likely returns the resource for every query. Values bigger than 1 are permitted to prioritize between multiple resources that are always selected.
Item XPath: An XPath 1.0 query that selects the search results from an HTML or XML result. XPath is also used to select results from JSON results, assuming a standard conversion to XML, where JSON lists are converted to repeated XML elements.
XPaths: Field names and XPath queries for selecting parts of search results such as the title, url, and description. For example, the title would typically be selected as the first anchor text in a search result, i.e., (.//a)[1]. The XPath queries are evaluated with respect to an Item XPath context node, and they typically start with . (the 'self' axis step).
Headers: HTTP headers to be sent to the API Template, consisting of a field name (without ':') and the field value. Like API Template and Post String, the field value may include parameters.
Parameters: Secret parameter names and their values. Occurrences of the parameter names in the API Template, Post String or Headers will be replaced by the value. The server will not share the parameter names and values with other clients, so it is safe to use them for API keys and secrets.

* mandatory settings
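The Rerank setting in the table describes an "or" filter that removes results matching none of the query terms. A minimal Python sketch of such a filter is given below. The tokenization (lowercased word characters) and the choice to match against all attribute values of a hit are assumptions of this example; the server's actual matching rules may differ.

```python
import re


def tokenize(text):
    """Split text into a set of lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def rerank_or(hits, query):
    """Keep only the hits that contain at least one of the query terms."""
    terms = tokenize(query)
    kept = []
    for hit in hits:
        text = " ".join(str(value) for value in hit.values())
        if terms & tokenize(text):
            kept.append(hit)
    return kept


# Made-up hits, for illustration only.
hits = [
    {"title": "Federated search", "description": "Searching many engines at once."},
    {"title": "Cooking pasta", "description": "A recipe."},
    {"title": "Search engines", "description": "An overview."},
]

filtered = rerank_or(hits, "federated engines")
```

The order of the remaining results is left unchanged in this sketch; only non-matching results are dropped.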