Start

Searsia comes with a client and a server.

The client

The Searsia Web client can be downloaded as searsiaclient.zip and unzipped on your local machine or web server. To use the web client:

  1. Set the variable API_TEMPLATE in the second line of the file js/searsia.js;
  2. Open index.html in a web browser;
  3. Congratulations! You now run your own web application for federated search.

The API template is a URL with a placeholder for the query and possibly other parameters. Examples of API templates of online Searsia servers are those of the University of Twente's site search engine and Dr. Sheet Music:

https://search.utwente.nl/searsia/index.json?q={searchTerms}&page={startPage?}

or

https://drsheetmusic.com/searsia/index.json?q={searchTerms}&page={startPage?}

If you run Searsia Server on your local machine (see next section), you can connect the client to your own server by setting the API template to something like:

http://localhost:16842/searsia/index?q={searchTerms?}&page={startPage?}
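To make the role of the placeholders concrete, here is a minimal Python sketch of how a template could be turned into a request URL. It assumes that a trailing question mark, as in {startPage?}, marks an optional parameter in the style of OpenSearch URL templates; the function name fill_template is hypothetical and not part of Searsia.

```python
import re
from urllib.parse import quote


def fill_template(template, values):
    """Replace {name} and {name?} placeholders in an API template.

    Hypothetical helper. Assumption: a trailing '?' marks an optional
    parameter that may be left empty.
    """
    def substitute(match):
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in values:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameter without a value stays empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\??)\}", substitute, template)


template = "https://search.utwente.nl/searsia/index.json?q={searchTerms}&page={startPage?}"
print(fill_template(template, {"searchTerms": "federated search"}))
print(fill_template(template, {"searchTerms": "federated search", "startPage": 2}))
```

A required placeholder without a value raises an error in this sketch, whereas an optional one is simply left empty.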

The server

Download the Java server searsiaserver.jar and use the following command to start the server:

java -jar searsiaserver.jar

The server requires Java 8 or higher. Just like the web client, the Java server needs another Searsia server's API template to connect to. We call the other server the mother, because your server will learn what it needs to know from it. If you start the server without a mother, as in the command above, it will display the following message:

Please provide mother's api template (use '-m').

Additionally, the server displays all server options. Use the option -m to provide the API template, for instance one of the example templates in the section The client above.

Your Searsia server copies the Searsia search engine definitions of the server it connects to (specified by the API template). Your server will have its own API template, which it will display at startup. For instance, connect to Dr. Sheet Music as follows:

java -jar searsiaserver.jar -m 'https://drsheetmusic.com/searsia/index.json?q={searchTerms}&page={startPage?}'

Searsia server v1.0.2
Starting: Dr. Sheet Music (index)
API template: http://localhost:16842/searsia/index?q={searchTerms}&page={startPage?}
Use Ctrl+c to stop.

Use the reported API template in your client as explained above to connect your client to your server.

Federation: One server, many search engines

A Searsia server provides access to many search engines. Together, the search engines form a federation. Like a federation of states that together form a country, Searsia manages a federation of search engines that together form a single search engine. Each search engine in the federation has a unique identifier. The identifier of the search engine in the examples above is 'index'. Suppose there is another search engine in the federation that is called 'didyoumean'; then we can access it by replacing 'index' with 'didyoumean' as follows:

https://search.utwente.nl/searsia/didyoumean
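The replacement of the identifier can be sketched in a few lines of Python. The helper engine_url is hypothetical; it assumes that the identifier is always the last path segment of the API template, as in the examples on this page.

```python
from urllib.parse import urlsplit, urlunsplit


def engine_url(api_template, engine_id):
    """Return the URL of engine_id on the server that serves api_template.

    Hypothetical helper. Assumption: the engine identifier is the last
    path segment of the template (for instance 'index' or 'index.json').
    """
    parts = urlsplit(api_template)
    # Drop the last path segment and everything after the '?'.
    base_path = parts.path.rsplit("/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, f"{base_path}/{engine_id}", "", ""))


template = "https://search.utwente.nl/searsia/index.json?q={searchTerms}&page={startPage?}"
print(engine_url(template, "didyoumean"))
```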

If you open this URL in your browser, you will see the JSON search engine definition below:

{
  "resource": {
    "apitemplate": "https://search.utwente.nl/searsia/didyoumean.php?q={searchTerms}",
    "favicon": "https://search.utwente.nl/ut-icons/ut.png",
    "id": "didyoumean",
    "mimetype": "application/searsia+json",
    "name": "Did you mean:",
    "testquery": "test"
  },
  "searsia": "v1.0.2"
}
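As an illustration of how such a definition can be read by a program, the following Python sketch parses the JSON shown above and picks out the fields used on this page. The function name describe_engine is hypothetical, and the sketch assumes that every definition has a "resource" object with at least the keys shown here.

```python
import json

# The definition from the example above, copied verbatim.
DEFINITION_TEXT = """
{
  "resource": {
    "apitemplate": "https://search.utwente.nl/searsia/didyoumean.php?q={searchTerms}",
    "favicon": "https://search.utwente.nl/ut-icons/ut.png",
    "id": "didyoumean",
    "mimetype": "application/searsia+json",
    "name": "Did you mean:",
    "testquery": "test"
  },
  "searsia": "v1.0.2"
}
"""


def describe_engine(definition_text):
    """Extract identifier, name, API template and test query.

    Hypothetical helper; assumes the keys shown in the example exist.
    """
    definition = json.loads(definition_text)
    resource = definition["resource"]
    return {
        "id": resource["id"],
        "name": resource["name"],
        "apitemplate": resource["apitemplate"],
        "testquery": resource.get("testquery"),  # treated as optional here
        "version": definition.get("searsia"),
    }


for key, value in describe_engine(DEFINITION_TEXT).items():
    print(f"{key}: {value}")
```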

Searsia servers copy and share these search engine definition files. We might now start a local Searsia server that serves search results from 'didyoumean' as follows:

java -jar searsiaserver.jar -m 'https://search.utwente.nl/searsia/didyoumean?q={searchTerms}'

Searsia retrieves the definition file and runs a local copy that provides results from 'didyoumean'. In fact, we might also download the JSON definition file to our machine and then start Searsia Server from there with the same effect:

wget https://search.utwente.nl/searsia/didyoumean.json
java -jar searsiaserver.jar -m didyoumean.json

Testing a search engine

Searsia Server tests a search engine using the -t option as follows:

java -jar searsiaserver.jar -m didyoumean.json -t json

This will output the JSON search results for the "testquery" that is specified in didyoumean.json, or an error if there is a problem with the definition file or the didyoumean search engine.

Testing search engines will be needed frequently when setting up a new Searsia engine, or when maintaining an existing Searsia engine. More about adding engines to the federation can be found on the Protocol page.

Server options

Searsia server supports several command line options, shown in the following table. For convenience, each option has a shorthand consisting of one hyphen and the first letter of the option, for instance -h for --help.

Option            Explanation
--cache <arg>     Set cache size (integer: number of result pages). The default is 500 pages.
--dontshare       Do not share resource definitions. This will make it impossible for other servers to meaningfully connect to your server.
--export          Export index to stdout and exit.
--help            Show help.
--interval <arg>  Set the poll interval (integer: number of seconds). The server sends a random query to one search engine each interval. The default value is 120 seconds. If your server contains 30 resources (search engines), a polling interval of 120 seconds will poll each resource on average once per 120 * 30 = 3600 seconds, so once every 60 minutes, or about 24 queries a day per resource. In scientific literature on distributed information retrieval, polling is usually called query-based sampling. An interval of 0 disables polling.
--log <arg>       Set the level of log messages that the server writes to the index directory (integer). Supported levels are: 0 = no logging; 1 = only errors; 2 = errors and warnings; 3 = errors, warnings, and information; 4 = all of level 3 plus debug information. The default level is 2.
--mother <arg>    REQUIRED. Set the API template of the mother. See above for example values.
--nohealth        Do not share health report.
--path <arg>      Set the path on the file system where the index is stored. The default depends on your operating system. Typically, the index ends up in your home directory: under .local/share/searsia if you are on a Linux-based system, under Library/Application Support/Searsia if you are on macOS, and under Application Data/Searsia on many Windows versions.
--quiet           No output to console.
--test <arg>      Print test output and exit (argument: 'json', 'xml', 'response', or 'all').
--url <arg>       Set the URL of the web service endpoint. The default is 'http://localhost:16842/searsia/'
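The arithmetic behind the --interval option can be checked with a small Python sketch. It assumes, as the explanation of the option suggests, that each interval one query goes to one resource and that resources are polled equally often on average. The function name polling_schedule is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_schedule(interval_seconds, resource_count):
    """Return (seconds between polls of one resource, polls per resource per day).

    Hypothetical helper. Assumption: one query per interval, spread
    evenly over all resources. Returns None when polling is disabled.
    """
    if interval_seconds == 0 or resource_count == 0:
        return None  # an interval of 0 disables polling
    seconds_per_resource = interval_seconds * resource_count
    polls_per_day = SECONDS_PER_DAY / seconds_per_resource
    return seconds_per_resource, polls_per_day


# The example from the table: 30 resources, default interval of 120 seconds.
print(polling_schedule(120, 30))
```

With these numbers, each resource is polled once per 3600 seconds, which is 24 times a day.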

Server troubleshooting

The server might return the following error messages:

  • Setup failed: index_75cbc797a03dfe0dd08780e6c68e7bbc
    The server cannot create the index. Check if the server has write permissions in the reported directory. Use the --path option to change the location of the index.
  • Setup failed: Lock obtain timed out: NativeFSLock
    There is already an instance of the Searsia server running on the same index. Kill the server and try again.
  • Server failed: Failed to start Grizzly HTTP server: Address already in use
    There is already a running instance of the Searsia server on the same URL. Kill the server, or run the server on a different URL or port number with the --url option.
  • Server failed: Failed to start Grizzly HTTP server: Unresolved address, or
    Server failed: Failed to start Grizzly HTTP server: Permission denied
    The provided URL is invalid, or the URL does not belong to the machine that runs the server, or the port is blocked or taken by another application. Change the URL with the --url option.
  • Error: Connection failed: java.net.UnknownHostException, or
    Error: Connection failed: java.io.FileNotFoundException
    Connection to the mother failed. Check your internet connection. Check the mother engine and mother template. Change the mother template with the --mother option.
  • Error: Connection failed: org.json.JSONException
    The mother is not a Searsia engine. Check the mother engine and mother template. Change the mother with the --mother option.
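For the "Address already in use" and "Permission denied" cases, it can help to check whether the port is available before starting the server. The following Python sketch does this with a plain socket; it is an independent diagnostic aid, not part of Searsia, and the function name port_is_free is hypothetical. The port 16842 is taken from the default URL above.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port).

    Hypothetical diagnostic helper, not part of Searsia. A failed bind
    usually means the port is taken or binding is not permitted.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


# 127.0.0.1 is the usual address of localhost; 16842 is the default port.
if port_is_free("127.0.0.1", 16842):
    print("Port 16842 looks free.")
else:
    print("Port 16842 is not available; try another port with --url.")
```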
