Outbreak Database
Worldwide Database for Nosocomial Outbreaks
Beta Release

FAQ (Frequently asked questions)

How to use the search function in Outbreak Database

  1. Quick Start
  2. Terms
  3. Fields
  4. Term Modifiers
  5. Boolean Operators
  6. Grouping
  7. Field Grouping
  8. Escaping Special Characters

Quick Start

In short, you can enter and combine your search terms much as you are used to doing in web search engines such as Google.

For example:

mrsa finds all articles containing mrsa or MRSA etc.

mrsa dirt finds all articles containing mrsa and dirt.

"hepatitis c" finds all articles containing the phrase hepatitis c, i.e. the term hepatitis immediately followed by the term c.

If you prefix a term with a tag followed by a colon, the term will only be searched in the field belonging to that tag; otherwise, it will be searched in all fields. The page Field Reference describes the structure of an Outbreak article and the tags that can be used in a query.

(The following documentation is based on the documentation of Apache Lucene.)

Terms

A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases.

A Single Term is a single word such as virus or mrsa.

A Phrase is a group of words surrounded by double quotes such as "staphylococcus aureus".

Multiple terms can be combined together with Boolean operators to form a more complex query (see below).

The search terms are case-insensitive, i.e. Virus yields the same result as virus.

Fields

If you prefix a term with a tag followed by a colon, the term will only be searched in the field belonging to that tag; otherwise, it will be searched in all fields.

Let's see some examples. If you want to find an article about an outbreak in the USA containing the phrase "they washed their hands" somewhere in the article, you can enter:

cy:usa AND "they washed their hands"

or

cy:usa "they washed their hands"

Note: The field is only valid for the term that it directly precedes, so the query

cy:United Arab Emirates

will search for United only in the country field, while Arab and Emirates will be searched for anywhere in the article.
To restrict the whole name to the country field, type:

cy:"United Arab Emirates"

Note: The characters in a tag must either be all UPPERCASE or all lowercase.
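For readers who assemble queries in a script, here is a minimal Python sketch of the quoting rule above. The helper field_term is hypothetical (it is not part of Outbreak Database): it wraps any value containing whitespace in double quotes so the tag applies to the whole value, escapes embedded double quotes with a backslash, and rejects mixed-case tags. It only builds the query string; it does not run a search.

```python
def field_term(tag: str, value: str) -> str:
    """Build a field-restricted term such as cy:usa or cy:"United Arab Emirates".

    Hypothetical helper: multi-word values are quoted so that the tag
    applies to the whole value, not just to its first word.
    """
    # Tags must be all lowercase or all UPPERCASE (cy or CY, not Cy).
    if not (tag.islower() or tag.isupper()):
        raise ValueError(f"mixed-case tag: {tag!r}")
    if any(ch.isspace() for ch in value):
        # Escape embedded double quotes, then wrap the value as a phrase.
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{tag}:{value}"


print(field_term("cy", "usa"))                   # cy:usa
print(field_term("cy", "United Arab Emirates"))  # cy:"United Arab Emirates"
```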

Term Modifiers

Wildcard Searches

Single and multiple character wildcard searches are supported.

To perform a single character wildcard search use the "?" symbol.

To perform a multiple character wildcard search use the "*" symbol.

The single character wildcard search looks for terms that match the pattern with any single character in place of the "?". For example, to search for "text" or "test" you can use the search:

te?t

Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests or tester, you can use the search:

test*

You can also use the wildcard searches in the middle of a term:

te*t

Note: You cannot use a * or ? symbol as the first character of a search term.
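To make the two wildcards concrete, the following Python sketch translates a wildcard term into a regular expression: ? becomes exactly one character and * becomes zero or more characters. It only illustrates which terms a pattern matches; it is not how Outbreak Database or Lucene evaluates wildcards internally, and wildcard_to_regex and wildcard_match are hypothetical names.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard term into a case-insensitive regular expression."""
    if pattern[:1] in ("*", "?"):
        # Mirrors the rule above: a wildcard may not be the first character.
        raise ValueError("a wildcard cannot be the first character of a term")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern: str, term: str) -> bool:
    """True if the whole term matches the pattern (not just a substring)."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("te?t", "text"))     # True
print(wildcard_match("te?t", "tet"))      # False: ? needs exactly one character
print(wildcard_match("test*", "tester"))  # True
print(wildcard_match("te*t", "tempest"))  # True
```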

Fuzzy Searches

To do a fuzzy search, use the tilde, "~", symbol at the end of a Single Term. For example, to search for a term similar in spelling to staphylokokus, use the fuzzy search:

staphylokokus~

This search will also find Staphylococcus.

An additional (optional) parameter can specify the required similarity. The value is between 0 and 1; the closer it is to 1, the more similar a term must be to match. For example:

staphylokokus~0.8

If the parameter is omitted, the default value of 0.5 is used.
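How similar is "similar"? The sketch below assumes the formula used by the fuzzy search of older (3.x) Lucene versions, whose documentation this page is based on: similarity = 1 - (edit distance) / (length of the shorter term), where the edit distance counts single-character insertions, deletions and substitutions. We have not verified that Outbreak Database uses exactly this formula (newer Lucene versions limit the edit distance directly instead), and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = curr
    return prev[-1]


def similarity(query_term: str, index_term: str) -> float:
    """Assumed formula: 1 - edit distance / length of the shorter term."""
    a, b = query_term.lower(), index_term.lower()
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a, b) / shorter


def fuzzy_match(query_term: str, index_term: str, min_similarity: float = 0.5) -> bool:
    # Treated here as a strict threshold (an assumption).
    return similarity(query_term, index_term) > min_similarity


print(levenshtein("staphylokokus", "staphylococcus"))             # 3
print(round(similarity("staphylokokus", "Staphylococcus"), 2))     # 0.77
print(fuzzy_match("staphylokokus", "Staphylococcus"))              # True
print(fuzzy_match("staphylokokus", "Staphylococcus", 0.8))         # False
```

Under this assumed formula, staphylokokus and Staphylococcus differ by 3 edits and the shorter term has 13 characters, so their similarity is 1 - 3/13 ≈ 0.77: enough for the default of 0.5, but not for 0.8.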

Proximity Searches

Finding words that are within a specific distance of each other is also supported. To do a proximity search, use the tilde, "~", symbol at the end of a Phrase. For example, to search for dirt and infection within 10 words of each other in a document, use the search:

"dirt infection"~10

Note that in rare cases, proximity searches may return unexpected results. We are working on the issue.
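The following Python sketch approximates a two-word proximity search. It assumes Lucene's way of counting the distance ("slop") as the number of position moves needed to turn the phrase into the text: k words between the two terms cost k moves, and two adjacent words in reversed order cost 2. It also assumes that articles are split into words at non-alphanumeric characters. The names positions and near are hypothetical, and the real search engine may tokenize text differently.

```python
import re


def positions(text: str, word: str) -> list[int]:
    """Positions of `word` among the words of `text` (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    return [i for i, token in enumerate(tokens) if token == word.lower()]


def near(text: str, first: str, second: str, slop: int) -> bool:
    """Approximate '"first second"~slop' for a two-word phrase.

    A pair of positions costs |p2 - p1 - 1| moves: 0 when the words are
    adjacent and in order, k when k words lie between them, and 2 when
    two adjacent words appear in reversed order.
    """
    return any(
        abs(p2 - p1 - 1) <= slop
        for p1 in positions(text, first)
        for p2 in positions(text, second)
    )


text = "Dirt on the ward floor was linked to the infection"
print(near(text, "dirt", "infection", 10))  # True: 8 words in between
print(near(text, "dirt", "infection", 5))   # False
```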

Boosting a Term

To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be for your search result.

Boosting allows you to control the relevance of a document by boosting the terms it contains. For example, if you are searching for

dirt infection

and you want the term "dirt" to be more relevant, boost it by appending the ^ symbol and the boost factor to the term. You would type:

dirt^4 infection

This will make documents with the term "dirt" appear more relevant. You can also boost Phrase Terms as in the example:

"dirt and dust"^4 "infection and disease"

By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).
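The effect of a boost can be illustrated with a deliberately simplified Python toy, in which an article's score is the sum of boost × occurrences over the query terms, and an article missing a term does not match at all (AND is the default operator). This is not Lucene's scoring formula, which also takes term rarity and document length into account; it only shows the direction of the effect. Both sample texts are made up.

```python
from collections import Counter
from typing import Optional


def toy_score(text: str, boosts: dict[str, float]) -> Optional[float]:
    """Toy relevance: sum of (boost x occurrences) over the query terms.

    Returns None if a query term is missing, since AND is the default.
    """
    counts = Counter(text.lower().split())
    if any(counts[term] == 0 for term in boosts):
        return None
    return sum(boost * counts[term] for term, boost in boosts.items())


doc_a = "dirt on the floor dirt in the sink dirt near the infection case"
doc_b = "infection control infection audit infection spread linked to dirt"

plain = {"dirt": 1.0, "infection": 1.0}    # dirt infection
boosted = {"dirt": 4.0, "infection": 1.0}  # dirt^4 infection

print(toy_score(doc_a, plain), toy_score(doc_b, plain))      # 4.0 4.0
print(toy_score(doc_a, boosted), toy_score(doc_b, boosted))  # 13.0 7.0
```

Without the boost, both toy articles tie; with dirt^4, the article that mentions dirt more often moves ahead.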

Boolean Operators

Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. (Note: Boolean operators must be written in all UPPERCASE.)

AND

The AND operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the AND operator is used.

The AND operator matches articles that contain both terms.

To search for documents that contain "dirt and dust" and "infection and disease" use the query:

"dirt and dust" AND "infection and disease"

or

"dirt and dust" "infection and disease"

OR

The OR operator links two terms and finds a matching article if either of the terms exists in the article.

To search for documents that contain either "dirt and dust" or just "infection and disease" or both use the query:

"dirt and dust" OR "infection and disease"

NOT

The NOT operator excludes documents that contain the term after NOT.

To search for documents that contain "dirt and dust" but not "infection and disease" use the query:

"dirt and dust" NOT "infection and disease"

Note: The NOT operator does not behave exactly like a logical NOT. In particular, it cannot be used with just one term; the following search, for example, returns no results:

NOT "foobar 123"

Likewise, the following query will not return all documents:

mrsa OR (NOT "foobar 123")
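One way to picture the three operators is as set operations on the articles each term matches: AND is intersection, OR is union, and "a NOT b" removes the matches of b from those of a. The Python sketch below uses a made-up four-article index and single words instead of phrases. It is a mental model, not how the search engine works internally; modelling a lone NOT as subtracting from an empty set reproduces the behaviour described in the note above.

```python
# Toy index: article id -> set of lowercased terms the article contains.
INDEX = {
    1: {"dirt", "dust", "infection"},
    2: {"dirt", "dust"},
    3: {"infection", "disease"},
    4: {"mrsa"},
}


def having(term: str) -> set[int]:
    """Ids of the articles that contain `term`."""
    return {doc for doc, terms in INDEX.items() if term in terms}


def and_(a: set[int], b: set[int]) -> set[int]:
    return a & b  # both terms present


def or_(a: set[int], b: set[int]) -> set[int]:
    return a | b  # at least one term present


def not_(a: set[int], b: set[int]) -> set[int]:
    return a - b  # "a NOT b": matches of a, minus matches of b


print(sorted(and_(having("dirt"), having("infection"))))  # [1]
print(sorted(or_(having("dirt"), having("infection"))))   # [1, 2, 3]
print(sorted(not_(having("dirt"), having("infection"))))  # [2]

# A lone NOT has no positive part to subtract from (modelled as the empty
# set), so it matches nothing, and OR-ing it into a query adds nothing.
print(sorted(not_(set(), having("foobar"))))                       # []
print(sorted(or_(having("mrsa"), not_(set(), having("foobar")))))  # [4]
```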

Grouping

You can use parentheses to group clauses to form subqueries. This is very useful if you want to control the Boolean logic of a query.

To search for articles that contain infection and either dirt or dust, use the query:

(dirt OR dust) AND infection

This removes any ambiguity: infection must be present, and at least one of dirt or dust must be present.

Always use parentheses when grouping Boolean queries! Writing something like

dirt OR dust AND infection

might lead to unexpected results.
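To see why parentheses matter, the Python sketch below evaluates two ways of reading dirt OR dust AND infection against a made-up four-article index, using set union for OR and set intersection for AND. The two readings disagree about article 1, which mentions dirt but not infection. This page does not state which reading the search engine applies to the unparenthesized query, and the sketch makes no claim about it.

```python
# Toy index: article id -> set of lowercased terms the article contains.
INDEX = {
    1: {"dirt"},
    2: {"dust", "infection"},
    3: {"dirt", "infection"},
    4: {"dust"},
}


def having(term: str) -> set[int]:
    """Ids of the articles that contain `term`."""
    return {doc for doc, terms in INDEX.items() if term in terms}


# (dirt OR dust) AND infection: the grouped query recommended above.
grouped = (having("dirt") | having("dust")) & having("infection")

# dirt OR (dust AND infection): another way to read the ungrouped query.
other_reading = having("dirt") | (having("dust") & having("infection"))

print(sorted(grouped))        # [2, 3]
print(sorted(other_reading))  # [1, 2, 3]
```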

Field Grouping

You can use parentheses to group multiple clauses to a single field.

To search for a country that contains both the word "united" and the phrase "arab emirates" use the query:

cy:(united "arab emirates")

Escaping Special Characters

Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these characters, put a \ before the character. For example, to search for (1+1):2, use the query:

\(1\+1\)\:2
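If you build queries from user input, the escaping rule is easy to automate. The Python sketch below puts a backslash before every character in the list above; escaping each single & or | (rather than only the pairs && and ||) is our choice, similar in spirit to the escape method of Lucene's Java QueryParser. The function name escape_query is hypothetical.

```python
# Special characters from the list above; && and || are covered by
# escaping every & and | individually.
SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_query(text: str) -> str:
    """Put a backslash before every special character so it is searched literally."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_query("(1+1):2"))  # \(1\+1\)\:2
```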