Enhanced Ghost Search

In my previous post I revealed the long-awaited (ha!) feature, Ghost Search. Thanks to having too much time on an airplane, today's update of Phasmodeck enhances the basic ghost name search by allowing keyword and partial keyword searches across multiple fields, including ghost name, description, strength, and weakness.

Why is this cool?

What's that ghost that interacts a bunch with electronic devices? You don't have to remember anymore. Just search for it! Type "elec" in the search box and up pops Raiju! Sounds like a Pokémon!

How does it work?

Pre-processing

Queries and documents are pre-processed a bit to remove common stop words and punctuation, so as many queries as possible will match. Stop words are common words that are not usually useful for searching. For example, "the" would return every document and would not help narrow down the search space. This kind of normalization is also convenient for the user. For example, I could search for "dots" and get results including "D.O.T.S." without having to type all those periods. I didn't think too hard about this part, so some words may be un-normalized.
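Here is a rough sketch of what that kind of normalization could look like. It is not Phasmodeck's actual code: the normalize function and the STOP_WORDS list are made up for illustration, and the real stop word list is not shown in this post.

```python
import re

# Hypothetical stop word list (an assumption; the real list is not shown).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "with", "is", "that"}


def normalize(text: str) -> list[str]:
    """Lowercase the text, delete punctuation, and drop stop words."""
    lowered = text.lower()
    # Delete punctuation outright (rather than replacing it with a space)
    # so that "D.O.T.S." collapses into the single word "dots".
    stripped = re.sub(r"[^\w\s]", "", lowered)
    return [word for word in stripped.split() if word not in STOP_WORDS]


print(normalize("The D.O.T.S. projector"))  # ['dots', 'projector']
```

Running the same function over both the query and the ghost text is what lets "dots" match "D.O.T.S.". One trade-off of deleting punctuation instead of splitting on it is that hyphenated words get glued together into one word.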

Search structure

The main search structure is a basic, naive reverse index (often called an inverted index) that maps words contained within a ghost description to the ghost ID. It is implemented as a prefix trie, where each node in the trie contains a list of matching documents (ghosts).

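A prefix trie on its own only matches the start of a word (e.g., "elec" for "electronic"). In order to also match partial keywords from the middle of a word (e.g., "tron"), when building the trie, each word needs to be added along with all of its suffixes: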

electronic, lectronic, ectronic, ctronic, tronic

And so on.
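To make the suffix idea concrete, here is a minimal sketch of a trie built this way. The names (TrieNode, suffixes, insert, lookup) are hypothetical and only assume what is described above: every node holds the ghosts that match the text leading up to it.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}  # next character -> node
        self.ghost_ids: set[str] = set()           # ghosts matching this path


def suffixes(word: str) -> list[str]:
    """Return the word itself followed by each shorter suffix."""
    return [word[i:] for i in range(len(word))]


def insert(root: TrieNode, word: str, ghost_id: str) -> None:
    """Add the word and all of its suffixes, tagging each node on the way."""
    for suffix in suffixes(word):
        node = root
        for ch in suffix:
            node = node.children.setdefault(ch, TrieNode())
            node.ghost_ids.add(ghost_id)


def lookup(root: TrieNode, fragment: str) -> set[str]:
    """Walk the trie along the fragment and return the ghosts found there."""
    node = root
    for ch in fragment:
        child = node.children.get(ch)
        if child is None:
            return set()
        node = child
    return set(node.ghost_ids)


root = TrieNode()
insert(root, "electronic", "raiju")
print(lookup(root, "elec"))  # {'raiju'}
print(lookup(root, "tron"))  # {'raiju'}
```

Adding every suffix costs memory roughly quadratic in word length, which is probably fine for a small, fixed set of ghosts but would not scale to a large corpus.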

Querying

A query can contain one or more keywords. Exact text search is not supported. The trie is queried for each keyword and the keyword frequency is recorded for each result document. Results are returned in descending order of hit frequency. Hopefully this puts the most relevant documents at the top of the search results, but that will not always be the case.
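The ranking step might look something like the sketch below. A few assumptions to flag: "hit frequency" is interpreted here as the number of query keywords that matched a ghost, the GHOSTS data is placeholder text rather than real Phasmodeck descriptions, and lookup is a simple substring scan standing in for the trie so the example stays self-contained.

```python
from collections import Counter

# Placeholder, already-normalized data (not the real ghost descriptions).
GHOSTS = {
    "raiju": ["raiju", "electronic", "equipment", "fast"],
    "jinn": ["jinn", "fast", "breaker"],
    "banshee": ["banshee", "single", "target"],
}


def lookup(keyword: str) -> set[str]:
    """Stand-in for the trie: ghosts with any word containing the keyword."""
    return {
        ghost_id
        for ghost_id, words in GHOSTS.items()
        if any(keyword in word for word in words)
    }


def search(query: str) -> list[str]:
    """Return ghost IDs ordered by how many query keywords hit them."""
    hits: Counter[str] = Counter()
    for keyword in query.lower().split():
        for ghost_id in lookup(keyword):
            hits[ghost_id] += 1
    # Most hits first; ties are broken alphabetically to keep output stable.
    return sorted(hits, key=lambda ghost_id: (-hits[ghost_id], ghost_id))


print(search("elec fast"))  # ['raiju', 'jinn']
```

In this toy data, "raiju" matches both keywords and "jinn" only matches one, so "raiju" comes out on top. Counting total occurrences instead of matched keywords would be an equally reasonable reading, and could rank things differently.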

Bee-tee-dubs, all of this happens in your browser, of course. There are no API calls for searching for ghosts.

Conclusion

I don't think there's anything else I want to say about search right now. Have fun searching for ghosts.