Fuzzy Search Implementation

Option

Where on stack

Description

Typical use cases

Should we use it and how

fuse.js

frontend/backend

  • simple-to-use JavaScript library that performs fuzzy search on arrays of objects, matching on specified keys (e.g. company name, visa requirements)
  • supports logical query operators (AND, OR), so it could be used to implement filtering by tags in the future
  • implemented using the Bitap algorithm, which treats two strings as a match when their Levenshtein distance is small enough
  • Levenshtein distance is the minimum number of single-character edits (insertions, deletions or substitutions) required to convert one string into another

  • client-side fuzzy search on small to moderately large data sets - typically for applications that can’t justify having a backend

  • its features are perfect for use cases where a student searches for “gogle” and jobs from “google” should be returned
  • however, it makes less sense for our application, since it would involve querying the entire database on page load (for example, every job) and caching the results for use as the user navigates the site
  • it’s simple to use, since the cached result can be stored in an array and the fuse API is simply a function that operates on an array
  • despite its simplicity, we should probably stay away: as our database grows, the initial page load will become slower ⇒ poor UX
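To make the “gogle” → “google” example concrete, here is a minimal TypeScript sketch of Levenshtein distance, counting single-character insertions, deletions and substitutions. It is a plain dynamic-programming version for illustration only, not fuse.js’s Bitap implementation; `fuzzyFilter`, the `maxDistance` default and the sample company names are hypothetical.

```typescript
// Levenshtein distance: minimum number of single-character insertions,
// deletions or substitutions needed to turn `a` into `b`.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of `a` and the first j chars of `b`.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // turning i chars of `a` into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // delete a[i - 1]
          curr[j - 1] + 1, // insert b[j - 1]
          prev[j - 1] + cost, // substitute, or keep when the chars match
        ),
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: keep the names within `maxDistance` edits of the query.
function fuzzyFilter(query: string, names: string[], maxDistance = 2): string[] {
  const q = query.toLowerCase();
  return names.filter((name) => levenshtein(q, name.toLowerCase()) <= maxDistance);
}

// Sample data, assumed for illustration.
const companies = ["Google", "Atlassian", "Canva"];
console.log(levenshtein("gogle", "google")); // 1 (one missing "o")
console.log(fuzzyFilter("gogle", companies)); // [ "Google" ]
```

Note that this compares the query against each whole name, so every candidate has to be in memory first, which is the same page-load cost described above.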

fuzzystrmatch

db (psql only)

  • PostgreSQL extension (shipped as an official contrib module) for performing fuzzy search directly in the database
  • provides multiple heuristics for determining similarity between strings:
    1. soundex: searching for ‘john’ returns strings like ‘jon’, ‘joan’, etc.
    2. levenshtein: see the fuse.js description
    3. metaphone / double metaphone: further phonetic algorithms that match on how words sound

server-side fuzzy search for production size databases

  • has the benefits of fuse.js without the need for an initial query of the whole database
  • however, since it’s server-side, there may be a delay between the user typing on the frontend and receiving suggestions
  • flow: user input → frontend → backend → db → backend → frontend
  • if this option is chosen, running a fuzzy search on every keystroke could be replaced with one search every few characters, which would make more sense as our db gets larger
  • since it’s just psql, everyone who has db experience or has taken COMP3311 should get the hang of it pretty easily
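The “search every few characters” idea could look roughly like the TypeScript sketch below. `everyFewChars`, the `step` default of 3 and the `loggedQueries` stand-in for the backend request are all hypothetical names chosen for illustration; none of them come from an existing codebase.

```typescript
// Returns an input handler that forwards the query to `search` only when the
// trimmed input length has changed by at least `step` characters since the
// last search that was actually sent.
function everyFewChars(
  search: (query: string) => void,
  step = 3,
): (query: string) => void {
  let lastLength = 0;
  return (query: string): void => {
    const q = query.trim();
    if (q.length === 0) {
      lastLength = 0; // input cleared: start counting again
      return;
    }
    if (Math.abs(q.length - lastLength) >= step) {
      lastLength = q.length;
      search(q);
    }
  };
}

// Stand-in for the real request to the backend (assumed; it only records the query).
const loggedQueries: string[] = [];
const onSearchInput = everyFewChars((q) => {
  loggedQueries.push(q);
});

// Simulate a user typing "google" one character at a time.
for (const typed of ["g", "go", "goo", "goog", "googl", "google"]) {
  onSearchInput(typed);
}
console.log(loggedQueries); // [ "goo", "google" ]
```

One gap in this approach: if the user stops typing at “googl”, the last two characters never trigger a search. Pairing it with a short debounce timer that fires once typing pauses would cover that case.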

algolia

new service

  • externally hosted search engine that delivers real-time results from the first keystroke; it promises relevant results in under 100ms anywhere in the world
  • data stored in our database can be uploaded to their servers and updated regularly as needed (e.g. when a new company registers)
  • provides the UI components required for Vue

used at large companies such as Adobe, Twitch and Stripe

  • feels slightly overkill, but it does seem like the most scalable approach
  • free 10k searches + records (a record is essentially a row in the db) per month, then $1 per 1k searches + records
  • the 10k threshold may be exceeded once we store user profiles via jared’s thingo
  • unsure how difficult this will be to incorporate
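As a rough way to sanity-check the cost, the quoted pricing can be turned into a small TypeScript estimate. This sketch assumes that searches and records share one pooled 10k free allowance and that overage is billed in whole blocks of 1k; both are interpretations of the note above rather than confirmed Algolia terms, and `estimateMonthlyCostUsd` is a hypothetical name.

```typescript
// Figures as quoted in these notes; check Algolia's current pricing before relying on them.
const FREE_UNITS_PER_MONTH = 10_000; // free searches + records (assumed to be pooled)
const USD_PER_1K_EXTRA_UNITS = 1; // price per additional 1k searches + records

// Estimated monthly bill in USD, assuming overage is rounded up to whole 1k blocks.
function estimateMonthlyCostUsd(searches: number, records: number): number {
  const billableUnits = Math.max(0, searches + records - FREE_UNITS_PER_MONTH);
  return Math.ceil(billableUnits / 1000) * USD_PER_1K_EXTRA_UNITS;
}

console.log(estimateMonthlyCostUsd(4_000, 5_000)); // 0 (within the free allowance)
console.log(estimateMonthlyCostUsd(12_000, 3_000)); // 5 (5k units over the allowance)
```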

Thanks to @Matthew Liu for researching this!!!