Improving Postgres Text Search Speed 7x On Millions of Records

Improving Postgres Text Search Speed 7x On Millions of Records

CharityAPI.org improved our Postgres response time so that searches initiated via our API were 760% faster without adding an additional tool like Elastic Search. This is how we did it.

CharityAPI.org is a simple API for nonprofit data lookup; the API the IRS should have built. In May we received customer feedback that our search was extremely slow for some queries. Upon investigation, we agreed with the customer and set about to improve search speed. CharityAPI.org stores approximately 1.5 million organizations in our main Postgres table of nonprofits, and we didn’t want to reach for Elastic Search yet, so we dug into the Postgres documentation to see if we could squeeze some more performance out of Postgres alone. We’re pleased to share that we increased our search speed by 760% so customers can get their data faster and delight their users with performant search. 

When we investigated our customer feedback, we found that while most searches returned in under 50ms,  the slowest 10% of calls to our charity search API endpoint took 15 seconds. Yikes. We can improve that.

First iteration: Add and query a tsvector field on the organizations table

Postgres offers search based on a tsvector type, which permits one to convert text to a tokenized vector type that is optimized for search. On our organizations table, we already have a field called “searchable_fields” where we place text we want associated with the record for search results, so we defined the new tsvector field “searchable_document” to be derived from the searchable_fields column. We also used GIN to build an index on the field. In Ecto, that migration looks like this: 

We then created a job that ensured all organizations had their organization_search index record properly created and updated. 

We updated our search query to search the new index table and join on organizations like so: 

This query using join was 76% slower than our naive implementation without the organization_search index table. We regressed.

Version Iterations Per Second
Querying “organizations.searchable_document”, without the organization_search index table. 0.29 IPS
Querying organization_search.searchable_document, joining on organizations. .0678 IPS (76% slower)

Final Iteration: 2 Queries: First on the organization_search table to retrieve organization IDs, then query organizations for those IDs. 

Since searching organizations_search and joining on organizations was slow, we wondered if omitting the join could possibly be faster even if it meant executing two queries. We refactored our search logic into two queries; one to collect the organization IDs and the second to retrieve those from the organizations table. 

A key gotcha that tripped us up: when querying with a list of primary keys, Postgres will not return the records in the same order as the IDs provided in a “where … in” clause. That means if you asked for the organizations where IDs were in [1, 3, 5], the results could be org3, org5, org1. The way to ensure order is to use the “WITH ORDINALITY” function in Postgres (source) as we did here. When we first tested this version, with the unordered results problem, search was absurdly fast but when we checked the quality of results we noticed the result order made zero sense which led us to realize our mistake. 

Version Iterations Per Second
Querying “organizations.searchable_document” 0.28 to 0.51
Querying “organization_search.searchable_document” for IDs in order, then a second search on “organizations” for those IDs. 0.89 (230% faster)

Crazy: two queries is 230% faster than our previous improvement, and significantly faster than trying to join in one query. 

We ran both in production side-by-side for a while to ensure we continued to see similar numbers on live traffic and no qualitative drop in search quality, and once satisfied we promoted our newest logic to be the main search execution path. 

Search speed for queries that have to hit Postgres are now consistently resolving in under 1 second, even for the slowest 1% of searches. We ran searches against prod from a few locations over a few days, and all results were extremely similar.

Run IPS Average Deviation Median 1% Slowest
1 0.95 1.05s +/- 6% 1.06s 1.11s
2 1.07 0.93s +/- 2.69% 0.94s 0.95s
3 1.14 0.87s +/- 8.2% 0.85s 0.97s

When we permit cache hits, we see average speeds double, while the slowest (hitting Postgres) are around 1.2 seconds.

IPS Average Deviation Median 1% Slowest
1.14 0.51s +/- 54.21% 0.42s 1.284s

Our AppSignal application monitoring graphs reflect the improvement; the slowest 10% of queries could take as long as 15 seconds without caching prior to the upgrade, but now the slowest decile of queries is under 2 seconds; a huge improvement. 

We’re extremely happy to deliver much more performant search for our API customers, and to do it all within Postgres without introducing another dependency.

Previous IPS Current IPS Change
0.132 1.14 +763%

Checkout our Docs or Signup to get an API key and begin querying all of the IRS data about nonprofits on CharityAPI.org. CharityAPI.org is relied on for millions of API calls a month by companies that need accurate nonprofit data.

Photo by Parrish Freeman on Unsplash

Related Posts