Open Library Search: Balancing High-Impact with High-Demand

The Open Library is a card catalog of every book ever published, spanning more than 50 million edition records. Fetching all of these records at once is computationally expensive, so we use Apache Solr to power our search engine. This system is primarily maintained by Drini Cami and has enjoyed support from myself, Scott Barnes, Ben Deitch, and Jim Champ.
Our search engine is responsible for rapidly generating results when patrons use the autocomplete search box, when apps request book data through our programmatic Search API, when pages load data to render book carousels, and much more.
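As a sketch of what a Search API request looks like, a client might build one like this. The endpoint and the q, fields, and limit parameters are real; the helper function itself is our own illustration, not part of any official client:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://openlibrary.org/search.json"

def build_search_url(query, fields=None, limit=10):
    """Build an Open Library Search API request URL.

    Requesting only the fields you need keeps responses small, which
    matters for a search engine already under heavy load.
    """
    params = {"q": query, "limit": limit}
    if fields:
        params["fields"] = ",".join(fields)
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

# Example: request just titles and author names for a query.
url = build_search_url("tolkien", fields=["title", "author_name"], limit=5)
```

Fetching that URL returns a JSON document whose docs array holds one record per matching work, trimmed to the requested fields.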
A key challenge of maintaining such a search engine is keeping its schema manageable: compact and efficient, yet versatile enough to serve the diverse needs of millions of registered patrons. Small decisions, like whether a certain field should be made sortable, can, at scale, make or break the system’s ability to keep up with requests.
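To make that trade-off concrete, here is a hypothetical Solr schema fragment (the field names are invented for illustration). In Solr, making a field sortable means enabling docValues, a column-oriented on-disk structure that adds index size and memory pressure for every document:

```xml
<!-- A plain searchable field: indexed, no sort support. -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- A sortable field: docValues="true" makes sorting fast, but every
     document now carries this column, and it competes for memory. -->
<field name="trending_score" type="pfloat" indexed="true" stored="true"
       docValues="true"/>
```

Multiply a choice like this across tens of millions of edition records and it becomes clear why each schema change deserves scrutiny.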
This year, the Open Library team was committed to releasing several ambitious search improvements, at a time when the search engine was already struggling to meet its existing load:
Edition-powered Carousels that go beyond the general work to show you the most relevant, specific, available edition in your desired language.
Trending algorithms that surface books experiencing sudden upticks in activity, as opposed to books that are consistently popular over long stretches of time.
10K Reading Levels to make the K-12 Student Library more relevant and useful.
Rather than tout a success story (we’re still in the thick of figuring out performance day-by-day), our goal is to pay it forward, document our journey, and give others reference points and ideas for how to maintain, tune, and advance a large production search system with a small team. The vibe is “keep your head above water.”
Starting in the Red
Towards the third quarter of last year, the Internet Archive and the Open Library were the victims of a large-scale, coordinated DDoS attack. The result was significant excess load on our search engine and material changes in how we secured and accessed our networks. During this time, the entire Solr re-indexing process (i.e. the technical process for rebuilding a fresh search engine from the latest data dumps) was left in a broken state.
In this pressurized state, our first action was to tune Solr’s heap. We had allocated 10GB of RAM to the Solr instance, but the JVM heap was also allowed to use the full 10GB, leaving no headroom and resulting in memory exhaustion. When Scott lowered the heap to 8GB, we encountered fewer heap errors. This was compounded by the fact that, previously, we had dealt with long spikes of 503s by restarting Solr, causing a thundering herd problem where the server would restart only to be overwhelmed by heap errors again.
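The change itself amounts to a one-line setting in Solr’s environment include file. As a sketch (the path varies by installation; this is the typical location for a service install):

```shell
# /etc/default/solr.in.sh -- Solr's environment include file.
# Cap the JVM heap below the machine's 10GB allocation so the OS page
# cache, which Solr leans on heavily for index reads, has headroom.
SOLR_HEAP="8g"
```

Leaving that gap between heap and total RAM matters because Solr’s read performance depends on the operating system caching index files outside the JVM.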
With 8GB of heap, our memory utilization gradually rose until we were using about 95% of memory, and without further tuning and monitoring, we had few options other than to i…