The Open Library Blog

Refining the Open Library Catalogue: My Internship Story

By Jordan Frederick AKA Tauriel063 (she/her), Canada
When deciding where to complete my internship for my Master’s in Library and Information Science (MLIS) degree, Open Library was an obvious choice. Not only have I been volunteering as an Open Librarian since September 2022, but I have also used the library myself. I wanted to work with people who already knew me, and to work with an organisation whose mission I strongly believe in. Thus, in January 2025, I started interning at Open Library with Lisa Seaberg and Mek Karpeles as my mentors.
At the time of writing this, I am three courses away from completing my MLIS through the University of Denver, online. During my time as both a student and Open Librarian, I gained an interest in both cataloguing and working with collections. I decided to incorporate both into my internship goals, along with learning a little about scripting. Mek and Lisa had plenty of ideas for tasks I could work on, such as creating patron support videos and importing reading levels into the Open Library catalogue, which ensured that I had a well-rounded experience (and also never ran out of things to do).
The first few weeks of my internship centered largely around building my collection, for which I chose the topic of
Understanding Artificial Intelligence
(AI). Unfortunately, I can’t take credit for how well-rounded the collection looks presently, as I quickly realised that my goal to learn some basic coding was more challenging than I expected. If you happen to scroll to the bottom and wonder why there are over 80 revisions to the collection, that is because I spent frustrating hours trying to get books to display using the correct codes and repeatedly failed. It is because of Mek’s and Jim Champ’s coding knowledge that the collection appears fairly robust, although I suggested many of the sections within the collection, such as “Artificial Intelligence: Ethics and Risks” and “History of Artificial Intelligence.” However, Mek has informed me that the AI collection will likely continue to receive attention from the community for the remainder of the year, as part of the project’s yearly goals. I hope to see it in much better shape by the time of our annual community celebration in October.
The Artificial Intelligence Collection.
I successfully completed several cataloguing tasks, including adding 25 AI books to the catalogue. With the help of Scott Barnes, an engineer at the Internet Archive, I made these books readable. I also separated 36 incorrectly merged Doctor Who books and merged duplicate author and book records. Another project involved addressing bad data, where hundreds of book records had been imported into the catalogue under the category “
Non renseigné
,” with minimal information provided for each. While I was able to fix the previously conflated
Doctor Who
records, there are still over 300 records listed as “
Non renseigné
.” As such, thi…

Streamlining Special Access for Patrons with Qualifying Print Disabilities

By Mek, Elizabeth Mays, & Ella Cuskelly
A core aspect of Open Library’s mission is making the published works of humankind accessible to all readers, regardless of age, ability, or location. In service of this goal, the Internet Archive participates in a special access program to serve patrons who have certified
print disabilities
that impact their ability to read standard printed text. Individuals certified by
qualifying authorities
can access materials in accessible formats through their web browser or via protected downloads. These affordances are offered in accordance with the
Marrakesh Treaty
, which exists to “facilitate access to published works for persons who are blind, visually impaired or otherwise print disabled.”
The first hurdle individuals with print disabilities must clear before getting the access they require is discovering which organizations, like the Internet Archive, participate in special access programs. Previously, patrons would have to perform a Google search or browse the Internet Archive’s help pages in order to learn about the special access offerings and the next steps for certification. The Internet Archive is excited to announce a new, streamlined process where patrons with qualifying print disabilities may apply for special access while registering for their free
OpenLibrary.org
account.
How to Request Special Print Disability Access
Starting May 15th, 2025, patrons who
register
for a free Internet Archive Open Library account will be presented with a checkbox option to “apply for
special print disability access
through a qualifying program.”
A screenshot showing the new checkbox on the registration page for patrons to request special print disability access.
Once this box is checked, the patron is prompted to select which
qualifying program
will certify the patron’s request for special access. This will be an organization like BARD, BookShare, ACE, or a participating university that has a relationship with the patron and can qualify their request.
A select box enables patrons to select the program through which they qualify for special print disability access.
Once the patron completes registration and logs in, they will receive an email with steps to either immediately apply their BARD or ACE credentials, or connect with their qualifying program to complete the certification process.
Once certified, print disabled patrons will have special access to a digital repository of more than 1.5 million accessible titles.
An example search result for “biology textbooks,” which shows blue Special Access buttons for patrons with certified print disabilities.
We hope these improvements will make our offerings more discoverable to those who need them and reduce unnecessary steps hindering access.
The Open Library team is committed to improving the enrollment process and accessibility offerings available to those with qualified print disabilitie…

Connecting K-12 Students With Books by Reading Level

We recently improved the relevance of our
Student Library
collection by adding more than 10,000 reading levels to borrowable K-12 books in our search engine.
What are Reading Levels?
In the same way a library-goer may be interested in finding a book on a specific topic, like vampires, they may also be interested in narrowing their search to books that are within their reading level. The
readability
of a book may be scored in numerous ways, such as analyzing sentence lengths or complexity, or the percentage of words that are unique or difficult, to name a few.
For our purposes, we chose to incorporate
Lexile
scores—a proprietary system developed by MetaMetrics®. Lexile scores are widely used within school systems and have a reliable scoring system that is accessible and
well documented
.
While the goal of our initiative was to add reading levels specifically for borrowable K-12 books within the Open Library catalog, Lexile also offers a fantastic
Find a Book
hub where teachers, parents, and students may search more broadly for books by Lexile score. We’re grateful that Lexile features “Find in Library” links to the Open Library so readers can check nearby libraries for the books they love!
Before Reading Levels
Before Open Library had reading level scores, the system used
subject tags
to identify books according to grade level. Many of these categories were noisy, inaccurate, and had high overlap, making it difficult to find relevant books. Furthermore, with grade level bucketing, there was no intuitive way to search for books across a range of reading levels.
Searching for Books by Reading Level
Now, Lexile score ranges like “lexile:[500 TO 900]” can be used flexibly in search queries to find the exact books that are right for a reader, with results being limited by grade levels.
https://openlibrary.org/search?q=lexile%3A%5B700+TO+900%5D
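As a minimal sketch, a search link like the one above can be built programmatically. The helper name `lexile_search_url` is hypothetical and not part of Open Library; only the `lexile:[LOW TO HIGH]` query syntax and the search URL come from this post.

```python
from urllib.parse import urlencode

# Search page URL, taken from the example link in this post.
SEARCH_URL = "https://openlibrary.org/search"


def lexile_search_url(low: int, high: int) -> str:
    """Build a search URL for books within a Lexile score range.

    Hypothetical helper for illustration only; it simply formats the
    range query shown above and URL-encodes it.
    """
    if low > high:
        raise ValueError("low must not be greater than high")
    query = f"lexile:[{low} TO {high}]"
    # urlencode percent-encodes ':' and the brackets, and turns spaces into '+'.
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(lexile_search_url(700, 900))
```

Keeping the range as two integers makes it straightforward to generate one link per reading band, for example when building a list of links for a classroom page.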
Putting It All Together
By utilizing these Lexile ranges, we’ve been able to develop a more coherent and expansive K-12 portal experience where there are fewer duplicate books across grade levels.
We expect this improvement will make it easier for K-12 students and teachers to find appropriate books to satisfy their reading and learning goals. You can explore the newly improved K-12 student collection at
http://openlibrary.org/k-12
.
What Do You Think?
Is there something you miss from the previous K-12 page? Is the new organization more useful to you? Share your thoughts on
bluesky
.
Credits
This reading level import initiative was led by Mek, the Open Library program lead, with assistance from Drini Cami, Open Library senior software engineer. The project received support from Jordan Frederick, an Open Library intern who shadowed this project as part of her Master’s in Library and Information Science (MLIS) degree.

What’s Trending on Open Library?

A major update to the Open Library search engine now makes it easy for patrons to find books that are receiving spikes of interest.
You may be familiar with the trending books featured on Open Library’s home page. Actually, you might be very familiar with them, because many seldom change! Our previous trending algorithm approximated popularity by tracking how often patrons clicked that they wanted to read a book. While this approach is great for showcasing popular books, the results often remain the same for weeks at a time.
The new trending algorithm, developed by
Benjamin Deitch
(Open Library volunteer and Engineering Fellow) and
Drini Cami
(Open Library staff and senior software engineer), uses hour-by-hour statistics to give patrons fresh, timely, high-interest books that are gaining traction now on Open Library.
This improved algorithm now powers much of the Open Library homepage and is fully integrated into Open Library’s search engine, meaning:
A patron can sort any search on Open Library by trending scores. Check out what’s trending today in
Sci-fi
,
Romance
, and
Short-reads in French
.
A more diverse selection of books should be displayed within the carousels on the homepage, the library explorer, and on subject pages.
Librarians can leverage sort-by-trending to discover which high-traffic book records may be impactful to prioritize.
Sorting by Trending
From the search results page, a patron may change the “Relevance” sort dropdown to “Trending” to sort results by the new z-score trending algorithm:
The Algorithm
Open Library’s Trending algorithm is implemented by computing a
z-score
for each book, which compares each book’s: (a) “activity score” over the last 24 hours with (b) the total “activity score” of the last 7 days.
Activity scores are computed for a given time interval by adding the book’s total human page views (how often the book page is visited) to an amplified count of its reading log events (e.g. when a patron marks a book as “Want to Read”). Here, amplified means that a single reading log event has a higher impact on the activity score than a single page view.
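The general idea can be sketched as follows. This is an illustration under stated assumptions, not Open Library’s actual implementation: the post does not give the amplification factor or the exact statistics used, so the weight of 10, the function names, and the choice to compare the last 24 hours against the mean and population standard deviation of seven daily scores are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical amplification factor: the post only says a reading log
# event counts for more than a page view, not by how much.
READING_LOG_WEIGHT = 10


def activity_score(page_views: int, reading_log_events: int,
                   weight: int = READING_LOG_WEIGHT) -> int:
    """Page views plus an amplified count of reading log events."""
    return page_views + weight * reading_log_events


def trending_z_score(last_24h: float, daily_scores: list[float]) -> float:
    """Standard z-score of the last 24 hours against recent daily scores.

    Assumes daily_scores holds one activity score per day for the last
    7 days. Returns 0.0 when every day has the same score, since the
    z-score is undefined when the standard deviation is zero.
    """
    mu = mean(daily_scores)
    sigma = pstdev(daily_scores)
    if sigma == 0:
        return 0.0
    return (last_24h - mu) / sigma


if __name__ == "__main__":
    week = [activity_score(views, logs) for views, logs in
            [(40, 1), (35, 0), (50, 2), (45, 1), (38, 0), (42, 1), (47, 1)]]
    today = activity_score(120, 6)
    print(trending_z_score(today, week))
```

A book with steady, high traffic gets a score near zero under this sketch, while a book whose last 24 hours far exceed its usual daily activity gets a large positive score, which matches the stated goal of surfacing spikes of interest rather than consistent popularity.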
All of the intermediary data used to compose the z-score is stored and accessible from our search engine in the following ways:
For Developers
While the
trending_z_score
is the ultimate value used to determine a book’s trending score on Open Library, developers may also query the search engine directly to access many of the intermediary, raw values used to compute this score.
For instance, we’ve been experimenting with the
trending_score_daily_[0-6]
fields and the
trending_score_hourly_sum
field to create useful ways of visualizing trending data as a chart over time:
The search engine may be
queried
and filtered by:
trending_score_hourly_sum
– Find books with the highest cumulative hourly score for today, as opposed to the computed weekly trending score.
trending_score_daily_0
through
trending_score_daily_6
– Find books…

Bringing Sidewalk Libraries Online

by Roni Bhakta & Mek
All around the world, sidewalk libraries have been popping up and improving people’s lives, grounded in
our basic right
to pass along the books we own: take a book, leave a book.
As publishers transition from physical books to ebooks, they are rewriting the rules to strip away the
ownership
rights
that make libraries possible. Instead of selling physical books that can be preserved, publishers are forcing libraries to rent ebooks on locked platforms with restrictive licenses.
What is a library that doesn’t own books?
And it’s not just libraries losing this right — it’s us too.
Did you know:
When a patron borrows a book from their library using platforms like Libby, the library typically pays each year to rent the ebook. When individuals purchase ebooks on Amazon/Kindle, they don’t own the book; they are agreeing to a “perpetual” lease that can’t be resold or transferred and might disappear at any moment. In 2019, Microsoft Books shut down and
customers lost access to their books
.
This year,
Roni Bhakta
, from Maharashtra, India, joined
Mek
from the Internet Archive’s Open Library team for Google Summer of Code 2025 to ask:
how can the idea of a sidewalk library exist on the Internet?
Our response is a new
open-source, free, plug-and-play
“Labs” prototype called
Lenny
, that lets anyone, anywhere – libraries, archives, individuals – set up their own
digital
lending library online to lend the digital books they
own
. You may view Roni’s initial proposal for Google Summer of Code
here
. To make a concept like Lenny viable, we’re eagerly following the progress of publishers like Maria Bustillos’s
BRIET
, which are creating a new market of ebooks, “for libraries,
for keeps
”.
Design Goals
Lenny
is designed to be
:
Self-hostable
. Anyone can host a Lenny node with minimal compute resources.
Easy to install.
A single
https://lennyforlibraries.org/install.sh
install script uses Docker so Lenny works right out of the box.
Preloaded with books.
Lenny comes preloaded with more than 500 open-access books.
Compatible with dozens of existing apps.
Each Lenny uses the OPDS standard to publish its collection, so any compatible reading app (Open Library, Internet Archive, Moon Reader and others) can be used to browse its books.
Features
Lenny comes integrated with:
A seamless reading experience.
An onboard Thorium Web EPUB reader lets patrons read digital books instantly from their desktop or mobile browsers.
A secure, configurable lending system.
All the basic options and best practices a library or individual may need to make the digital books they own borrowable with protections.
A marketplace.
Lenny is designing a connection to an experimental marketplace so one can easily buy and add new digital books to their collection.
Learn More
Lenny is an early stage prototype and there’s still much work to be done to bring the idea of Lenny to life. At the same time, we’ve made
great progress
towards a wo…

Sandy Chu: My Internship at the Internet Archive

This summer, continuing a years-long tradition, Open Library and the Internet Archive took part in
Google Summer of Code
(GSoC), a Google initiative focused on bringing new contributors into open source software development. This year, I was lucky enough to mentor Sandy, a long-time Open Library volunteer, on an exciting project to increase the accessibility of our books with real-time translations. We have invited Sandy to speak about her experience here as we reach the culmination of the GSoC period. It was a pleasure getting to work on this exciting project with you, Sandy! – Drini
My name is
Sandy Chu
and I am a 2025 Google Summer of Code (GSoC) candidate who had the opportunity to work with the amazing Internet Archive engineering team. Prior to participating in the GSoC program, I had contributed as a volunteer software engineer for the Open Library open source repo. As someone who grew up using local libraries as a place to supplement my education and read books that my school could not afford, I was drawn to the Open Library’s mission to empower book lovers and provide a free, valuable resource to all. You can view my initial proposal
here
.
Coming in September, the Open Library will be able to better serve its global audience by opening up books that were previously out of reach due to a lack of localization. With the help of open source projects such as the
Mozilla Firefox Translation Models
and
Bergamot Translator
library, a new BookReader plugin will be able to leverage a user’s browser and hardware resources to toggle between a book’s original language and a translation in the user’s language. The translated text will also work with the ReadAloud feature, so books can be read aloud in the translated language.
The “Real-Time In-Browser Book Translation w/ Read Aloud (TTS)” project closely aligns with the Open Library’s 2025 goal of providing more with less. Although the Internet Archive hosts and provides its patrons with hundreds of thousands of publicly available works, patrons are limited to a subset of works that were published in their native language. Due to the unique image-based implementation of the BookReader application, default browser translator options are not viable for many readers, so this project presents an opportunity to make a big impact for international audiences.
Currently in internal beta, the translation plugin allows patrons to quickly initiate a local translator on their device and translate the book’s text in just a few seconds per page. With nine distinct languages available for translation from English (and potentially over 40 as we update to Mozilla’s
latest models
), this project will make countless works more accessible for patrons.
The primary goals of this project were:
Translating a book’s original text content to the patron’s desired language with minimal delay or disruption
Creating a visually seamless experience to maint…

Open Library Search: Balancing High-Impact with High-Demand

The Open Library is a card catalog of every book published, spanning more than 50 million edition records. Fetching all of these records at once is computationally expensive, so we use
Apache Solr
to power our search engine. This system is primarily maintained by Drini Cami and has enjoyed support from myself, Scott Barnes, Ben Deitch, and Jim Champ.
Our search engine is responsible for rapidly generating results when patrons use the autocomplete search box, when apps make book data requests using our programmatic
Search API
, when we load data for rendering book carousels, and much more.
A key challenge of maintaining such a search engine is keeping its schema manageable so it is both compact and efficient yet also versatile enough to serve the diverse needs of millions of registered patrons. Small decisions, like whether a certain field should be made sortable, can make or break the system’s ability to keep up with requests at scale.
This year, the Open Library team was committed to releasing several ambitious search improvements, during a time when the search engine was already struggling to meet the existing load:
Edition-powered Carousels
that go beyond the general work to show you the most relevant, specific, available edition, in your desired language.
Trending
algorithms that showcase which books are seeing sudden upticks in interest, as opposed to what is consistently popular over stretches of time.
10K Reading Levels
to make the K-12
Student Library
more relevant and useful.
Rather than tout a success story (we’re still in the thick of figuring out performance day-by-day), our goal is to pay it forward, document our journey, and give others reference points and ideas for how to maintain, tune, and advance a large production search system with a small team. The vibe is “keep your head above water.”
Starting in the Red
Towards the third quarter of last year, the Internet Archive and the Open Library were the victims of a
large-scale, coordinated DDoS attack
. The result was significant excess load to our search engine and material changes in how we secured and accessed our networks. During this time, the entire Solr re-indexing process (i.e. the technical process for rebuilding a fresh search engine from the latest data dumps) was left in a broken state.
In this pressurized state, our first action was to tune Solr’s heap. We had allocated 10GB of RAM to the Solr instance, but the heap was also allowed to use the full 10GB, resulting in memory exhaustion. When Scott lowered the heap to 8GB, we encountered fewer heap errors. The problem had been compounded by the fact that we previously dealt with long spikes of 503s by restarting Solr, causing a
thundering herd problem
where the server would restart just to be overwhelmed by heap errors.
With 8GB of heap, our memory utilization gradually rose until we were using about 95% of memory and without further tuning and monitoring, we had few options other than to i…

Save the Date: 2025 Open Library Community Celebration

https://archive.org/embed/openlibrary-tour-2020/openlibrary-community-celebration-2024.mp4

Each year since 2020, we’ve hosted a virtual celebration to honor the many global contributors who make the Open Library project possible and continuously improve the experience for our patrons.
This year’s Open Library Community Celebration will be held virtually on Tuesday, Nov. 4, at 9 a.m. PDT.
Volunteers, staff, patrons and friends of the library are invited to RSVP
here
to get the link.
Last year was marked by more than
500,000 books being removed from the library
,
cyber security attacks
, and power outages. Our
response
has been to focus on doing more with less: making the books we have more useful, making our contributors more effective, and targeting our efforts to the underserved communities who rely on our services most.
Celebrate with us as we present:
Personal success stories
New improvements for our library patrons
A sneak peek at our 2026 roadmap
Open Library’s strategic path forward
Also, check out previous years’ community celebrations to learn more about other recent victories:
2024
,
2023
,
2022
,
2021
,
2020
.
Looking forward to
inviting
you to this year’s celebration!