Identifying Article Metadata in “The Avicultural Magazine”

This blog post was written by Taylor Smith, the 2019 Kathryn Turner Diversity and Technology Intern in the Smithsonian Libraries’ Web Services Department. At the time of her internship, Taylor was an undergraduate Computer Science student at Bowie State University. Her work in the summer of 2019 consisted of developing and coding a method for identifying article metadata in The Avicultural Magazine, a leading journal for the keeping of non-domesticated birds in captivity.
As a biology major with an interest in computer science, I was curious about wildlife and had a newfound love for coding. I kept both in mind when searching for internships, and luckily for me, I was led to the Kathryn Turner Diversity in Technology Internship for the summer of 2019. When I saw that the internship would focus on working with zoo articles relating to botany and wildlife, I knew it was perfect for me.
I had never held an internship before, let alone one that involved coding (which I had only started learning that year). I had no idea what to expect coming into this internship, but I learned far more than I could have imagined. I learned what metadata is and why it is so important, and why having digitized articles available online is so crucial. I also learned that making information accessible takes a lot more work than most people would think.
In my first week, I was introduced to the Biodiversity Heritage Library (BHL), an online digital library designed to make biodiversity literature available to the public. Within it, I worked specifically with The Avicultural Magazine, a journal created by the Avicultural Society in 1894 to share information, advice, and updates on non-domesticated birds. The volumes are digitized by Smithsonian Libraries and Archives and processed through optical character recognition (OCR) for the convenience of zookeepers and other zoo curators. The problem is that finding a specific article still means scrolling through endless pages. My job was to write code that identifies the metadata for each article, such as its title, author, and page numbers, so that the articles become much easier to find and cite.
Below is an example of a page with the beginning of an article.
J. Lewis Bonhote, “Field Notes on Some Bahama Birds”, The Avicultural Magazine, volume 9, number 1 (November 1902): 19.
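To make “article metadata” concrete, here is a minimal Python sketch of a record holding the fields that appear in the citation above, plus a function that renders the record back in that same citation style. The `ArticleMetadata` class, its field names, and `format_citation` are hypothetical illustrations, not the project's actual schema or code.

```python
from dataclasses import dataclass


@dataclass
class ArticleMetadata:
    # Hypothetical field set: one attribute per piece of the citation.
    author: str
    title: str
    journal: str
    volume: int
    number: int
    date: str        # free-text date as printed, e.g. "November 1902"
    start_page: int


def format_citation(meta: ArticleMetadata) -> str:
    """Render a record as: Author, “Title”, Journal, volume V, number N (Date): page."""
    return (
        f"{meta.author}, “{meta.title}”, {meta.journal}, "
        f"volume {meta.volume}, number {meta.number} "
        f"({meta.date}): {meta.start_page}."
    )


record = ArticleMetadata(
    author="J. Lewis Bonhote",
    title="Field Notes on Some Bahama Birds",
    journal="The Avicultural Magazine",
    volume=9,
    number=1,
    date="November 1902",
    start_page=19,
)
print(format_citation(record))
```

Once every article has a record like this, a citation can be generated mechanically instead of being copied by hand from the scanned page.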
At first, I had to write code that would open the directory of all the articles, read one file at a time, and look for titles, page numbers, authors, and so on. I set to work, but before long we found that Penn State University and the National University of Singapore had already built a tool named ParsCit that could go through the files and search for that data. ParsCit writes its results to XML files, which helped the process, but the output was not exactly what we needed.
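A minimal sketch of that directory-scanning step, applied to XML output, might look like the Python below. It assumes one XML file per article in a single directory and a simplified, hypothetical element layout loosely modeled on ParsCit's header-extraction (ParsHed) output (`algorithm` → `variant` → `title` / `author`). The real ParsCit files may be structured differently, and the names `extract_header`, `scan_directory`, and the sample file name are illustrative only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_header(xml_path: Path) -> dict:
    """Pull the title and authors out of one XML file.

    ASSUMPTION: the path algorithm[@name='ParsHed']/variant/{title,author}
    is a hypothetical simplification; check the real files before relying on it.
    """
    root = ET.parse(xml_path).getroot()
    variant = root.find(".//algorithm[@name='ParsHed']/variant")
    if variant is None:
        # No header block found: return an empty record so the file can be flagged.
        return {"file": xml_path.name, "title": None, "authors": []}
    title = variant.findtext("title")
    authors = [a.text.strip() for a in variant.findall("author") if a.text]
    return {
        "file": xml_path.name,
        "title": title.strip() if title else None,
        "authors": authors,
    }


def scan_directory(directory: str) -> list[dict]:
    """Open every .xml file in the directory, one at a time, in sorted order."""
    return [extract_header(p) for p in sorted(Path(directory).glob("*.xml"))]


if __name__ == "__main__":
    import tempfile

    # Hypothetical sample file in the assumed layout.
    sample = """<algorithms>
  <algorithm name="ParsHed">
    <variant no="0">
      <title>Field Notes on Some Bahama Birds</title>
      <author>J. Lewis Bonhote</author>
    </variant>
  </algorithm>
</algorithms>"""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "vol9_p19.xml").write_text(sample, encoding="utf-8")
        for row in scan_directory(tmp):
            print(row)
```

Files without a header block produce a record with no title and no authors, so they can be flagged for manual review rather than silently dropped.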
My job then became loading and parsing the XML files us…

