Playing ancient music without an instrument

How we took an archive of sheet music and used software to turn it into sound.

Tristan Roddis
cogapp

--

When dealing with text-based archival material, we commonly end up performing text analysis on scanned images. Powerful OCR tools can convert scanned documents into machine-readable text, which makes records much easier to find and read.

However, there are other kinds of archive content that we might want to have in a machine-readable format. Take sheet music, for example. What if you could take scanned sheet music from the page to your speakers, and listen to scores from a museum archive? This was the task we set out to accomplish. Read on to see how we did.

coghack3

As part of our regular hack-day we like to invite others to join us and bring their expertise, products, or data for us to experiment with. For the third edition, “coghack3”, we welcomed Martin Shatwell and Rachel Nimmo from the National Library of Scotland into our midst.

Our team’s attention was caught by the library’s collections of 18th- and 19th-century musical scores. We have plenty of experience working with textual data, but musical scores presented a novel challenge. What if we could automatically read the scores from the images and produce a meaningful way to interact with this data?

We decided to pursue two goals: first, to make it possible to search for patterns of notes within the archive; second, to play back the extracted works for your listening pleasure. The results are available on the Cogapp labs website. This is how we did it.

The results of our hack-day work

The source material

Our first step was to pick some suitable examples from the NLS online archive. We wanted content that had a good chance of being parsed correctly, so handwritten scores with corrections and ink blots were out.

A visually arresting, but hard to read, example from the archive.

We settled on a volume named “Ancient music of Ireland”: this featured printed scores that did not appear too visually dense, as well as some fantastic song titles.

Our candidate for automated note extraction.

From screen to machine

Next, we needed to find suitable software for Optical Music Recognition (OMR). As with Optical Character Recognition (OCR), the aim is to read information from an image and convert it into a form that a computer can understand and manipulate. This is very sophisticated software, and whilst there are a number of commercial offerings available, we needed something that could be used with minimal effort for our prototype. This led us to the open-source project Audiveris.

Audiveris in action, highlighting score components by type

Manually running some samples through Audiveris successfully identified the musical sections and output them as MusicXML files. The software’s ability to pick out component parts of the image seemed very impressive: it even identified where there was more than one piece on a page and extracted these as separate files, complete with titles.
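For anyone wanting to reproduce this step, Audiveris can also be driven headlessly from the command line, which is what a scripted pipeline would call. A minimal sketch (the exact flags depend on the Audiveris version installed):

```python
import subprocess
from pathlib import Path

def extract_musicxml(image_path: Path, output_dir: Path) -> None:
    """Run Audiveris headlessly over one scanned page, exporting MusicXML."""
    subprocess.run(
        [
            "Audiveris",
            "-batch",                    # no GUI
            "-export",                   # write the recognised score(s) as MusicXML
            "-output", str(output_dir),  # Audiveris writes one file per piece it finds
            str(image_path),
        ],
        check=True,
    )

# extract_musicxml(Path("scans/page_042.jpg"), Path("musicxml"))
```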

However, we still needed to check whether what the software output in the MusicXML files matched what was on the page, which was difficult with no musicians amongst us! Thankfully Rachel from the NLS was on hand to confirm that the sequences of notes did indeed match the score. Later we were able to render the XML back as a score and compare the results to the originals.

Source material above compared with extracted MusicXML as rendered by MuseScore

Turning it up to 11

Now that we had pulled the music out of the image, we were keen to hear what it sounded like. There were a couple of online services available that could play a MusicXML file as audio. Uploading our test examples to Soundslice gave us our first listen of this obscure music, and was the final component for our proof-of-concept.
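Soundslice was fine for a manual proof-of-concept, but an automated pipeline needs a command-line renderer. One option, shown here purely as an illustration rather than as the exact tool we wired in, is MuseScore's converter mode, which picks the output format from the file extension:

```python
import subprocess
from pathlib import Path

def musicxml_to_wav(xml_path: Path, wav_path: Path) -> Path:
    """Render a MusicXML file to audio using MuseScore's command-line converter."""
    # The executable may be "mscore", "musescore3" or similar depending on
    # platform and version; asking for a .wav output renders the score as audio.
    wav_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(["mscore", str(xml_path), "-o", str(wav_path)], check=True)
    return wav_path

# musicxml_to_wav(Path("musicxml/91387296.xml"), Path("wavs/91387296.wav"))
```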

You can hear a sample of the output directly at http://labs.cogapp.com/nls-omr/wavs/91387296.wav

Getting the band together

Within the constraints of a 7-hour hack-day (including a pizza break!) we took a rapid prototyping approach, linking existing components together. Working with my colleague Steve Norris and Rachel Nimmo from the National Library of Scotland, we used Python to script a workflow (sketched after the list below) that would:

  1. Scrape the content from the NLS website
  2. Extract the scores from the downloaded images
  3. Convert the scores to audio
  4. Push the data into a search index
  5. Display the indexed data on a website
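Stripped of error handling, the glue looked something like the sketch below. It assumes the extract_musicxml and musicxml_to_wav helpers sketched above and an index_score helper sketched in the next section; the paths and file patterns are illustrative, and the scraping step is omitted.

```python
from pathlib import Path

# Assumes extract_musicxml and musicxml_to_wav (sketched above) and
# index_score (sketched below). Step 1, scraping the page images from
# the NLS site, is left out of this sketch.
def run_pipeline(work_dir: Path) -> None:
    scans = work_dir / "scans"          # step 1: already-downloaded page images
    xml_dir = work_dir / "musicxml"
    wav_dir = work_dir / "wavs"

    for image in sorted(scans.glob("*.jpg")):
        extract_musicxml(image, xml_dir)                  # step 2: OMR
        for xml in xml_dir.glob(f"{image.stem}*.xml"):    # one file per piece found
            wav = musicxml_to_wav(xml, wav_dir / f"{xml.stem}.wav")  # step 3: audio
            index_score(xml, wav)                         # step 4: into Elasticsearch
    # step 5: Searchkit reads the Elasticsearch index and renders the website

# run_pipeline(Path("nls-omr"))
```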

The website is a lightly customised installation of Searchkit, a framework specifically designed for adding a web interface to an Elasticsearch datastore.
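The indexing step itself is small. Here is a sketch using the official Python Elasticsearch client; the index name and field names are our own illustrative choices, and in practice the title comes from the OCR output rather than the filename:

```python
from pathlib import Path

from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")

def index_score(xml_path: Path, wav_path: Path) -> None:
    """Push one extracted piece into the search index that Searchkit sits on."""
    es.index(
        index="nls-scores",               # illustrative index name
        document={
            "title": xml_path.stem,       # in practice, the OCR-extracted title
            "musicxml": str(xml_path),
            "wav": str(wav_path),
        },
    )
```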

In short, the technology stack combined Audiveris for score recognition, Python for the pipeline scripting, Elasticsearch for the search index, and Searchkit for the web front end.

The results are available to browse on the Cogapp labs website. Songs are searchable by title (extracted automatically using OCR) and playable as WAV files.
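Behind that search box, a title search is just a standard Elasticsearch full-text query, along these lines (index and field names as in the indexing sketch above, with an example search term):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Find pieces whose (OCR-extracted) title mentions a word or phrase.
results = es.search(
    index="nls-scores",
    query={"match": {"title": "lament"}},
)
for hit in results["hits"]["hits"]:
    print(hit["_source"]["title"], hit["_source"]["wav"])
```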

Future improvements

We were really happy with what we managed to achieve in a single day, but if we were to take this project further there are some clear improvements that we could make.

It would be much nicer for users if we represented complete pieces rather than having them split by page; however, this goes beyond the realm of what we could do with an automated workflow. We would need to speak to potential users (musicians, musicologists, archivists) to find out the kinds of questions they want to ask, and consider how the technology could best help them find what they want.

We could also improve the accuracy of the data extraction, and maybe crowd-source cleanup of the data. Would it be possible to enable cross-referencing with other collections held online?

A fully-featured embedded player would allow the speed and instrument used for playback to be adjusted, and an on-screen piano keyboard could enable a more contextual way to enter search terms.

Finally, we’d like to complete the original aim: to have a full musical search where people could type a sequence of notes into the search box, and to match this to notes within a given key of a score. This information is all encoded in the MusicXML, so in theory we just need a bit more work to extract these notes as strings of text, and allow Searchkit/Elasticsearch to do the rest!
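To give a flavour of what that extraction involves: MusicXML records each note's pitch as a step, an optional alteration and an octave, so turning a score into a searchable string of note names is a modest bit of XML parsing. A sketch that ignores chords, ties and key transposition, all of which a real version would need to consider:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def note_sequence(xml_path: Path) -> str:
    """Return the pitches in a MusicXML file as a space-separated string, e.g. 'G4 A4 B4'."""
    root = ET.parse(xml_path).getroot()
    names = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:                     # rests have no <pitch> element
            continue
        step = pitch.findtext("step")         # letter name, e.g. "G"
        alter = pitch.findtext("alter")       # "1" = sharp, "-1" = flat
        octave = pitch.findtext("octave")
        accidental = {"1": "#", "-1": "b"}.get(alter, "")
        names.append(f"{step}{accidental}{octave}")
    return " ".join(names)

# Indexed as a text field, this string lets Elasticsearch match a typed-in run of notes.
```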

As with any hack-day project, the aim was very much to test the potential of a technology or idea and just build something, which we achieved. This prototype provides the foundations for a different way of browsing and consuming a musical collection, beyond a set of images on a screen.

Think this is something you’d be interested in exploring further? Have more ideas? Want to come to our next hack-day? Let us know in the comments below.
