Crowdsourcing song identification

Some friends of mine are pulling together a project for crowdsourcing identification of a large collection of music clips. At least a couple of us are professional software developers, but I’m the one with the most free time right now, and it fits with my library background, so I’ve become lead developer. In talking about it, we’ve realized it could be useful to librarians, archivists, and researchers, so we’re looking into making it a crowdfunded open source project.

A little background: “Filk music” is songs created and sung by science fiction and fantasy fans, mostly at conventions and in homes. I’ve offered a definition of filk on my website. There are some shoestring filk publishers; technically they’re in business, but it’s a labor of love rather than a source of income. Some of them have a large backlog of recordings from past conventions. Just identifying the songs and who’s singing them is a big task.

This project is, initially, for one of these filk publishers, who has the biggest backlog of anyone. The approach we’re looking at is making short clips available to registered crowdsource contributors, and letting them identify as much as they can of the song, the author, the performer(s), the original tune (many of these songs are parodies), etc. Reports would be delivered to editors for evaluation. There could be multiple reports on the same clip; editors would use their judgment on how to combine them. I’ve started on a prototype, using PHP and MySQL.
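To make the report flow above a little more concrete, here is a minimal sketch in Python. The prototype itself is in PHP and MySQL, so this is only an illustration. Every name in it (`Report`, `reports_by_clip`, `title_tally`) is a hypothetical stand-in, not part of the actual design. It assumes a report is a set of optional fields tied to one clip and one contributor.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Report:
    """One contributor's identification of one clip (hypothetical shape).

    Every descriptive field is optional, since contributors identify
    only as much as they can.
    """
    clip_id: int
    contributor: str
    title: Optional[str] = None
    author: Optional[str] = None
    performers: List[str] = field(default_factory=list)
    original_tune: Optional[str] = None  # filled in when the song is a parody


def reports_by_clip(reports):
    """Group reports by clip so an editor can compare them side by side."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report.clip_id].append(report)
    return dict(grouped)


def title_tally(reports):
    """Count how many reports propose each title, ignoring case.

    This is only a hint for the editor; it decides nothing by itself.
    """
    tally = defaultdict(int)
    for report in reports:
        if report.title:
            tally[report.title.strip().lower()] += 1
    return dict(tally)
```

The tally is deliberately advisory: the plan is for editors to use their own judgment when combining reports, so nothing here merges or accepts a report automatically.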

There’s a huge amount of enthusiasm among the people already involved, which makes me confident that at least the niche project will happen. The question is whether there may be broader interest. I can see this as a very useful tool for professionals dealing with archives of unidentified recordings: folk music, old jazz, transcribed wax cylinder collections, whatever. There’s very little in the current design that’s specific to one corner of the musical world.

The first question: Has anyone already done it? Please let me know if something like this already exists.

If not, how interesting does it sound? Would you like it to happen? What features would you like to see in it?

Update: On the Code4lib mailing list, Jodi Schneider pointed out that nichesourcing is a more precise word for what this project is about.

3 responses to “Crowdsourcing song identification”

  1. My first thoughts were “MusicBrainz” and “AcoustID”, although on reading further, the former seems better suited to published music. However, you could create a fork of MusicBrainz and AcoustID and start a fresh database, with limited access to the audio stored elsewhere. Many popular(ish) works are available in MusicBrainz, and it’s open for edits.

    • The MusicBrainz server might be worth looking into, perhaps as a back end to a custom web application. The documentation all talks about connecting to “the MusicBrainz database,” so I don’t know how much work there might be in making it work with a private database. AcoustID doesn’t sound like it fits in; the FAQ specifically says it doesn’t work with short audio snippets or low-quality recordings. Thanks for the suggestions in any case.

  2. I can’t help with development, but when it gets to that stage I will certainly be a contributor.