Here's an hour-long teaser of the data dump I'll be unloading on this blog in the next few weeks. Listeners of my radio show know I've been obsessed with Hugo Keesing's Chartsweep of every #1 hit for too long now. After I made an homage to his work with a sweep of every #1 easy listening hit, I really saw the need in our society to have every large database of music cut up into tiny, bite-sized pieces, but realized I didn't have the time to do it. So now that Unix has turned 42 (happy b-day, big guy!!), I wrote a little patch using SoX to grab the hook from every mp3 in a folder and stitch 'em together.
While pretty much everybody can identify and appreciate the hook from any individual song (it's the one part you remember of that song you hadn't heard in years!), defining the hook concretely is a tricky problem. It's usually the title of the song, but not always. It's usually at the beginning or end of the chorus, but not always. So when I wanted to teach a computer to grab the hook of a song I had to ask the deep question, what is the hookiness of these hooks?
With no real answer, my cheapskate workaround is that the hook of the song is almost always simply the LOUDEST part of the song. Super easy to teach a computer to do that, since an mp3, once decoded, is nothing but a list of how loud the sound is at each moment in time.
Yes, it's true, in this Darwinian yelling match that births the concept of a "hit song", there's no room for saving the hook for a quiet, subtle moment in the song. I'd go so far as to say that any song that DOESN'T put the hook at the loudest part of the song is possibly communist. So even though these chartsweeps aren't nearly as pristine as Hugo Keesing's, they do get the point across pretty well. And as an added bonus, this program weeds out those songs that are immoral and unamerican.
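For the curious, the "loudest part" idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the actual patch: it assumes the song has already been decoded into a plain list of numeric samples, and the names `find_peak_index`, `hook_window`, `sox_trim_command` and the 5-second clip length are made up for the example. The last helper only builds a command line for SoX's `trim` effect (start, then duration, in seconds); it does not run anything, and reading mp3s directly assumes a SoX build with mp3 support.

```python
def find_peak_index(samples):
    """Return the index of the sample with the largest absolute amplitude."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))


def hook_window(samples, sample_rate, clip_seconds=5.0):
    """Return (start, end) sample indices of a clip centred on the peak.

    The window is shifted as needed so it stays inside the song.
    clip_seconds is an arbitrary choice for this sketch.
    """
    if not samples:
        raise ValueError("no samples to search")
    clip_len = min(len(samples), int(clip_seconds * sample_rate))
    peak = find_peak_index(samples)
    start = peak - clip_len // 2
    # Clamp so the clip never runs off either end of the song.
    start = max(0, min(start, len(samples) - clip_len))
    return start, start + clip_len


def sox_trim_command(src, dst, start, end, sample_rate):
    """Build (but do not run) a SoX command that cuts the clip out of src."""
    start_s = start / sample_rate
    duration_s = (end - start) / sample_rate
    return ["sox", src, dst, "trim", f"{start_s:.3f}", f"{duration_s:.3f}"]
```

Each command list could be handed to `subprocess.run`, and the resulting clips joined by passing them all to `sox` as inputs followed by a single output file.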
I'll be posting a decade of chartsweeps every...eh, week or so, in the coming weeks. Stay tuned!
You mean Unix, not Linux, no? Linux was born in 1992, but the first release of Unix, which inspired Linux, was in 1969, not 1971, making it 42 years old this year, not 40. Just wanted to tidy up the misinformation, for those of us who might care.
Posted by: Nit Picker | September 24, 2011 at 05:45 PM
Fixed! Thanks for that. I thought I just read that it turned 40...
The more glaring error that I didn't talk about is that the maximum amplitude is not the loudest part of the song! Finding the loudest perceived sound in a song would require frequency analysis, which I haven't quite gotten to yet...
Posted by: Nat | September 24, 2011 at 05:52 PM
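A small step past the single-peak approach, short of real frequency analysis, is to compare average energy over whole windows instead of individual samples. The sketch below uses RMS (root mean square) per window; it is an assumption-laden illustration, not perceptual loudness, and the function name `loudest_window_rms` and its parameters are hypothetical.

```python
import math


def loudest_window_rms(samples, window, hop=None):
    """Return (start_index, rms) for the window with the highest RMS energy.

    window and hop are measured in samples; hop defaults to window,
    which gives non-overlapping windows.
    """
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    hop = hop or window
    best_start, best_rms = 0, -1.0
    for start in range(0, len(samples) - window + 1, hop):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        if rms > best_rms:
            best_start, best_rms = start, rms
    return best_start, best_rms
```

With `[0, 0, 0, 0, 10, 0, 0, 0, 6, 6, 6, 6]` and a window of 4, the single biggest sample sits at index 4, but the window starting at index 8 has the higher RMS, so a lone spike no longer wins automatically.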
I must respectfully take issue with one aspect of what is clearly an amazing piece of work, which is this:
although the hook is often the loudest part, sometimes the 'hookiest' of hooks is that part where everything drops out except for that *one amazing* bass/snare/vocal/guitar/random noise/whatever part
Posted by: blackgreen13 | September 25, 2011 at 08:40 AM
Identifying hooks sounds like a job for Mechanical Turk: http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk. You'd be amazed how little it costs to get thousands of people around the world to tell you where they think the hooks are. A simple statistical filtering of the results would probably give you what you're looking for.
Posted by: Nit Picker | September 25, 2011 at 10:20 AM
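One possible reading of "simple statistical filtering", sketched here under assumptions: each worker submits a single guess, in seconds, for where the hook starts; guesses far from the median are thrown out, and the rest are averaged. The function name `consensus_hook_time` and the 5-second tolerance are hypothetical, and nothing here talks to Mechanical Turk itself.

```python
from statistics import median


def consensus_hook_time(guesses, tolerance=5.0):
    """Average the guesses (in seconds) that fall within tolerance of the median."""
    if not guesses:
        raise ValueError("need at least one guess")
    mid = median(guesses)
    kept = [g for g in guesses if abs(g - mid) <= tolerance]
    # If the guesses are too scattered to agree, fall back to the median.
    if not kept:
        return mid
    return sum(kept) / len(kept)
```

For guesses of 42, 44, 43, 120, 41 and 3 seconds, the stray 120 and 3 are dropped and the consensus lands at 42.5 seconds.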
2nd song I hear in 1956 is a 1957 Elvis, followed later by more non-'56 Elvis. Guess this stuff only works if you have the tunes tagged with the correct year.
Posted by: fred | September 25, 2011 at 11:10 AM
Hmmmm...Fred, you may be right. I am taking these mp3s from WFMU's mp3 library, which has the top 100 of every year by year. So perhaps those folders are incorrect. But also, sometimes it is the case that a song charts in a different year than it was released on record.
I will check out Mechanical Turk, nitpicker! Sounds interesting. Crowdsourcing is an interesting method of music classification - personally, I am more interested in signal analysis. It will be interesting to see which methodology leads to more powerful results in the coming years.
Posted by: Nat | September 25, 2011 at 01:50 PM
I appreciate this effort. Keep it up!
Posted by: Andrew | September 25, 2011 at 05:30 PM
Any way I can access a list to check out the ones that are driving me mad?
Posted by: Cheese Knees | September 26, 2011 at 01:42 PM
Well, if you prefer signal analysis methods, then there is a whole body of machine learning algorithms that might come in handy. It would be interesting to see how well, say, a classic neural network, once properly trained, could identify hooks. Even more interesting would be the messed up classifications that a neural network trained on the wrong data set would yield. For instance, you could show a neural network what the hooks are in 100 songs from, say, the 1950s, then see how it would use that knowledge to identify hooks in songs from the 1990s, or vice versa. Or train it with music from one genre, and set it loose on a different one. The possibilities are endless...
On a vaguely related note, here's a quasi-lame demo of a friend's software that uses a machine learning algorithm to detect (with more or less accuracy) the genre of music to which a song belongs:
http://www.youtube.com/watch?v=NDLhrc_WR5Q
You'll find references to articles about the software at the end of the clip.
Posted by: Nit Picker | September 26, 2011 at 11:21 PM
How did spam (from a Kenyan domain) get in here? I'm impressed.
Posted by: Nit Picker | September 27, 2011 at 11:01 AM
Just if you care to fix it -- the "Jailhouse Rock" that comes up in '57 is a later live version by Elvis, not the '57 hit version.
Posted by: Ian W. Hill | September 27, 2011 at 02:40 PM
Nitpicker, I actually wrote an article for Rhizome on Tzanetakis!
http://rhizome.org/editorial/2011/jul/13/info-mining-george-tzanetakis/
Hmmm...wish I knew anything about neural networks. Machine learning is nuts. Not ready to tackle that stuff yet personally. I know that there is some hook identification software out there that's superior to mine, although I haven't had the chance to use it yet.
Posted by: Nat | September 27, 2011 at 08:48 PM
Cheese Knees, use google? haha...I'll post a list of the tracks if I can find the time.
Posted by: Nat | September 27, 2011 at 08:49 PM
Quelle coincidence, Nat! George is actually my supervisor in a computer science grad school program in Victoria, B.C. He's a great guy, and a good musician too. How did you become aware of him and his work?
Posted by: Nit Picker | September 28, 2011 at 12:35 AM
What I find interesting in the comments here is that nobody says whether they remember any of these songs from the time they were popular. The odds are 55 years ago these people weren't even born yet and wouldn't know the "hook" (or the title) of most of these vintage songs or who recorded them!
I'm 65 years old and can remember most of these tunes from the years they came out. A lot of childhood memories come flooding back.
I don't think any computer can be trained to identify (on its own) a "hook" in a song by either sound levels, frequencies, or any other means short of telling the computer the answer. An instrumental song would really confuse a computer.
Posted by: Rich | September 28, 2011 at 11:07 PM
Definitely not a perfect way to distinguish hooks, but it's interesting nonetheless to hear the loudest sections of each song. As anyone knows, music has its exceptions. For example, The Shangri-Las' "(Remember) Walking in the Sand" (one of my favorite 60's songs) has hook(s) that are quieter than the verses.
On a similar note dealing with #1 hits: a few years back, I wrote a blog series about a research project I completed that analyzed the keys of all the Billboard Hot 100 #1 songs from the rock era (55-present [2009]) in an attempt to trace the trends that keys had upon each decade of music. I noted the most popular keys and least popular keys, key changes, as well as many other statistics. There were definitely some interesting trend changes throughout the spans. If you'd like to check it out, here it is: http://gigdoggy.wordpress.com/the-key-to-music-research-project
Posted by: Robert | September 29, 2011 at 11:15 PM