So I’m designing an experiment. I have an interface. I think it’s really neat, but would like to measure the best way to use this interface and try to quantify its neatness factor. I don’t want to tell you exactly what it is, because you might be a participant in my user study and that could muck around with the results. Suffice it to say that this is an interface for music search and browsing.
When I say it’s an interface for search and browsing, I don’t mean it performs a query. It is only a means to navigate a collection of music. How that collection of music has come into existence is someone else’s business. I just want to help people interact with a collection, in this case smaller collections. The idea is that someone performed some kind of query on the world of music and has a small (< 50 songs) set of songs that they want to traverse through in an efficient manner.
The Problem
I need several sets of songs in order to perform my user study with my interface. Participants will be searching for specific songs and browsing for songs that fit a written description. As this is an interface for music search and browsing, I think that those sets of songs should be thoughtfully chosen.
I need
- 6 sets of heterogeneous songs.
- 10 sets of homogeneous songs so that there are 2 sets of a single “genre” for 5 “genres.”
- All sets need to be unique, and no song may appear in more than 1 set.
- The sets have no order.
- There will be approximately 30 songs in a set. This may change slightly after some pilot studies, but it shouldn’t change significantly.
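A minimal sketch of a sanity check for the requirements above, assuming the candidate sets are held in a plain dictionary that maps a set name to a collection of track IDs. The function name `check_sets` and the example IDs are hypothetical; nothing here comes from the Million Song Dataset code or the Echo Nest API.

```python
from collections import Counter


def check_sets(song_sets, expected_sets=16):
    """Return a list of problems found; an empty list means the sets pass.

    song_sets: dict mapping a set name to a collection of track IDs
    (a hypothetical layout, chosen only for illustration).
    """
    problems = []
    if len(song_sets) != expected_sets:
        problems.append(f"expected {expected_sets} sets, got {len(song_sets)}")
    # Count how many different sets each track appears in.
    counts = Counter(song for songs in song_sets.values() for song in set(songs))
    shared = sorted(song for song, n in counts.items() if n > 1)
    if shared:
        problems.append(f"songs in more than one set: {shared}")
    return problems


if __name__ == "__main__":
    toy = {"rock_1": {"t1", "t2"}, "rock_2": {"t2", "t3"}}
    for problem in check_sets(toy, expected_sets=2):
        print(problem)
```

The set size is left unchecked on purpose, since the target of roughly 30 songs may shift after the pilot studies.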
Heterogeneous songs are songs that are as different as possible in timbre and musical style. I want as little similarity between songs within a heterogeneous set as possible.
Homogeneous songs are songs that are as similar to the other songs in the set as possible. This includes notions of “genre” and timbre. I want songs that are similar in signal content and musicality.
I want to use songs that are from the Million Song Dataset. I want this to be reproducible research, and I want to use a dataset that will overlap with other studies. Plus, the 30-second audio clips are exactly the audio content I want for the study – I don’t want full songs.
So how do I choose my 16 sets of music from the million available songs? I don’t want to write a lot of code – I’m not interested in this selection as a research question. I just want to do it and have it be reasonable. I’d like to use a combination of the Echo Nest API and the sample code for the Million Song Dataset, but pointers to other useful bits of code will be appreciated as well.
My two main questions are: how should I choose my song sets and what “genres” should be represented?
Well, I don’t see any particular worry about which genres to use. You’d probably use some of the broad record-shop genre labels, but try to bias the choice towards the types of music your participants are likely to be familiar with (if you happen to know anything about who your participants will be).
I would normally expect to use a straightforward random sample in a lot of cases, so for your ‘heterogeneous’ sets where you’re deliberately biasing it towards dissimilarity, make sure it’s something you can justify if you’re writing a paper.
One way to get dissimilar tracks could be to take a larger random sample, find the full matrix of Echo Nest similarities between all these tracks, and repeatedly eliminate one track of the most-similar pair until the set is the size you want. The converse could help make sure that things are as homogeneous as possible within a genre: do a genre-tag query first, then repeatedly eliminate one track of the least-similar pair, I guess…
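A minimal sketch of that greedy elimination, assuming the pairwise similarities have already been collected into a symmetric matrix (a list of lists, higher meaning more similar). How to obtain those numbers from the Echo Nest is not shown. The function name `prune` and the rule for choosing which track of a pair to drop (the one with the larger total similarity to the remaining tracks, or the smaller total in the homogeneous case) are assumptions made for illustration, not part of the suggestion above.

```python
from itertools import combinations


def prune(sim, keep, diverse=True):
    """Greedily shrink a candidate pool to `keep` tracks.

    sim: symmetric matrix of pairwise similarities (higher = more similar).
    diverse=True drops one track of the MOST similar pair each round;
    diverse=False drops one track of the LEAST similar pair instead.
    Returns the indices of the surviving tracks.
    """
    remaining = list(range(len(sim)))

    def total(i):
        # Summed similarity of track i to every other surviving track.
        return sum(sim[i][j] for j in remaining if j != i)

    while len(remaining) > keep:
        pick = max if diverse else min
        a, b = pick(combinations(remaining, 2), key=lambda p: sim[p[0]][p[1]])
        if diverse:
            # Assumed tie-break: drop the track closer to everything else.
            drop = a if total(a) >= total(b) else b
        else:
            # Assumed tie-break: drop the track further from everything else.
            drop = a if total(a) <= total(b) else b
        remaining.remove(drop)
    return remaining


if __name__ == "__main__":
    # Toy matrix: tracks 0 and 1 are near-duplicates, 2 and 3 are outliers.
    toy = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.3, 0.2],
        [0.1, 0.3, 1.0, 0.1],
        [0.2, 0.2, 0.1, 1.0],
    ]
    print(prune(toy, keep=3, diverse=True))
    print(prune(toy, keep=2, diverse=False))
```

Each round rescans every remaining pair, so this is only sensible for pools of a few hundred tracks. For sets of about 30 songs drawn from a modest random sample, that should be plenty.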