Culture Hack Day

This is now a bit overdue, but three weeks ago I attended the Culture Hack Day.

My hack used the data made available by the UK Crafts Council.  Just before the last general election, they created a website asking the public “Why does craft matter to you?” and provided a space for people to submit their answers.  While some of this data is exposed on the website, most of it is not.  As there have been over 1000 submissions, I wanted to create a new way to explore the responses that focused on the content and not external data such as the city of the person making the submission.

The result can be seen in the video.  When you visit a website, a submitted quote about why craft matters is shown.  You can then click on any word in that quote and a new quote will appear which also uses that word.  If there isn’t another quote with that word, the same quote is shown.

The hack uses the Python library xlrd to read the Excel spreadsheet containing the Craft Matters data into a Python dictionary. The nltk library is used to tokenize the words in each of the quotes. A CherryPy server then randomly serves up a quote with each word being a link to a search for another quote containing that word. A little CSS makes it pretty, and that’s it. All of the code (but not the xls files with the data) is available here.
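The word-to-quote lookup at the heart of the hack can be sketched in a few lines. This is only an illustration, not the code from the hack: the sample quotes are made up, a regular expression stands in for nltk's tokenizer, and the names `build_index` and `next_quote` are hypothetical.

```python
import random
import re
from collections import defaultdict

# Hypothetical sample quotes; the real data comes from the spreadsheet.
QUOTES = [
    "Craft matters because making things by hand connects people",
    "Making is how I think",
    "Craft keeps old skills alive",
]


def tokenize(text):
    """Lowercase a quote and split it into words (a stand-in for nltk)."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(quotes):
    """Map each word to the set of quote positions that contain it."""
    index = defaultdict(set)
    for position, quote in enumerate(quotes):
        for word in tokenize(quote):
            index[word].add(position)
    return index


def next_quote(index, current, word, rng=random):
    """Pick another quote containing `word`, or stay on the current one."""
    candidates = sorted(index.get(word.lower(), set()) - {current})
    if not candidates:
        return current  # no other quote uses this word
    return rng.choice(candidates)


INDEX = build_index(QUOTES)
```

Clicking "craft" while viewing the first quote would call `next_quote(INDEX, 0, "craft")`, which can only land on the third quote; clicking a word that appears nowhere else returns the same quote, matching the behaviour described above.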

The calculus of cleggeron

Cameron gave a speech recently in the hipster headquarters of the Old Truman Brewery in Shoreditch.  There must be some irony in this.  The Truman Brewery was an East London brewery which was started in the 1660s and brewed on site until it went out of business in 1988 leaving behind massive empty buildings in pre-gentrified Shoreditch. The buildings were transformed into a mixture of artists’ studios and dotcom offices and are now at the heart of the Brick Lane scene.  A scene which now hosts a large portion of the creative industries of London.

The irony is that this regeneration of one of the poorest neighborhoods in London was made possible through the arts, an area that cleggeron has massively cut in their budget.  (We can argue another day whether gentrification actually benefits any of the original residents.)  The point is more that Cameron chose to give a speech about bringing economic growth to East London from a building which symbolises the power and potential of such growth, but did so through a revenue stream that the current government sees no merit in.

However, I found the content of Cameron’s speech far more irritating than its venue.  It spoke of East London (in this case meaning roughly Old Street Station to the Olympic site) becoming a new Silicon Valley.  I very much support this.  If you take a stroll around the Shoreditch triangle, through the Old Street circle and up Kingsland Road or further east down Bethnal Green Road you will pass a great number of high-tech and creative industries that call East London home.  Extending  this area towards the Olympic site (and consequently straight through Queen Mary, University of London where I work) only makes sense.  The problem I have is with the details of visas and foreign talent.

To get what is called a Tier 1 (General) visa (a non-sponsored visa which allows you to work fulltime), you need to prove you can speak English, are educated to some degree, and have some amount of income.  You can read all about it here. (There are other Tier 1 visas such as the post-study work visa.)

In short: you need 100 points if you are outside the UK, or in the UK but not already on a Tier 1 (General) visa.  You get awarded points in the following ways:

  • 20 points if you are under 30 years old (nothing if over 40)
  • 45 points for a PhD (35 if Master’s is highest degree or 30 if Bachelor’s)
  • 10 points for English language skills
  • 10 points for a minimum amount of  savings
  • 5 points if you studied for at least a year at a UK higher education institution and earned a degree

Additionally there are caps on the number of these visas they will give out.

For income requirements, you need to have an annual income of £30,000-£34,999, or the equivalent in another currency as determined by the UK Border Agency. This assumes you have a PhD and are under 30 plus other things. There are proposals to tighten this even further, though, and raise the minimum income to £45,000. This would be detrimental for university researchers, particularly the masses of postdocs and technical staff with salaries far below those of professors and lecturers. See as an example this series of job adverts, most of which require a PhD. The salary range for the jobs is £28,983 – £35,646 p.a. plus London Allowance of £2,795 p.a. If a limit of £45,000 is introduced, it would be more difficult for university researchers to move into entrepreneurial roles since their previous incomes will not earn them a non-sponsored visa.
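The arithmetic behind that income requirement can be sketched as follows. The point values are the ones listed above; the function names are hypothetical, and because the post does not list the age bands between 30 and 40, the sketch only models the under-30 case and scores every other age as 0 (an assumption that understates those applicants).

```python
# Points for attributes other than earnings, as listed in the post.
DEGREE_POINTS = {"phd": 45, "masters": 35, "bachelors": 30}
PASS_MARK = 100


def non_earnings_points(degree, under_30, english, savings, uk_degree):
    """Sum the points that do not depend on previous earnings.

    Assumption: only the under-30 age band is modelled; any other age
    scores 0 here because the post does not list the intermediate bands.
    """
    points = DEGREE_POINTS[degree]
    points += 20 if under_30 else 0
    points += 10 if english else 0
    points += 10 if savings else 0
    points += 5 if uk_degree else 0
    return points


def earnings_points_needed(**applicant):
    """Points that previous earnings must supply to reach the pass mark."""
    return max(0, PASS_MARK - non_earnings_points(**applicant))


best_case = dict(degree="phd", under_30=True, english=True,
                 savings=True, uk_degree=True)
print(non_earnings_points(**best_case))     # 90
print(earnings_points_needed(**best_case))  # 10
```

Even in the best case (a PhD, under 30, with a UK degree) the applicant reaches only 90 points, so the remaining 10 have to come from previous earnings. That is why the salary threshold matters so much for postdocs.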

This all hits very close to home as I am an American who just finished a PhD at Queen Mary, University of London. My research area is exactly the kind of thing this government would like to have help grow the British economy (I do digital music signal processing, computer science and electronic engineering). I am now working a series of short research contracts and will be getting a visa designed for people in my position – a Tier 1 post-study work visa. This visa is good for 2 years and after that I need to apply for another visa. However if I stay within university research, I have very little hope of qualifying for a full Tier 1 (General) visa if the requirements keep increasing the needed previous earnings.

This cannot truly be the intent of the British government. How can excluding non-EU nationals with the talent and education to grow their new Silicon Valley be justified by nothing more than a lack of a qualifying salary from a British educational institution?  At least there are loopholes to make sure foreign footballers still get visas without a problem, otherwise that would just be embarrassing.

A minor irritation

Apple’s HCI Guidelines have long been a document to refer to when designing the GUI for a new piece of software. I like their drag-and-drop tools for creating Cocoa applications. The little helper lines really can help create a more pleasant-looking piece of software.

So why does Apple violate its own rules with iTunes? I don’t know why this irritates me so much, but it irks me every time I open iTunes. This is an excerpt from their guidelines of what a generic window should look like:

And this is what the iTunes window looks like:

Notice the difference in the max/min/close buttons?  Why would you make this change? (And I don’t know when this change was made, I only noticed it in iTunes 10, perhaps it’s been like this forever.)

When you train the users of your OS to expect certain functionality to be consistently placed within an application – and then even publish why and how that should occur – why break your own rules? I just don’t see any clear benefit (and “cleaner lines” at the sacrifice of usability is not a benefit).

My open source morning

I am currently at the Grace Hopper Celebration of Women in Computing. The conference is a gathering of about 2100 technical women (and some men) with a mixture of technical content and soft skill/career-building workshops. While there are about 900 students in attendance, there are also numerous heads of academic and industrial research labs.

Before getting into the open source track sessions I attended, I have to address the keynote by Duy-Loan Le. It was completely amazing and I only wish it had been longer. It was comparable to a talk I heard from Maya Angelou in its eloquence and inspiration.

After the absolutely amazing keynote, I attended two panels in the open source software track. The first was an introduction to open source development and the surrounding community. The moderator was Natalia Vinnik, Yahoo! Inc. and the panelists were Sara Ford from Microsoft/CodePlex, Olga Natkovich from Yahoo! Inc./Apache/Hadoop and Stormy Peters, Executive Director of the GNOME Foundation. The main points touched on were:

  • There was a very brief declaration that “free” and “open source” are largely interchangeable terms and it’s best to observe community practice. This elicited a lot of muttering and disagreeing from the back of the room, but there wasn’t a lengthy discussion.
  • Education support and resources for teachers are available. I didn’t catch the urls, but GNOME has some resources.
  • There are many corporations involved with foss development and support such activities. This includes Microsoft, Yahoo!, and Google. It is a strong resume builder.
  • A huge draw to foss development is that people like collaborating and the projects produce high-quality code.
  • The main skill needed is English. There are lots of non-coding opportunities such as bug reporting, logo design, usability improvements, etc.

The second panel addressed how foss can be a career path. The moderator was Cat Allman, Program Manager at Google, and the panelists were: Jenny Han Donnelly, Senior Engineer, Yahoo! Inc.; Margo Seltzer, Associate Professor of Computer Science, Harvard University; and Sarah Sharp, Linux Kernel Hacker, Intel.

  • **See code.google.com/opensource tomorrow for a big announcement**
  • Just today there were 593 job postings with the keyword open source on monster.com.
  • Community development teaches you communication skills, particularly when interacting with people with less than polished social skills.
  • Jenny comes from front-end development, so is used to view-source, open source was a natural progression (YUI)
  • Lurking is encouraged, excellent learning tool
  • Margo’s research group builds a lot of system software, likes people with foss experience.
  • Women are too hard on ourselves, need to be ok with making mistakes and with those mistakes being public.
  • Women often expect an invite, foss doesn’t invite but expects volunteers; women just need to be aware of this
  • Mistakes can be great in foss because more people are there to identify and help correct them.
  • Any environment where they treat you like an adult and allow you to be flexible will create better work.
  • READ LICENSES
  • Understand the difference between free speech and free beer.
  • Open is not a defined business model but can be used with different models.

Sleepycat was offered as an example of how a career can be built off foss. It was profitable from day 1, never took outside investment and its only product was open (Berkeley DB). When it started getting popular, more features were requested and payment was offered.

Q & A brought a question from some Raytheon employees about whether foss can be secure or will just be riddled with malicious content. The response was “security through obscurity is no security at all,” emphasizing that proprietary software does not mean security. This didn’t seem to be completely accepted by the Raytheon employees.

What I really wish I had asked during the sessions is how academic research can best be moved into foss development. I think of this as I have thousands of lines of Python and Java sitting on my machine from my PhD work. My research group has a new project starting to try and address this, Sound Software. It’ll be interesting to see where it leads.

21 Ways to Listen to Your Music

There may be 85+ ways to visualize your music, but there are also 21 ways to listen to it. This is the first of what will be a series of posts summarizing some of the content of my PhD thesis. My whole thesis will eventually be available here after my viva.

I’ve gathered a list of interfaces that help explore a collection of audio. The primary focus is on interfaces which use spatial audio, but I have included some interfaces with interesting design ideas that don’t use spatial audio but still use audio playback in an innovative way. The earliest work focuses on efficiently browsing speech signals while later work focuses on presenting multiple music streams at the same time, usually by sonifying a 2D arrangement of music. Some of the most recent work looks at using virtual 3D worlds to explore music as an individual or collaboratively.

Ada Lovelace Day

It’s Ada Lovelace Day, a day of blogging about women in science and technology. I’m afraid this is not a very thorough post, as things like a PhD thesis and various papers are occupying my time, but I still wanted to point out a female influence on my work.

In my fantasy spatial audio conference/dinner party I would like to strike up a conversation with Elizabeth Wenzel. (My fantasy conference/dinner party would also include Barbara Shinn-Cunningham and Sally Jo Cunningham.)

There are not a lot of women that lead major research groups in anything related to audio; there certainly aren’t many women in spatial audio research. Elizabeth Wenzel first stood out to me in her publications; I like her writing style. She then stood out even more so when I learned she is the director of a research lab at NASA-Ames. I’ve read and referred to her work for a couple years now, but I don’t know much about her beyond her NASA personnel pages here and here.

She and her research lab have done significant work in spatial auditory display, especially display using binaural audio. I appreciate the clear research questions that are asked and then answered. I think they are good examples of rigorous basic scientific research. Her lab is the primary (if not only) US government funded organization I would be interested in working with.

Map to Santa's House

For the DMRN day in a little over a week, I’m using a Christmas-themed data set for my audio collection browser demo. Mostly because this is a workshop 3 days before Christmas.

I’ve grown quite frustrated with automatic arrangements of songs, at least based on content analysis. Metadata-based techniques can be better, but you run into problems with long-tail content and it’s difficult to arrange individual songs, you’re usually only getting reliable results at the artist level. Audio-only interfaces are very fragile; they need to be robust and intuitive or users get confused. This means that it needs to be obvious why songs are where they are. So since I’m using a small handpicked set anyway, I’m just going to arrange them by hand as opposed to automatically by some algorithm.

These are the 44 tracks (with SoundCloud or last.fm links where possible):
Blue Christmas – Elvis Presley
Carol of the Bells – uploaded by stretta on SoundCloud
Cha-Cha All the Way – Capitol Studio Orchestra
Chilly Winds Don’t Blow – Nina Simone
Christmas Is – Lou Rawls
Christmas Kisses – Ray Anthony
The Christmas Song – Mel Torme
The Christmas Song – Nat King Cole
Christmas Trumpets/We Wish You a Merry Christmas – Ray Anthony
(Everybody’s Waitin’ For) The Man with the Bag – Kay Starr
Frosty the Snowman – uploaded by Jofro on SoundCloud
God Rest Ye Merry Gentlemen – Jimmy Smith
Good Morning Blues – Count Basie
Holiday on Skis – Al Caiola and Riz Ortolani
I Am Blessed – Nina Simone
I Saw Mommy Kissing Santa Claus/Jingle Bells Bossa Nova – Eddie Dunstedter
I’d Like You for Christmas – Julie London
I’ll Be Home for Christmas – Elvis Presley
I’ll Be Home for Christmas/Baby It’s Cold Outside – Jackie Gleason and Jack Marshall
I’ve Got My Love to Keep Me Warm – Billie Holiday
I’ve Got My Love to Keep Me Warm – Dean Martin
If Everyday Was Like Christmas – Elvis Presley
It’s Christmas Time, Pretty Baby – Elvis Presley
Jingle – uploaded by DJD MUSIC on SoundCloud
Jingle Bells – Johnny Mercer
Jingle Bells/Jingle Bell Rock – Hollubridge Strings
Last Christmas – uploaded by amysue on SoundCloud, by Amy Subach
The Nutcracker Suite – Les Brown & His Band Of Renown
Ring Those Christmas Bells – Fred Waring & The Pennsylvanians
Rudolf the Red-Nosed Reindeer Mambo – Alvin Stoller
Rudolph the Red Nose Reindeer – uploaded on SoundCloud by nickmcIntyre
Run, Run Rudolph – uploaded on SoundCloud by rnbmama
Santa Claus is Coming to Town/White Christmas – Jimmy McGriff
Silent Night – Dinah Washington
Silver Bells – uploaded by amysue on SoundCloud, by Amy Subach
Toys for Tots – Peggy Lee
What a Wonderful World – Louis Armstrong
What Are You Doing New Year’s Eve? – Ella Fitzgerald
What Are You Doing New Year’s Eve? – Nancy Wilson
White Christmas – uploaded by amysue on SoundCloud, by Amy Subach
White Christmas – Elvis Presley
Winter Wonderland – Peggy Lee
Winter Wonderland – Shirley Horn
‘Zat You, Santa Claus? – Louis Armstrong

What I really like about this collection is that it’s music that I’d enjoy discovering. Yes, there are a ton of terrible (Christmas) songs, but why would you want an interface to “discover” them?

So how would you arrange this music? What would make sense to you? I’m thinking something like one dimension being upbeat songs to slower, low key songs versus a second dimension of large orchestrated pieces to smaller instrumentations. So the four corners of the above list would be something like: What a Wonderful World (big instrumentation, low key song), a guitar/vocal version of White Christmas (small instrumentation of low key song), Jingle Bell Rock (big instrumentation of upbeat song), and guitar/vocal Rudolph the Red-Nosed Reindeer (small instrumentation of upbeat song). But there are also a number of different versions of the same song and common styles, like Latin beats or big band or only instrumental, that could instead influence the arrangement.
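A hand-made arrangement like this is just a table of coordinates. Here is a rough sketch of how the four corner songs might be stored and queried; the 0-to-1 scale, the exact numbers and the `nearest_track` helper are illustrative assumptions, not what the demo uses.

```python
import math

# Hand-assigned positions: x runs from low key (0) to upbeat (1),
# y runs from small instrumentation (0) to large instrumentation (1).
# The corner placements follow the post; the scale is an assumption.
ARRANGEMENT = {
    "What a Wonderful World": (0.0, 1.0),
    "White Christmas (guitar/vocal)": (0.0, 0.0),
    "Jingle Bell Rock": (1.0, 1.0),
    "Rudolph the Red-Nosed Reindeer (guitar/vocal)": (1.0, 0.0),
}


def nearest_track(arrangement, x, y):
    """Return the track whose hand-placed position is closest to (x, y)."""
    def distance(name):
        track_x, track_y = arrangement[name]
        return math.hypot(track_x - x, track_y - y)
    return min(arrangement, key=distance)
```

Moving a song is then a one-line edit to the dictionary, which is what makes rearranging by hand practical right up to the day before the demo.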

Changing the arrangement of the songs is not difficult and can be done up to the day before DMRN, so let the suggestions flow. How would you arrange these songs in two dimensions?

A Room of My Own

Right now in my literature review I’m interested in why researchers want to create virtual acoustic environments. It’s not difficult to find incremental research improving a particular model or testing the psychoacoustical limits of a technique, but it takes more effort to determine why the researcher cares. I’ve found several common motivations and have highlighted some key examples, though I add the disclaimer that much more has contributed to the field than is mentioned here.

Architects and Those That Hire Them

Big money is invested in new performance spaces and investors like to have a guarantee that their money will be well spent. Initial auralization work was done without the assistance of computers. Acousticians would make tiny scale models of spaces and study how sound waves would travel around the model in hopes of extrapolating what would happen in full-sized rooms. This work is now done with computer models by studying the physics of how sound travels and interacts with objects. Two of the major software packages used are CATT-Acoustic and Odeon. Though computers can create very complex models to precisely predict how sound will move throughout a space, the limitation is that the sound cannot be rendered in real-time. Moving and listening through a space cannot be modeled in an interactive manner. CATT-Acoustic is addressing this issue by looking at rendering the data required to compute the audio offline so that some amount of movement can be approximated. However, the approach they are taking, computing a large number of impulse responses calculated by the CATT-Acoustic software, requires a large amount of data storage.

Education and Posterity

The field of archaeological acoustics has grown in recent years as researchers have begun to discover similar aural phenomena across multiple archaeological sites. The questions then emerge: did previous civilizations have an understanding of acoustics; were these acoustic phenomena intentional design features; did these phenomena have a direct impact and role in the society such as in religious ceremonies? (The third chapter in Spaces Speak is an excellent reference on the subject.) Engineers approach these questions by meticulously measuring the spaces so that the spaces can be auralized and further studied.

More recently, audio engineers have acknowledged a need to preserve and record spaces of significance such as famous concert halls. Angelo Farina (see this paper in particular) and Damian Murphy have been two of the researchers actively trying to accurately capture and then model acoustic spaces of some historical note.

I attended the Audio, Acoustics, Heritage Workshop in 2008 which addressed a lot of these issues. I was particularly interested in the work presented by Gui Campos from the University of Aveiro in Portugal. The Painted Dolmen (Anta Pintada) is a neolithic site in Portugal with fragile paintings which have already been significantly damaged in previous archaeological excursions, so it is not open to the public. The Portuguese government wanted to create educational tools so that the public could still experience the heritage site without causing further damage. This seems to be an increasingly popular enterprise for governments; both the British and Italian governments have funded similar projects.

Researchers from the University of Aveiro used a laser scanner to precisely measure the space and then model it for virtual reality simulation. Though the data existed to create a complex, detailed model of the space, it could not be auralized in real-time, so a simplified model was instead implemented. A similar application was developed for an archaeological park in Italy using GPS and custom software for mobile phones (see the paper for details). The researchers found that including sounds to recreate the soundscape was well-received by the students that tested the system. However, even though they have 3D models of the ruins, they did not use any auralization, neither real-time nor previously rendered.

Entertainment

Interactive 3D environments are becoming increasingly common and complex for the average consumer as video game hardware advances. A PS3 and a 5.1 surround sound system trump most research setups of the past decade. An enclave of the industrial and academic research lab is the CAVE. CAVEs are immersive visual environments that can use loudspeaker or binaural (headphones) technology for audio playback and usually have projected images that encompass an entire room. There are a number of applications that have been developed for CAVE-like environments. You can find a description of several different applications here.

The Acoustics research group at the Helsinki University of Technology developed a system at the turn of the century called DIVA (Digital Interactive Virtual Acoustics). It models performing musicians and allows a listener to move virtually around them while listening to their performance. The major compromise in such systems is trading accuracy for interactivity. It is deemed more desirable to have an immersive, engaging virtual system which only approximates a space that might exist in reality than to get hung up on details that cause longer processing times. This is the approach taken in all video games: perceptual approximation overrides absolute reality.

What Conclusions Can We Draw?

Applications using virtual acoustic environments are being developed for differing end-users with priorities ranging from high-precision acoustic recreation with a lesser focus on interactivity to a large focus on interactivity at the expense of accurate acoustic models. In between is the emerging field of edutainment which hopes to use the interactivity of virtual environments to attract and teach students about specific acoustic environments. The signal processing is falling short though. While great advances are being made in auralizing 3D models of spaces, complementary technology has not been sufficiently developed to aid in the real-time interaction with this data.

A visual parallel is computer animation. Feature-length films are rendered offline, in non-real-time, whereas video games require the hardware to produce images as the player moves through the game. The visuals in video games do not look as good as movies, but they are quickly approaching that quality as the hardware improves. The same is true of virtual acoustics: high-quality audio can be rendered offline, and it is only a matter of the hardware improving before real-time, interactive audio of the same quality can be generated.

For the time being, clever algorithms need to reduce the reliance on heavy processor loads and large amounts of data so that high-quality, interactive audio can be generated on mobile devices. A significant portion of my PhD work looks at efficiently summarizing and interacting with a large database of impulse responses, the data that generates the audio of a 3D model, so that lightweight applications can be created without compromising the audio quality of the original 3D model. I am also looking at clever ways of compressing the data so that less storage is required.
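To see why impulse responses are expensive, it helps to remember that auralizing a sound means convolving the dry signal with the impulse response for a given source and listener position. Below is a minimal sketch of direct convolution using plain Python lists and toy values; a real system would use FFT-based convolution and responses thousands of samples long.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution of a dry signal with an impulse response."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, sample in enumerate(dry):
        # Every input sample is smeared across the whole impulse response.
        for k, tap in enumerate(impulse_response):
            out[n + k] += sample * tap
    return out


# Toy example: a two-sample click in a "room" with one echo two samples later.
dry = [1.0, 0.5, 0.0, 0.0]
impulse_response = [1.0, 0.0, 0.3]
wet = convolve(dry, impulse_response)
```

Each output sample costs one multiplication per impulse response tap, and every listener position needs its own response. That is where both the processor load and the storage requirements come from.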

Knitting in Osaka and Kobe

This is a post for a future google search for Kobe, Osaka, and knitting, a search I made a couple weeks ago. The most useful hit for me was this blog post. I found this store in Osaka because of the excellent directions. I bought Noro Kujaku, a now discontinued yarn, for ¥780 (currently about £5 or $8) and a yarn I can’t find in ravelry or online elsewhere for ¥280 (£2 or $3). The second yarn was knit up into a furry hat that is extremely popular in Japan at the moment. I indicated I liked it and the woman working in the shop insisted I take the pattern.

The real find was in Kobe, however. Doi Shugei is a very well-stocked general craft store with the basement floor full of yarn. They had a number of Japanese brands I haven’t seen elsewhere. They seemed to be clearing out their stock of Noro, so I picked up Noro Kureyon for ¥515 (about £3 or $6) and Noro Silk Garden for ¥672 (£4 or $7). The shop is in one of the large arcades near Sannomiya station. They have an English translation of the address on their site.

There are also a large number of fabric and yarn shops beneath Sannomiya station. Walk towards the city away from the station under the rail lines; the shops are across the street from a block of nothing but Pachinko. They seem to be only visited by the locals and I doubt you’d find any English, but you might be able to find a good bargain.

You want the third song on the left.

Researchers working with large collections of music really really like to take those collections, stare at them cross-eyed and then push them onto a two-dimensional map of some sort.  If you don’t believe me, go here for 30 seconds and then we can continue.

A lot of the resulting visuals are beautiful and can be informative. Some look like 1994, but we’ll just assert that’s an intentional retro design aesthetic that was carefully perfected.

The thing is pretty pictures can only serve so much purpose. You’re still dealing with music and music needs to be heard. I’m interested in how interactive audio, primarily but not exclusively spatial audio, can be used to enhance or even replace some of these elaborate visuals.

One tool I’m currently messing around with is spatial databases for fast querying of spatial data.  I’ve just finished setting up a postgresql database  that is spatially-enabled (everyone loves enablers) with postgis.

I have a collection of about 50,000 tracks that have been analyzed by SoundBite. As a part of his MSc research at QM, Steve Lloyd took the feature vectors computed by SoundBite and arranged the 50,000 tracks on a 2D map. He used Landmark Multidimensional Scaling (LMDS) with 1000 randomly selected landmark points. He also fixed the maximum track-to-track dissimilarity to a ceiling to prevent outliers from ending up too far from the main group of tracks. So in summary, I have a collection of 50,000 songs, all the associated metadata, the audio content, and (x,y) coordinates placing each song on a map.
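The input to LMDS can be sketched roughly like this: pick landmarks at random, measure every track against them, and cap the result. This is not Steve's code and it stops short of the scaling step itself. Euclidean distance is an assumed stand-in for whatever dissimilarity SoundBite's features use, and the function name is hypothetical.

```python
import math
import random


def capped_landmark_distances(features, n_landmarks, ceiling, seed=0):
    """Distances from every track to randomly chosen landmark tracks.

    Assumption: Euclidean distance between feature vectors is used as
    the dissimilarity. Values above `ceiling` are clipped so outliers
    cannot end up arbitrarily far from the main group.
    """
    rng = random.Random(seed)
    landmark_ids = rng.sample(range(len(features)), n_landmarks)
    distances = []
    for vector in features:
        row = []
        for landmark in landmark_ids:
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(vector, features[landmark])))
            row.append(min(d, ceiling))
        distances.append(row)
    return landmark_ids, distances
```

With 50,000 tracks and 1000 landmarks this yields a 50,000 by 1000 table rather than the full 50,000 by 50,000 one, which is the point of using landmarks in the first place.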

So, a short tutorial on how to create a spatially-enabled database:

  1. Install a database and the required dependencies.  I’m using postgresql with postgis which requires Proj4 and  GEOS.  But there are other options.
  2. Create the database and set it up for spatial queries. At the command line:
    createdb [yourdatabase]
    createlang plpgsql [yourdatabase]
    psql -d [yourdatabase] -f postgis.sql
    psql -d [yourdatabase] -f spatial_ref_sys.sql

    If you used macports, you will find the sql files in /opt/local/share/postgis.

  3. Create a table to hold any metadata.  In psql:
    CREATE TABLE mytable(id integer, track_name text);
    Note the semicolon; the statement won’t execute without it.
  4. Add a spatially-enabled column to hold the coordinate information.
    SELECT AddGeometryColumn('', 'mytable', 'mygeocolumn', -1, 'POINT', 2);
    The arguments are:

    • The schema name, we don’t use one here and could have used only 5 arguments.
    • The table name.
    • The new column’s name.
    • The SRID, a reference to the mapping system to use; -1 means we aren’t using a pre-defined system.
    • The type of geometry.
    • The number of dimensions.
  5. Now to start inserting data into the table. This can be done row by row with the INSERT command:
    INSERT INTO mytable(id, track_name, mygeocolumn)
    VALUES (1, 'First Song', ST_GeomFromText('POINT(0 5)', -1));

    ST_GeomFromText will create the proper datatype. Note that there is only a space between the two coordinate values.

  6. It’s likely that you don’t want to type out every individual track.  In my case that’d be 50,000 entries to process by hand.  Instead use the copy functionality: create a text file describing the tracks and then copy that file into the table.  This has the benefit of letting postgres check things like primary keys for the whole file, so an error doesn’t leave you with a half-built table.  Instead it’s all or nothing.  It’s fairly simple to create a text file according to whatever delimiter you’d like; it’s all well documented.  However, I had a problem trying to import geometry data.  This is the fix I found.  I make no guarantees that it won’t break something, but it has worked out for me so far.  In the text file, define the Points as:
    SRID=-1;POINT(6 10)
    Then to copy the text file into a table using a tab as the delimiter:
    \copy mytable FROM '/the/path/to/myfile.txt' WITH DELIMITER E'\t'
  7. To take advantage of PostGIS, you need to build the spatial index.
    CREATE INDEX myindexname ON mytable USING GIST(mygeocolumn);
    VACUUM ANALYZE mytable(mygeocolumn);
  8. Query away.  See the PostGis docs for tips on efficient querying, but here’s an example of finding all tracks that are within 10 units from (3,4).
    SELECT ST_AsText(mygeocolumn) AS location FROM mytable
    WHERE ST_DWithin(mygeocolumn, 'POINT(3 4)', 10.);
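The text file for step 6 is easy to generate with a short script. The following is a sketch only: the sample tracks are made up, the helper names are hypothetical, and it assumes track names contain no tabs, newlines or backslashes (which would need escaping for the copy command). The column order matches the table from steps 3 and 4: id, track_name, mygeocolumn.

```python
# Hypothetical sample rows: (id, track_name, x, y).
TRACKS = [
    (1, "First Song", 0, 5),
    (2, "Second Song", 6, 10),
]


def copy_line(track_id, name, x, y):
    """Format one track as a tab-delimited row with the point in the
    SRID=-1;POINT(x y) form used in step 6.

    Assumption: `name` has no tabs, newlines or backslashes.
    """
    return f"{track_id}\t{name}\tSRID=-1;POINT({x} {y})"


def write_copy_file(tracks, path):
    """Write every track to `path`, one row per line, for use with copy."""
    with open(path, "w", encoding="utf-8") as handle:
        for track in tracks:
            handle.write(copy_line(*track) + "\n")


lines = [copy_line(*track) for track in TRACKS]
```

Calling `write_copy_file(TRACKS, "/the/path/to/myfile.txt")` would produce a file ready for the copy command in step 6. Note that only a space separates the two coordinates, just as in the insert example.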