Exploring the Existence of "Sound of an Image"

Ashique KhudaBukhsh

Due Date: 10.00am on Thursday, December 15, 2011

Introduction

I have always wondered whether, for a given image, there exists a particular sound: a sound that the image can call its own -- "the sound of an image". To explore this idea, I have framed my curiosity as an exploratory study that analyzes the effect of sound (in this case, a song) on subjects' preferences between a pair of images. We play a song and show the subject a pair of images, then ask the subject to select the image that she thinks better suits the song. Simply put, I am trying to explore to what extent different human subjects agree on the association of a given image with a given tune.

I understand that this study can also be seen as "image of a song" rather than "sound of an image". But imagine how hard it would be to design a study where a subject is shown an image, played multiple sounds, and then asked to match the image with one of those sounds. The temporal nature of sounds makes them hard to compare, whereas comparing two images by putting them side by side is much easier.

The second thing that intrigues me is what happens in our heads when we listen to a song or see an image. If we really feel that a particular song goes extremely well with a particular image, does our neural activity give some hints about our preferences? To explore this question, I have made some progress in recording the EEG responses of subjects while they see the images and listen to the songs separately.

User study format

The study has two components. The first is a music-image matching study; the second is EEG response recording. The music-image matching study is designed as follows. For a given song, we present a pair of images side by side. The subject is asked to choose the image that she thinks better suits the song. For every song, we construct a tournament of images: the images selected in the first round are paired among themselves, and the subject is asked the same question about these new pairs. We chose a tournament format for the following reasons. First, it is easier to select one image from a pair than one from a set of images, especially when the set is big. Second, this procedure promises more stable overall winners, as the winning image has to win through many rounds. Finally, this multi-level selection process gives us a wonderful opportunity to explore questions like:
a) are there images that most users selected as a finalist for a particular song?
b) are there images that were finalists for different songs?
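The tournament described above can be sketched roughly as follows. This is only an illustration in Python, not the actual WPF software: the function name run_tournament, the choose callback standing in for the subject's click, the random initial pairing, and the handling of an odd image out (it gets a bye into the next round) are all assumptions, since the report does not say how these details were handled.

```python
import random


def run_tournament(images, choose, rng=None):
    """Single-elimination tournament over `images` for one song.

    `choose(left, right)` stands in for the subject picking the image
    that better suits the song (hypothetical callback).
    Returns (winner, rounds), where rounds is a list of lists of pairs.
    """
    rng = rng or random.Random(0)
    remaining = list(images)
    rng.shuffle(remaining)  # assumption: initial pairing is random
    rounds = []
    while len(remaining) > 1:
        next_round = []
        # Assumption: with an odd count, one image advances without a match.
        if len(remaining) % 2 == 1:
            next_round.append(remaining.pop())
        pairs = [(remaining[i], remaining[i + 1])
                 for i in range(0, len(remaining), 2)]
        for left, right in pairs:
            next_round.append(choose(left, right))
        rounds.append(pairs)
        remaining = next_round
    return remaining[0], rounds


if __name__ == "__main__":
    images = [f"img{i:02d}" for i in range(15)]
    # A simulated subject who picks at random between the two images.
    picker = random.Random(1)
    winner, rounds = run_tournament(images, lambda a, b: picker.choice((a, b)))
    print(winner, [len(r) for r in rounds])
```

Under these assumptions, a tournament over n images always needs n - 1 pairwise choices per song, so the number of questions a subject answers grows linearly with the number of images.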

The music-image matching software is written using Windows Presentation Foundation (C# code-behind, XAML for UI).

In the second component of the study, we show the subject each image six times for 10 seconds. Between two images, we show a blank image for 7 seconds. For music, we play each 30-second song clip six times, with a gap of 10 seconds between two successive playbacks. During this entire process, we record the EEG response of the subject using an EEG headset.
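To get a feel for how long such a recording session runs, here is a small back-of-the-envelope sketch in Python. The function session_seconds is hypothetical, and it assumes that a gap follows every presentation except the last one and that all fifteen images and all fifteen songs are used in the EEG component; the report does not state either detail.

```python
def session_seconds(n_items, repeats, on_seconds, gap_seconds):
    """Total duration of a block of presentations separated by gaps.

    Assumes one gap between each pair of consecutive presentations
    and no gap after the final one.
    """
    presentations = n_items * repeats
    return presentations * on_seconds + (presentations - 1) * gap_seconds


if __name__ == "__main__":
    # Assumed counts: 15 images and 15 songs, each presented six times.
    image_block = session_seconds(15, 6, on_seconds=10, gap_seconds=7)
    music_block = session_seconds(15, 6, on_seconds=30, gap_seconds=10)
    print(image_block, music_block, (image_block + music_block) / 60)
```

With these assumed counts the two blocks together run well over an hour per subject, which is one plausible reason the EEG component is much harder to schedule than the matching study.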

We ran the music-image matching study on 25 subjects. Unfortunately, in the time available, we could only run the EEG component on three subjects.

Data

We used fifteen images selected from artchive.com. These images were painted by noted abstract painters such as Henri Matisse, Jackson Pollock, Pablo Picasso, Salvador Dali, Wassily Kandinsky, Franz Marc, Paul Klee, Kurt Schwitters and Marc Chagall. For the music, we downloaded fifteen songs from All Music Guide.

A Collage of Sample Images

Results and Analysis

To summarize our results, I would like to show a few interesting plots. Each plot is for a given pair of images, and the vertical axis lists the songs. Each plot summarizes the preferences of all subjects for that particular image pair on every song in the dataset: each bar summarizes the preferences of all subjects for a given song. The graphs should be read as follows. The middle vertical line marks an even split, where half of the subjects chose each image of the pair. A blue bar shows by how much the fraction of subjects who preferred the right image exceeds 0.5, and a green bar shows the same measure for the left image (plotted as negative values).
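The bar length described above can be computed as in the following minimal Python sketch. The function name signed_preference and the vote counts in the example are made up for illustration, and the fraction is taken over the subjects who actually compared that pair for that song, which is an assumption about how the plots were normalized.

```python
def signed_preference(right_votes, left_votes):
    """Signed bar length for one image pair and one song.

    Positive values mean the right image was preferred (blue bar),
    negative values mean the left image was preferred (green bar),
    and 0 means an even split (the middle vertical line).
    """
    total = right_votes + left_votes
    if total == 0:
        raise ValueError("no subject compared this pair for this song")
    return right_votes / total - 0.5


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    votes_per_song = {"song A": (20, 5), "song B": (6, 19), "song C": (12, 12)}
    for song, (right, left) in votes_per_song.items():
        print(song, round(signed_preference(right, left), 2))
```

The value always lies between -0.5 and 0.5, so a bar that reaches either end of the axis means every subject agreed on one image.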

The graph above shows that, on one hand, there is song #4, where most of the subjects preferred the right image. On the other hand, there is song #10, where most of the subjects preferred the left image. To summarize, this graph shows that for a given image pair, there were many songs where the subjects showed a strong inclination to select one image over the other.

Interestingly, here is an example where for almost all songs, most subjects chose the right image over the left one.

For me, a possible explanation is that, for this example, the subjects actually found the right image to be much more expressive than the left one and could not associate any sound with the left image.

If the distribution is very close to the mean, there are two possible interpretations. First, maybe neither of these images has any sound, which left all subjects clueless. Second, possibly both images have very similar sounds, which confused the subjects' selection.

In this case, I think the second explanation applies, because the image pair shares certain "similarities" in painting style: both images are segmented into blocks in a similar way, which made it very difficult to choose between them.

Challenges are fun!

One of the biggest challenges in this project was time estimation. Designing the tests, getting started with the EEG experiments, finding subjects amid the crazy end-of-term situation for all my friends, and finally analyzing the data -- time-wise, this project was very hard. The fun thing was that this was my first stab at doing a user study on an intriguing question. And I am very excited because I will be continuing in this research direction next semester, analyzing the EEG responses I will continue gathering.

Future work

Analyzing the EEG data and exploring how useful the EEG responses are in terms of predicting a subject's preference.
An interesting direction is to consider timing information while the music-image matching software runs. This would tell us which image pair and song combinations are hard to decide on.
Also, I feel that using background scores of movies would be far more interesting, as
a) for a short clip, most of these background scores emphasize a single mood
b) these scores are designed with a particular scene in mind, so a considerable amount of thinking in visual terms goes into making them
c) I don't know whether I am explaining it well, but I feel music without words connects to us in a far more abstract way, which makes it more intriguing.
Finally, as Alyosha rightly points out, we need to start looking at the pixels, and the notes.

Acknowledgement

I express my heartfelt thanks to Prof. Alyosha Efros for teaching such a wonderful course that opened the world of images in front of me. His constant encouragement for creative thinking was instrumental in my taking up this exciting project.
I express my heartfelt thanks to Prof. Tai Sing Lee for letting me use his EEG equipment and test suite, and for his inputs on this project.
I express my heartfelt thanks to Prof. Roger Dannenberg for his inputs on this project.
I express my heartfelt thanks to Dan Howarth for helping me get started with the EEG equipment and test suite.
Finally, thanks to Natasha for being such a wonderful TA, and who can forget to thank all my wonderful friends who volunteered for the tests. Without them, this project would not have been possible.
