I have always wondered whether, for a given image, there exists a particular sound: a sound that the image can call its own, "the sound of an image". To explore this idea, I have framed my curiosity as an exploratory study that analyzes the effect of sound (in this case, a song) on different subjects' preferences between a pair of images. We play a song and show the subject a pair of images, then ask the subject to select the image that she thinks better suits the song. Simply put, I am trying to explore to what extent different human subjects agree on the association of a given image with a given tune.
I understand that this study can also be seen as "image of a song" rather than "sound of an image". But imagine how hard it would be to design a study in which a subject is shown an image, played multiple sounds, and then asked to match the image with one of those sounds. The temporal nature of sound makes sounds hard to compare, whereas comparing two images by putting them side by side is much easier.
The second thing that intrigues me is what happens in our heads when we listen to a song or see an image. If we really feel that a particular song goes extremely well with a particular image, does our neural activity give some hint of that preference? To explore this question, I have made some progress in recording the EEG responses of subjects while they view the images and listen to the songs separately.
The study has two components. The first one is a music-image matching study; the second one is EEG response recording.
The music-image matching study is designed as follows. For a given song, we present a pair of images side by side. The subject is asked to choose the image that she thinks better suits the song. For every song, we construct a tournament of images: the images selected in the first round are paired among themselves, and the subject is asked the same question about these new pairs, and so on until one image remains. We chose the tournament format for the following reasons. First, it is always easier to select one image from a pair than from a set of images, especially when the set is big. Second, this procedure promises more stable overall winners, since the winning image has to win through several rounds. Finally, this multi-level selection process gives us a wonderful opportunity to explore questions like:
a) Are there images that most subjects selected as a finalist for a particular song?
b) Are there images that were finalists for several different songs?
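The tournament described above can be sketched in a few lines. This is only an illustration in Python, not the actual WPF software: the function name run_tournament, the choose callback (standing in for the subject's click), and the image labels are all hypothetical. It also assumes that when a round has an odd number of images, the last one advances without a comparison; the pairing rule actually used in the study is not stated here.

```python
from typing import Callable, List, Sequence


def run_tournament(
    images: Sequence[str],
    choose: Callable[[str, str], str],
) -> List[List[str]]:
    """Run a single-elimination tournament for one song.

    `choose(left, right)` stands in for the subject's answer and must
    return one of its two arguments. The result lists the images still
    in play at the start of each round, ending with the single winner.
    """
    current = list(images)
    rounds = [current]
    while len(current) > 1:
        winners = []
        # Pair neighbouring images: (0, 1), (2, 3), ...
        for i in range(0, len(current) - 1, 2):
            left, right = current[i], current[i + 1]
            pick = choose(left, right)
            if pick not in (left, right):
                raise ValueError("choose() must return one of the two images")
            winners.append(pick)
        # ASSUMPTION: with an odd count, the unpaired image gets a bye.
        if len(current) % 2 == 1:
            winners.append(current[-1])
        current = winners
        rounds.append(current)
    return rounds


# Hypothetical labels for fifteen images; this stand-in subject always
# picks the left image.
example_images = [f"img{i:02d}" for i in range(1, 16)]
example_rounds = run_tournament(example_images, lambda left, right: left)
round_sizes = [len(r) for r in example_rounds]
```

Under the bye assumption, fifteen images shrink to 8, 4, 2 and then 1, so a subject answers 7 + 4 + 2 + 1 = 14 questions per song. Keeping the whole list of rounds, rather than only the winner, is what makes questions (a) and (b) about finalists answerable afterwards.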
The music-image matching software is written using Windows Presentation Foundation (C# code-behind, XAML for UI).
In the second component of the study, we show the subject each image six times, for 10 seconds each time. Between two images, we show a blank image for 7 seconds. For music, we play each 30-second song clip six times, with a gap of 10 seconds between two successive clips. During this entire process, we record the subject's EEG response using an EEG headset.
We ran the music-image matching study on 25 subjects. Unfortunately, in the time available, we could run the EEG component on only three subjects.
We used fifteen images selected from artchive.com. These are paintings by noted abstract painters like Henri Matisse, Jackson Pollock, Pablo Picasso, Salvador Dali, Wassily Kandinsky, Franz Marc, Paul Klee, Kurt Schwitters and Marc Chagall. For the music, we downloaded fifteen songs from All Music Guide.
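To get a feel for how long one EEG session takes under this protocol, here is a small back-of-the-envelope calculation in Python. It rests on two assumptions that are mine: that all fifteen images and all fifteen songs are used in the EEG session, and that a gap follows every presentation except the last one in each block. The helper name block_duration is hypothetical.

```python
def block_duration(n_items: int, repeats: int, stim_s: int, gap_s: int) -> int:
    """Total seconds for a block of stimuli separated by fixed gaps.

    Every item is presented `repeats` times for `stim_s` seconds, and a
    gap of `gap_s` seconds separates consecutive presentations, so there
    is one gap fewer than there are presentations.
    """
    presentations = n_items * repeats
    return presentations * stim_s + (presentations - 1) * gap_s


# ASSUMPTION: all 15 images and all 15 songs are used in the EEG session.
image_block_s = block_duration(n_items=15, repeats=6, stim_s=10, gap_s=7)
music_block_s = block_duration(n_items=15, repeats=6, stim_s=30, gap_s=10)
total_minutes = (image_block_s + music_block_s) / 60

print(f"images: {image_block_s} s, music: {music_block_s} s, "
      f"total: {total_minutes:.1f} min")
```

If those assumptions hold, the image block alone takes about 25 minutes and the music block about an hour, which goes some way toward explaining why the EEG component was the slower part of the study.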
The graph above shows that, on the one hand, there is song #4, for which most of the subjects preferred the right image. On the other hand, there is also song #10, for which most of the subjects preferred the left one. To summarize, this graph shows that for a given image pair, there were many songs for which the subjects showed a strong inclination to select one image over the other.
Interestingly, here is an example where, for almost all songs, most subjects chose the right image over the left one. To me, a possible explanation is that, in this example, the subjects found the right image much more expressive than the left one and could not associate any sound with the left image.
One of the biggest challenges in this project was time estimation. Designing the tests, getting started with the EEG experiments, finding subjects amid the crazy end-of-term situation for all my friends, and finally analyzing the data: time-wise, this project was very hard. The fun part was that this was my first stab at doing a user study on an intriguing question. And I am very excited because I will be continuing in this research direction next semester, analyzing the EEG responses that I will continue to gather.
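One way to put a number on how strongly subjects agreed for a given song and image pair is sketched below in Python. This is not the analysis behind the graphs. It is a suggestion that assumes each subject's answer is recorded as "L" or "R", and it uses an exact two-sided binomial test against a 50/50 chance level, which is my choice of technique. The function names are hypothetical and the short answer list is made-up placeholder data, not study data.

```python
from math import comb
from typing import Sequence


def right_fraction(choices: Sequence[str]) -> float:
    """Fraction of subjects who picked the right-hand image ("R")."""
    return sum(1 for c in choices if c == "R") / len(choices)


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k "R" answers out of n under chance.

    Chance means each subject picks left or right with probability 1/2.
    The p-value sums the probabilities of every outcome at least as far
    from n/2 as the observed count k.
    """
    observed_distance = abs(k - n / 2)
    favourable = sum(
        comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= observed_distance
    )
    return favourable / 2 ** n


# Made-up placeholder answers for one song and one image pair.
placeholder_choices = ["R", "R", "L", "R"]
placeholder_fraction = right_fraction(placeholder_choices)
placeholder_p = binomial_two_sided_p(placeholder_choices.count("R"),
                                     len(placeholder_choices))
```

A small p-value for a song would support reading its bar in the graph as real agreement rather than a coin-flip outcome. With fifteen songs per image pair, some correction for multiple comparisons would also be worth considering.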