Music Transcription

Below are links to pieces of music and recordings of several notes. You are required to transcribe the music.

For transcription, you will have to determine the note, or set of notes, being played at each point in time.

A. Blowin' in the Wind

[This file] contains a recording of a harmonica piece rendering the song "Blowin' in the Wind". Also given are a collection of notes and an example of a musical scale. Transcribe both the musical scale and the main song in terms of these notes.

[wav] version of music_extract.mp3

B. Polyushka Polye

[This] is a recording of "Polyushka Polye", played on the harmonica. It has been downloaded with permission from the artist.

Below is the set of notes from the harmonica (Table 1):

Note    Wav file
E       e.wav
F       f.wav
G       g.wav
A       a.wav
B       b.wav
C       c.wav
D       d.wav
E2      e2.wav
F2      f2.wav
G2      g2.wav
A2      a2.wav

Matlab Instructions

Download the following Matlab file: [stft.m]

You can read a wav file into Matlab, and resample it to 16 kHz, as follows:

[s,fs] = wavread('filename');   % samples and sampling rate
s = resample(s,16000,fs);       % resample the signal to 16000 Hz

The spectrum of each note recording can be computed as follows:

spectrum = mean(abs(stft(s',2048,256,0,hann(2048))),2);   % average magnitude spectrum over all frames

'spectrum' will be a 1025 x 1 vector.

The recording of the complete music can be read just as you read the notes. To convert it to a spectrogram, do the following:

sft = stft(s',2048,256,0,hann(2048));   % complex spectrogram
sphase = sft./abs(sft);                 % unit-magnitude phase of each entry
smag = abs(sft);                        % magnitude spectrogram

'smag' will be a 1025 x K matrix, where K is the number of spectral vectors in the spectrogram. We will also need 'sphase' to reconstruct the signal later.

Additional Info

Compute the spectrum for each of the notes. Compute the spectrogram matrix 'smag' for the music signal. This matrix is composed of K spectral vectors. Each vector represents 16 milliseconds of the signal.
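
For instance, the note spectra can be collected into a single matrix. Below is a minimal sketch; the variable name 'N' and the file list are our own, following Table 1:

% Read each note, resample it to 16 kHz, and compute its average spectrum.
% N is a 1025 x 11 matrix whose columns are the note spectra, in the
% order of Table 1.
notefiles = {'e.wav','f.wav','g.wav','a.wav','b.wav','c.wav','d.wav', ...
             'e2.wav','f2.wav','g2.wav','a2.wav'};
N = zeros(1025, numel(notefiles));
for i = 1:numel(notefiles)
    [s,fs] = wavread(notefiles{i});
    s = resample(s,16000,fs);
    N(:,i) = mean(abs(stft(s',2048,256,0,hann(2048))),2);
end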

You may find projections, pseudo-inverses, and dot products useful. If you know of any other techniques, you can use those too. Tricks like thresholding (setting all values of some variable that fall below a threshold to 0) might also help. One possible approach is sketched after the output specification below.

The output should be in the form of a matrix:
1 1 0 0 0 0 0 1 ...
0 0 0 1 1 0 1 1 ...
0 1 1 1 0 1 1 1 ...
...........................

Each row of the matrix represents one note. Hence there will be as many rows as there are notes in Table 1.

Each column represents one of the columns in the spectrogram for the music. So if there are K vectors in the spectrogram, there will be K columns in your output.

Each entry will denote whether a note was found in that vector or not. For instance, if matrix entry (4,25) = 0, then the fourth note (A) was not found in the 25th spectral vector of the signal.
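
Here is one way to obtain such a matrix: a minimal sketch, assuming the note spectra are collected in the 1025 x 11 matrix 'N' from the earlier sketch. It solves for the weight of every note in every spectral vector by least squares (a pseudo-inverse), then thresholds. The threshold value is an arbitrary placeholder that you will have to tune.

% W(i,k) is the least-squares weight of note i in the k-th spectral vector.
W = pinv(N) * smag;     % 11 x K matrix of note weights
W(W < 0) = 0;           % negative weights carry no meaning here
% Threshold the weights to obtain the binary transcription matrix.
% The value 0.5 is only a placeholder; tune it on your data.
T = double(W > 0.5);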

C. Synthesizing Audio

You can use the notes and the transcription matrix thus obtained to synthesize audio. Note that multiplying the matrix of note spectra by the transcription matrix will give you only the magnitude spectrogram. In order to create meaningful audio, you will need to use the phases as well. Once you have the phases included, you can use the STFT to synthesize a signal from the matrix. Submit the synthesized audio along with the matrix. A sketch follows.
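
Below is a minimal sketch of the synthesis, assuming 'N' and 'T' from the sketches above. That stft.m also performs the inverse transform when handed a complex spectrogram is an assumption; check its header for the actual synthesis call, and substitute your own overlap-add inverse if it does not.

% Reconstructed magnitude spectrogram from the notes and the transcription.
smag_hat = N * T;                 % 1025 x K magnitudes
% Reattach the phases saved earlier to obtain a complex spectrogram.
sft_hat = smag_hat .* sphase;
% Invert the STFT. This call assumes stft.m synthesizes when given a
% spectrogram; consult its header for the exact usage.
shat = stft(sft_hat,2048,256,0,hann(2048));
wavwrite(real(shat(:)),16000,'synthesized.wav');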

 

Linear Algebra

Linear Algebra I

Let's warm up with a simple problem.

A. Rotational Matrices:

A rotation in 3-D space is characterized by two angles. We will characterize them as a rotation in the $X-Y$ plane and a rotation in the $Y-Z$ plane. Derive the equations that transform a vector $[x, y, z]^\top$ into a new vector $[x', y', z']^\top$ by rotating it counterclockwise by an angle $\theta$ in the $X-Y$ plane and by an angle $\delta$ in the $Y-Z$ plane. Represent this as a matrix transformation of the column vector $[x, y, z]^\top$ into the column vector $[x', y', z']^\top$. The matrix that transforms the former into the latter is a rotation matrix.
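
As a reminder (a standard fact, not the answer to this problem), the familiar 2-D counterclockwise rotation by $\theta$ is \[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \] The 3-D matrix you derive should reduce to an embedding of this when $\delta = 0$.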

B. Projecting Instrument Notes:

For this problem you will transform the harmonica notes of problem 1 into piano notes by a matrix transform. The piano notes can be downloaded from [here]. Note that, in this case, you don't know which piano notes correspond to which notes from the harmonica. There are 3 parts to this problem:

Linear Algebra II

The following matrix transforms 4-dimensional vectors into 3-dimensional ones:

  \[ A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 5 & 7 \\ 5 & 7 & 9 & 11 \end{bmatrix} \]

A. Length of projected vector

A 4x1 vector $v$ of length 4 is transformed by $A$ as $u = Av$. What is the longest that $u$ can be? What is the shortest length of $u$?
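
If you would like to sanity-check your answer numerically (a check only, not the reasoning the problem asks for), the extreme stretches of a unit vector under $A$ are its extreme singular values:

A = [1 2 3 4; 3 4 5 7; 5 7 9 11];
sv = svd(A)       % singular values of A, in decreasing order
% A vector of length 4 is stretched to at most 4*sv(1). Since A is
% 3 x 4 it has a nontrivial null space: null(A) returns a direction
% that A maps to the zero vector.
longest = 4 * sv(1)
null(A)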

B. Restricted Isometry

The “Restricted Isometry Property” (RIP) constant of a matrix characterizes the change in length of vectors transformed by sub-matrices of the matrix. For our matrix $A$, let $A_s$ be a matrix formed of any $s$ columns of $A$. If $A$ is $M \times N$, $A_s$ will be $M\times s$. We can form $A_s$ in $^NC_s$ ways from the $N$ columns of $A$ (we assume that the order of vectors in $A_s$ is immaterial). Let $w$ be an $s \times 1$ vector of length 1. Let $l_{max}$ be the longest vector that one can obtain by transforming $w$ by any $A_s$. Let $l_{min}$ be the shortest vector obtained by transforming $w$ by any $A_s$. The RIP-$s$ constant $\delta_s$ of the matrix $A$ is defined as:
\[ \delta_s = \frac{l_{max} - l_{min}}{l_{max} + l_{min}} \] What is $\delta_2$ (i.e. $\delta_s$ for $s = 2$) for the matrix $A$ given above? Hint: You must consider all $^4C_2$ possible values for $A_s$.
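
A brute-force numerical check (again, a sketch to verify a hand computation): enumerate all $^4C_2$ column pairs of $A$ and collect the extreme singular values of each $3 \times 2$ sub-matrix.

A = [1 2 3 4; 3 4 5 7; 5 7 9 11];
pairs = nchoosek(1:4, 2);            % all 6 ways of choosing 2 columns
lmax = -inf; lmin = inf;
for p = 1:size(pairs,1)
    sv = svd(A(:, pairs(p,:)));      % extremes of ||A_s w|| over unit w
    lmax = max(lmax, sv(1));
    lmin = min(lmin, sv(end));
end
delta2 = (lmax - lmin) / (lmax + lmin)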

Linear Algebra III

In this problem we will perform basic matrix calculus.

We begin by stating a few rules we will use. Everywhere the notation $x_i$ refers to the $i^{\rm th}$ component of a vector ${\mathbf x}$ and $x_{i,j}$ refers to the $(i,j)^{\rm th}$ component of a matrix ${\mathbf X}$.

  1. The derivative of a scalar $z$ with respect to an $N \times 1$ vector ${\mathbf x}$ is an $N \times 1$ vector. The $i^{\rm th}$ component of this vector is $\frac{dz}{dx_i}$.
  2. The derivative of a scalar $z$ with respect to an $N \times M$ matrix ${\mathbf X}$ is an $N \times M$ matrix, whose $(i,j)^{\rm th}$ component is $\frac{dz}{dx_{i,j}}$.
  3. The derivative of an $N \times 1$ vector ${\mathbf y}$ with respect to an $M \times 1$ vector ${\mathbf x}$ is an $N \times M$ matrix, whose $(i,j)^{\rm th}$ component is $\frac{dy_i}{dx_j}$.

Note the order of the indices in the following:

  4. The derivative of an $N \times 1$ vector ${\mathbf y}$ with respect to an $M \times L$ matrix ${\mathbf X}$ is an $N \times L \times M$ tensor (note the reversal of the order of $L$ and $M$), whose $(i,j,k)^{\rm th}$ component is $\frac{dy_i}{dx_{k,j}}$ (again, note the reversal of the order of indices in the denominator -- to get the derivative in the $(i,j,k)$ location, we differentiate with respect to the variable in the $(k,j)$ location of ${\mathbf X}$).
  5. The derivative of an $N \times K$ matrix ${\mathbf Y}$ with respect to an $M \times L$ matrix ${\mathbf X}$ is an $N \times K \times L \times M$ tensor, whose $(i,j,k,l)^{\rm th}$ component is $\frac{dy_{i,j}}{dx_{l,k}}$.
  6. The derivative of an $N_1 \times N_2 \times \cdots \times N_L$ tensor ${\mathbf Y}$ with respect to an $M_1\times M_2 \times \cdots \times M_K$ tensor is an $N_1 \times N_2 \times \cdots \times N_L \times M_K \times M_{K-1} \times \cdots \times M_1$ tensor.

The transpose of any $N_1 \times N_2$ matrix is an $N_2 \times N_1$ matrix. We will represent the transposition operation by the superscript $\top$. Let ${\mathbf Y} = {\mathbf X}^\top$. Then $y_{i,j} = x_{j,i}$ (i.e., the $(i,j)^{\rm th}$ component of ${\mathbf Y}$ is the $(j,i)^{\rm th}$ component of ${\mathbf X}$).

For the purposes of the computation here, we will expand the notion of transposes to tensors. Let ${\mathbf Y}$ be any $N_1 \times N_2 \times \cdots \times N_K$ tensor. By our definition ${\mathbf X} = {\mathbf Y}^\top$ is an $N_K \times N_{K-1} \times \cdots \times N_1$ tensor, whose components are given by $x_{i_1,i_2,\cdots,i_K} = y_{i_K,i_{K-1},\cdots,i_1}$.

Using the above definitions, we can also write the following chain rule.

  7. Let ${\mathbf X}$ be any matrix (noting that a vector is also a one-column matrix, the same rule also applies to vectors, and to tensors in general). Let $g(\bullet)$ and $h(\bullet)$ be two functions. The functions may have scalar, vector or tensor outputs. Let \[ y = g\left(h\left( {\mathbf X}\right)\right) \] Then \[ \frac{dy}{d{\mathbf X}} = \left(\frac{dh({\mathbf X})}{d{\mathbf X}}\right) ^{\top} \frac{d g}{dh} \] where $dg$ is shorthand for $d(g(h({\mathbf X})))$ and $dh$ stands for $d(h({\mathbf X}))$. Note the order of multiplication.
  8. In general, if $f_1(\bullet)$, $f_2(\bullet)$, $f_3(\bullet), \cdots$ are functions, and \[ y = f_1\left(f_2\left(f_3\left(\cdots f_K\left({\mathbf X}\right)\right)\right)\right) \] then \[ \frac{dy}{d{\mathbf X}} = \left(\frac{df_K({\mathbf X})}{d{\mathbf X}}\right)^\top \left( \frac{d f_{K-1}}{df_K}\right)^\top \left(\frac{d f_{K-2}}{df_{K-1}}\right)^\top \cdots \left(\frac{d f_2}{d f_3}\right)^\top \frac{d f_1}{df_2} \] Once again, note the order of computation.
Finally, we give the following rules for multiplying tensors (a small worked example follows the list).
  9. Let ${\mathbf X}$ be an $N \times M \times L$ tensor. The only vectors it can left-multiply are $L \times 1$ vectors. Let ${\mathbf y}$ be an $L \times 1$ vector. Then ${\mathbf Z} = {\mathbf Xy}$ is an $N\times M$ matrix whose components are given by \[ z_{i,j} = \sum_k x_{i,j,k}y_k \]
  10. Let ${\mathbf X}$ be an $N \times M \times L$ tensor. The only matrices it can left-multiply are $L \times M$ matrices. Let ${\mathbf Y}$ be an $L \times M$ matrix. Then ${\mathbf Z} = {\mathbf XY}$ is an $N\times 1$ vector whose components are given by \[ z_i = \sum_j\sum_k x_{i,j,k}y_{k,j} \]
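
A small worked instance of rule 9, with made-up numbers: let ${\mathbf X}$ be the $2 \times 2 \times 2$ tensor with slices \[ x_{:,:,1} = \begin{bmatrix} 1 & 2 \\ 3 & 4\end{bmatrix},~~~~ x_{:,:,2} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \] and let ${\mathbf y} = [1, 2]^\top$. Then $z_{i,j} = x_{i,j,1}\cdot 1 + x_{i,j,2}\cdot 2$, so \[ {\mathbf Z} = \begin{bmatrix} 11 & 14 \\ 17 & 20 \end{bmatrix} \]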
We are now set to state our problems.

A. Derivative of scalar function w.r.t. vector argument

Let ${\mathbf x}$ and ${\mathbf y}$ be $N\times 1$ vectors, and let $z$ be a scalar. Let \[ e = \| z - {\mathbf y}^\top{\mathbf x}\|^2 \] Show that \[ \frac{de}{d{\mathbf x}} = -2\left(z - {\mathbf y}^\top{\mathbf x}\right) {\mathbf y} \] The derivation is simple, based on rule number 1.

B. Derivative of scalar function w.r.t. matrix argument

Let ${\mathbf X}$ be an $N\times M$ matrix, ${\mathbf z}$ be an $N \times 1$ vector, and ${\mathbf y}$ be an $M\times 1$ vector. Let \[ e = \| {\mathbf z} - {\mathbf X}{\mathbf y}\|^2 \] Show that \[ \frac{de}{d{\mathbf X}} = -2\left({\mathbf z} - {\mathbf X}{\mathbf y}\right) {\mathbf y}^\top \]

HINT:

For part B, the chain rule gives us: \[ \frac{de}{d{\mathbf X}} = \left(\frac{d{\mathbf X}{\mathbf y}}{d{\mathbf X}}\right)^\top \frac {d \| {\mathbf z} - {\mathbf X}{\mathbf y}\|^2} {d{\mathbf X}{\mathbf y}} \] Note that ${\mathbf X}{\mathbf y}$ is an $N \times 1$ vector whose $i^{\rm th}$ component is given by $\sum_j x_{i,j}y_j$. Therefore, writing ${\mathbf u} = {\mathbf X}{\mathbf y}$, \[ \frac {d u_i}{d x_{j,k}} = \begin{cases} y_k & \text{if}~~j=i\\ 0 & \text{otherwise} \end{cases} \] Thus, although $\frac{d{\mathbf u}}{d{\mathbf X}}$ is an $N \times M \times N$ tensor, only the entries along the diagonal plane $i=j$ are non-zero; all other terms are zero. That is, if we let ${\mathbf V} = \frac{d{\mathbf u}}{d{\mathbf X}}$, then using rule 4 we get \[ v_{i,k,j} = \begin{cases} y_k & \text{if}~~i=j\\ 0 & \text{otherwise} \end{cases} \] The resulting tensor has the structure shown below. Only the diagonal plane shown in the figure has non-zero values; all the remaining values are 0. Each row in the diagonal plane is the vector ${\mathbf y}^\top$.

[Figure: the $N \times M \times N$ tensor ${\mathbf V}$. Only the diagonal plane $i = j$ is non-zero; each row of that plane is ${\mathbf y}^\top$.]

Because of the zero-valued off-diagonal elements, tensor transposition has no effect on this tensor. Multiplying this tensor by an $N\times 1$ vector ${\mathbf t}$ has the geometric interpretation illustrated in the figure below.

[Figure: multiplying the tensor by the vector ${\mathbf t}$. The depth (trailing) dimension runs along ${\mathbf t}$; corresponding elements are multiplied and the products summed.]

The depth dimension of the tensor, which is its trailing dimension, multiplies the vector. The highlighted points and the arrows indicate corresponding elements that are multiplied together. The products are summed.

Clearly, since only the diagonal elements have non-zero values, only one of the component-wise products has a non-zero value, and the sum takes this value.

This has the effect that for any $N \times 1$ vector ${\mathbf t}$, the matrix ${\mathbf S} = {\mathbf V}^\top{\mathbf t}$ is an $N \times M$ matrix whose components (according to rule 9) are given by \[ s_{k,j} = \sum_{i} v_{i,j,k}t_i = t_ky_j \] The indices can be verified by referring to the illustration above.

Due Date

The assignment is due at the beginning of class on March 10th, 2015. Each day of delay thereafter will automatically deduct 5% of the maximum points from your score.

Solutions may be emailed to Pallavi Baljekar, and must be cc'd to Bhiksha. The message must have the subject line "MLSP assignment 1". It should include a report (1 page or longer) of what you did, along with the resulting matrix and the synthesized audio.