We have ${\mathbf y}^\top{\mathbf x} = \sum_l y_l x_l$, so we can write \[ e = \left(z - \sum_l y_l x_l\right)^2 \] Differentiating with respect to $x_i$ gives \[ \frac{de}{dx_i} = -2 \left(z - \sum_l y_l x_l\right) y_i = -2 (z - {\mathbf y}^\top{\mathbf x}) y_i \] So, from rule 1, we get \[ \frac{de}{d{\mathbf x}} = -2(z - {\mathbf y}^\top{\mathbf x}){\mathbf y} \]
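As a quick sanity check of this formula, consider a small illustrative instance (the numbers are assumed here, not taken from the problem): ${\mathbf y} = [1, 2]^\top$, ${\mathbf x} = [3, 1]^\top$ and $z = 4$. Then \[ {\mathbf y}^\top{\mathbf x} = 1\cdot 3 + 2\cdot 1 = 5, \qquad z - {\mathbf y}^\top{\mathbf x} = -1, \qquad e = 1 \] and the formula gives \[ \frac{de}{d{\mathbf x}} = -2(-1)\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix} \] This agrees with differentiating $e = (4 - x_1 - 2x_2)^2$ directly: at ${\mathbf x} = [3, 1]^\top$ we get $\frac{de}{dx_1} = -2(4 - x_1 - 2x_2) = 2$ and $\frac{de}{dx_2} = -4(4 - x_1 - 2x_2) = 4$.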
Let ${\mathbf h} = {\mathbf z} - {\mathbf X}{\mathbf y}$ (we write ${\mathbf h}$ instead of ${\mathbf h}({\mathbf X})$ for brevity). ${\mathbf h}$ is an $N \times 1$ vector.
We can write $e = {\mathbf h}^\top{\mathbf h}$, and
\[ \frac{de}{d{\mathbf X}} = \left(\frac{d{\mathbf h}}{d{\mathbf X}}\right)^\top\frac{de}{d{\mathbf h}} \]Let us consider each of the two factors on the RHS, starting with the outer one, $\frac{de}{d{\mathbf h}}$. Since $e = {\mathbf h}^\top{\mathbf h} = \sum_i h_i^2$,
\[ \frac{de}{d{\mathbf h}} = 2{\mathbf h} \]Now for the second factor, $\frac{d{\mathbf h}}{d{\mathbf X}}$. The $i^{\rm th}$ element of ${\mathbf h}$ is given by
\[ h_i = z_i - \sum_l X_{i,l} y_l \]Let us represent
\[ {\mathbf h}' = \frac{d\mathbf h}{d\mathbf X} \]By rule 4, ${\mathbf h}'$ is an $N \times M \times N$ tensor, whose $(i,j,k)^{\rm th}$ element is given by \[ h'_{i,j,k} = \frac{d h_i}{d X_{k,j}} = \frac{d(z_i - \sum_l X_{i,l} y_l)}{d X_{k,j}} \]
If none of the numerator terms include $X_{k,j}$, this derivative is 0. The numerator only includes $X_{k,j}$ if $i=k$. In this case, separating out the $X_{k,j}$ term, the numerator can be written as $z_i - \sum_{l\neq j}X_{k,l} y_l - X_{k,j} y_j$. Only the last term depends on $X_{k,j}$. Consequently \[ h'_{i,j,k} = \begin{cases} -y_j & {\rm if}\,\, i = k \\ 0 & {\rm else} \end{cases} \] This is a “diagonal” tensor as mentioned in the hint.
Note that $h'_{i,j,k} = h'_{k,j,i}$. Thus, the transpose, ${\mathbf h}'^\top = {\mathbf h}'$.
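One compact way to see both the diagonal structure and this symmetry is to use the Kronecker delta $\delta_{ik}$ (notation introduced here purely for illustration; it equals 1 when $i = k$ and 0 otherwise) and write \[ h'_{i,j,k} = -y_j\,\delta_{ik} \] Since $\delta_{ik} = \delta_{ki}$, swapping $i$ and $k$ leaves every entry unchanged. For instance, with $N = 2$ the only non-zero entries are $h'_{1,j,1} = h'_{2,j,2} = -y_j$, while $h'_{1,j,2} = h'_{2,j,1} = 0$.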
Let us represent the overall derivative that we are trying to compute as ${\mathbf E}$ for brevity, i.e. \[ \frac{de}{d{\mathbf X}} = {\mathbf E} \]
We have \[ {\mathbf E} = \left(\frac{d{\mathbf h}}{d{\mathbf X}}\right)^\top \frac{de}{d{\mathbf h}} = 2{\mathbf h}'^\top {\mathbf h} = 2{\mathbf h}' {\mathbf h} \]
The $(i,j)^{\rm th}$ element of ${\mathbf E}$ is given by
\[ E_{i,j} = 2\sum_k h'_{i,j,k} h_k \]But, since $h'_{i,j,k}$ only takes non-zero values for $i=k$, and is equal to $-y_j$ at $i=k$, we can write \[ E_{i,j} = -2y_j h_i \] i.e. the $(i,j)^{\rm th}$ entry of ${\mathbf E}$ is $-2$ times the product of the $i^{\rm th}$ entry of ${\mathbf h}$ and the $j^{\rm th}$ entry of ${\mathbf y}$. In other words, ${\mathbf E}$ is simply $-2$ times the outer product of ${\mathbf h}$ and ${\mathbf y}$ \[ {\mathbf E} = -2 {\mathbf h} {\mathbf y}^\top \]
Putting in the actual value of ${\mathbf h}$, we get our answer \[ \frac{de}{d{\mathbf X}} = -2 ({\mathbf z} - {\mathbf X}{\mathbf y}){\mathbf y}^\top \]
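To make the result concrete, here is a small worked instance with $N = M = 2$ (the values are assumed for illustration only): \[ {\mathbf X} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}, \qquad {\mathbf y} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad {\mathbf z} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \] Then ${\mathbf X}{\mathbf y} = [1, 4]^\top$, so ${\mathbf z} - {\mathbf X}{\mathbf y} = [1, -3]^\top$ and $e = 1 + 9 = 10$. The formula gives \[ \frac{de}{d{\mathbf X}} = -2 \begin{bmatrix} 1 \\ -3 \end{bmatrix} \begin{bmatrix} 1 & 2 \end{bmatrix} = \begin{bmatrix} -2 & -4 \\ 6 & 12 \end{bmatrix} \] As a spot check, differentiating $e = (z_1 - X_{1,1} y_1 - X_{1,2} y_2)^2 + (z_2 - X_{2,1} y_1 - X_{2,2} y_2)^2$ directly with respect to $X_{2,2}$ gives $-2(z_2 - X_{2,1} y_1 - X_{2,2} y_2)\,y_2 = -2(-3)(2) = 12$, matching the $(2,2)^{\rm th}$ entry. Note also that the result has the same $N \times M$ shape as ${\mathbf X}$, as expected for the derivative of a scalar with respect to a matrix.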