Kernel Regression

Introduction

Welcome to the programming component of this assignment!

This assignment includes an autograder for you to grade your answers on your machine. This can be run with the command:

python3.6 autograder.py

The code for this assignment consists of several Python files, some of which you will need to read and understand in order to complete the assignment, and some of which you can ignore. You can download and unzip all the code, data, and supporting files from hw8_programming.zip.

Files you will edit

kernel_regression.py Your code to implement kernel regression tasks.
additional_code.py You shouldn't need this file for this assignment, but it is provided just in case you have additional code that doesn't fit into kernel_regression.py for some reason. If you do submit this file the code should be runnable by calling python3.6 additional_code.py, but there are no requirements on the format and it will not be executed by the autograder.

Files you might want to look at

util.py Convenience methods to generate various plots that will be needed in this assignment.
test_cases/Q*/*.py These are the unit tests that the autograder runs. Ideally, you would be writing these unit tests yourself, but we are saving you a bit of time and allowing the autograder to check these things. You should definitely be looking at these to see what is and is not being tested. These test cases also generate plots related to the task; the plots are saved in a directory named figures. The autograder on Gradescope may run a different version of these unit tests.

Files you can safely ignore

autograder.py Autograder infrastructure code.

Files to Edit and Submit: You will fill in portions of kernel_regression.py (and additional_code.py, if necessary) during the assignment. You should submit these files containing your code and comments to the Programming component on Gradescope. Please do not change the other files in this distribution or submit any of our original files other than these files. Please do not change the names of any provided functions or classes within the code, or you will wreak havoc on the autograder.

Report: Many of the sections in this programming assignment will contain questions that are not autograded. You will place the requested results in the appropriate locations within the PDF of the Written component of this assignment.

Evaluation: Your assignment will be assessed based on your code, the output of the autograder, and the required contents of the Written component.

Academic Dishonesty: We will be checking your code against other submissions in the class for logical redundancy. If you copy someone else's code and submit it with minor changes, we will know. These cheat detectors are quite hard to fool, so please don't try. We trust you all to submit your own work only; please don't let us down. If you do, we will pursue the strongest consequences available to us.

Getting Help: You are not alone! If you find yourself stuck on something, contact the course staff for help. Office hours, recitation, and Piazza are there for your support; please use them. If you can't make our office hours, let us know and we will schedule more. We want these assignments to be rewarding and instructional, not frustrating and demoralizing. But, we don't know when or how to help unless you ask.

Question 1: Kernel Functions

In kernel_regression.py, implement the following kernels in their respective functions. See the function docstrings for details.

Kernel: Function: Equation
Boxcar: kernel_boxcar(x, z, width): \(1\) if \(\|\mathbf{x}-\mathbf{z}\|_2 \leq \frac{\text{width}}{2}\), \(0\) otherwise
RBF: kernel_rbf(x, z, gamma): \(e^{-\gamma \|\mathbf{x}-\mathbf{z}\|_2^2}\)
Linear: kernel_linear(x, z): \(\mathbf{x}^T\mathbf{z}\)
Polynomial: kernel_polynomial(x, z, d): \((\mathbf{x}^T\mathbf{z}+1)^d\)
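To make the equations above concrete, here is one possible sketch of the four kernels using NumPy. The function names and arguments come from the table above; the use of NumPy arrays and scalar float return values is an assumption, so check the docstrings in kernel_regression.py for the required conventions before copying anything.

```python
import numpy as np

def kernel_boxcar(x, z, width):
    # 1 if x and z are within width/2 of each other (Euclidean distance), else 0
    return 1.0 if np.linalg.norm(x - z) <= width / 2 else 0.0

def kernel_rbf(x, z, gamma):
    # exp(-gamma * ||x - z||^2): decays smoothly with squared distance
    return float(np.exp(-gamma * np.linalg.norm(x - z) ** 2))

def kernel_linear(x, z):
    # Inner product x^T z
    return float(np.dot(x, z))

def kernel_polynomial(x, z, d):
    # (x^T z + 1)^d
    return float(np.dot(x, z) + 1.0) ** d
```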

You may run the following command to run a quick unit test on your Q1 implementation:

python3.6 autograder.py -q Q1

We encourage you to write your own code to test out your implementation as you work through the assignment. For example, you may want to use some of the functions in util.py to plot the functions you just implemented.

The autograder will also generate plots of your kernel functions for a fixed 2D point \(\mathbf{z}\) and various hyperparameter settings. It will save these plots as png files in a new directory named figures. You are required to include some of these figures as part of the written component of this assignment. See the written component for details about which plots to include.

Question 2: Kernel Regression

In the predict_kernel_regression function in kernel_regression.py, implement kernel regression as defined in lecture and use it to predict the output values for a set of input points, \(\mathbf{X}\). See the function docstring for details. We have implemented a naïve version of kernel regression, predict_naive_kernel_regression, which may be helpful for implementation and debugging.
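As a point of reference, kernel regression is commonly presented as the Nadaraya-Watson estimator: each prediction is a kernel-weighted average of the training targets. The sketch below illustrates that idea; the function name predict_nadaraya_watson, the argument order, and the fallback to the mean when all weights are zero are all hypothetical choices for this illustration, not the required interface of predict_kernel_regression, so defer to the lecture definition and the docstring in kernel_regression.py.

```python
import numpy as np

def predict_nadaraya_watson(X_test, X_train, y_train, kernel):
    """Illustrative sketch: prediction at x is sum_i k(x, x_i) y_i / sum_i k(x, x_i).

    `kernel` is any function of two points, e.g. one of the Q1 kernels with
    its hyperparameters bound via functools.partial or a lambda.
    """
    preds = []
    for x in X_test:
        w = np.array([kernel(x, xi) for xi in X_train])
        total = w.sum()
        if total > 0:
            preds.append(w @ y_train / total)
        else:
            # No training point received nonzero weight (possible with a
            # narrow boxcar kernel); fall back to the global mean here.
            preds.append(y_train.mean())
    return np.array(preds)
```

For example, with training points at 0 and 1 and an RBF-style kernel, a query exactly halfway between them weights both targets equally and predicts their average.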

You may run the following command to run a quick unit test on your Q2 implementation:

python3.6 autograder.py -q Q2

The autograder will also generate plots of your regression predictions for different kernels, different numbers of training points, and different hyperparameter settings. It will save these plots as png files in a new directory named figures. You are required to include some of these figures as part of the written component of this assignment. See the written component for details about which plots to include.

Question for the write-up: Explain the relationship between settings of gamma in the RBF filter and over/under fitting.

Question for the write-up: For the linear kernel with N=200 training points, why is the prediction surface significantly below the training points?

Question for the write-up: Among all of the kernels and hyperparameter settings that the autograder test cases ran through, which kernel and hyperparameter combination should you choose? Why?

Submission

Complete all questions as specified in the above instructions. Then upload kernel_regression.py (and additional_code.py if you used it) to Gradescope. Your submission should finish running within 10 minutes, after which it will time out on Gradescope.

Don't forget to include any requested results in the PDF of the Written component, which is to be submitted on Gradescope as well.

You may submit to Gradescope as many times as you like. You may also run the autograder on your own machine to speed up the development process. Just note that the autograder on Gradescope may be slightly different from the local autograder. The autograder can be invoked on your own machine using the command:

python3.6 autograder.py

Note that running the autograder locally will not register your grades with us. Remember to submit your code when you want to register your grades for this assignment.

The autograder on Gradescope might take a while but don't worry: so long as you submit before the deadline, it's not late.