Num | Date | Summary |
---|---|---|
01 | 26.August | We discussed course mechanics for about half the lecture. We looked at the Jacobian of a simple 2-link manipulator. The Jacobian is a matrix that relates differential joint motions to differential motions of the arm's end-effector. Using a virtual work argument we showed how the transpose of this matrix relates desired Cartesian forces imparted by the end-effector to the joint torques necessary to generate those forces. (A small numerical sketch appears after this table.) A summary of our discussion appears here. We further considered the case of multiple fingers gripping an object (or multiple legs standing on some surface) and wrote down a linear equation of the form Wc = -F to describe the contact forces c needed to oppose a generalized force F (force and torque) acting on some object. Here W is the so-called wrench matrix, which models the forces and torques induced by the contact forces. A summary of that discussion appears here. We discussed the importance of bases in representing linear transformations, and in particular, how the matrix used to represent a given transformation changes as one changes bases. A basis for a vector space is a set of linearly independent vectors that spans the vector space. This means that every vector in the vector space can be written in a unique way as a finite linear combination of basis vectors. The representation of a linear function by a matrix depends on two bases: one for the input and one for the output. As we will see in future lectures, choosing these bases well allows one to represent the linear function by a diagonal matrix. Diagonal matrices describe decoupled linear equations, which are very easy to solve. For next time, please go back to your linear algebra course and remind yourself of the method one uses to generate linear coordinate transformations, represented as matrices: the columns of such a matrix are the vectors of one basis expressed as linear combinations of the other basis's vectors. This example illustrates some of the relevant points. Briefly, at the end of the lecture, we reviewed some facts from linear algebra. We defined the column space, row space, and null space of a matrix (and of linear functions more generally). As an example, we considered the matrix linked here. |
02 | 28.August | We started the lecture with a review of some facts from last time. Subsequently, we discussed some conditions under which a square matrix is not invertible. During much of the lecture we discussed Gaussian Elimination. We computed the PA = LDU decomposition for some matrices. Such a decomposition exists whenever the columns of A are linearly independent (and may exist more generally for other matrices). For an invertible matrix A, the PA = LDU decomposition makes solving Ax = b simple: one solves Lz = Pb by forward substitution, then the diagonal system Dy = z, and finally Ux = y by back substitution. (A small numerical sketch appears after this table.) Here are the examples we considered in lecture. Near the end of lecture, we started our discussion of diagonalization based on eigenvectors. We will work through an example next week. This method serves as a springboard for Singular Value Decomposition (SVD), which we will also discuss next week. |
03 | 2.September | Today, we first looked at diagonalization based on eigenvectors, then used that as a springboard for Singular Value Decomposition. Here is a summary of the eigenvector discussion. The reason one wants to diagonalize a matrix is that solving Ax = b is simple if A is diagonal. For then one has a set of decoupled linear equations that are solved independently as x_i = b_i / a_ii. For many physical systems, one can obtain a basis of eigenvectors that permits diagonalization of the matrix. And in some cases, the basis may even be "nice", by which we mean that the vectors have unit length and are pairwise perpendicular. This isn't always possible, but if the matrix A has real entries and satisfies A^T A = A A^T, then it is possible. In that case, A = S Λ S^{-1}, with the columns of S being the eigenvectors of A and with Λ a diagonal matrix consisting of the corresponding eigenvalues. Some cases are particularly nice. For instance, if A is symmetric, then the eigenvalues are real and S is orthogonal. That means S^T S = I = S S^T, so the inverse of S is very easy to compute: S^{-1} = S^T. (This idea generalizes to the complex setting; look up "unitary" matrix.) So, solving Ax = b for x now amounts to (1) solving Λy = S^T b for y, which is easy since Λ is diagonal, then (2) converting back to x-coordinates by x = Sy. Or, on one line: x = S Λ^{-1} S^{-1} b. (See the eigen-solve sketch after this table.) In the second part of lecture we began our exploration of Singular Value Decomposition (SVD): The main idea is that one can employ two possibly different coordinate transformations for the input and output spaces (domain and range), even if they are the same space, to obtain a simple diagonal representation of any matrix A (in particular, the matrix need not be square). (We considered this introductory example.) Each coordinate transformation is given by an orthogonal matrix, thus amounting to a rotation (possibly with a reflection). SVD chooses these coordinates so that the first k columns of the output coordinate transformation U constitute an orthonormal basis for the column space of A and all but the first k columns of the input coordinate transformation V constitute an orthonormal basis for the null space of A. Here k is the rank of A. The diagonal matrix Σ has k nonzero entries, all positive. (Aside: If some of these are nearly zero, it can be convenient to artificially set them to zero, since the corresponding row space vectors are almost vectors in the null space of A.) The SVD decomposition facilitates solving linear equations. We wrote down a formal solution, which we refer to as the SVD solution: x = V Σ^+ U^T b, where Σ^+ is the transpose of Σ with each nonzero singular value replaced by its reciprocal. A key insight is to realize that pre-multiplying a vector by an orthogonal matrix amounts to taking dot products of that vector with the elements of an orthonormal basis. With that intuition, one can see that x lies in the row space of A and minimizes ‖Ax - b‖ (minimized over all possible x). In particular, if b is in the column space of A, then x is an exact solution satisfying Ax = b. There could be more than one x that minimizes ‖Ax - b‖ (in particular, there could be more than one exact solution to Ax = b). This occurs when A has a non-trivial null space; the SVD solution x is the solution with minimum norm, i.e., the solution closest to the origin. (See the SVD-solution sketch after this table.) Here is a summary of one path to these insights. We will discuss SVD further in the next lecture. |
04 | 4.September | We reviewed some of the material from the previous lecture, including some additional details posted previously. We worked through an example. We sketched the internals of the SVD algorithm without going into detail, except to say that the algorithm uses Householder reflections. At the end of lecture, we mentioned that the "pseudo-inverse" solution one sees in some settings is the same as the SVD solution. (A quick numerical check appears after this table.) Here is a brief summary. We very briefly mentioned the trace operator Tr for square matrices, along with the fact that Tr(AB) = Tr(BA) for any pair of square matrices A and B of the same dimension. |
05 | 9.September | Today, we discussed polynomial interpolation. Given n+1 datapoints of the form (x_0, f_0), …, (x_n, f_n), with the x_i all distinct, there is a unique polynomial p_n(x) of degree at most n such that p_n(x_i) = f_i for all i. One speaks of "interpolation" since one can think of the polynomial p_n(x) as giving an approximation to some underlying f(x) based on the measured datapoints. (The polynomial p_n(x) matches f(x) exactly at the datapoints {x_i}, but not necessarily at other x.) Comment: If one does not say "degree at most n" then there can be infinitely many different polynomials that pass through the given datapoints. To avoid that, one constrains the degree. If the datapoints are degenerate, then the interpolating polynomial models that degeneracy correctly. For instance, if one asks for at most a quadratic that passes through three points, and the three points happen to lie on a straight line, then the resulting quadratic will in fact also be a line. In lecture, we discussed the method of Divided Differences for computing interpolating polynomials. This method is useful when a new datapoint arrives, since one can construct a new interpolating polynomial from the old one by adding one term of higher degree. See the polynomial interpolation notes for details. (Those notes also cover the Lagrange method. A small divided-differences sketch appears after this table.) We defined the error in approximating a function f(x) by an interpolating polynomial p_n(x) of degree n (or less) to be e_n(x) = f(x) - p_n(x). It turns out that this error is given by the "next" term one would write down when constructing an interpolating polynomial of degree n+1, much like the situation one sees with Taylor approximations. If the function f(x) has sufficiently many derivatives, that error can then be expressed in terms of the (n+1)st derivative of f(x) at some (generally unknown) intermediate point ξ. Specifically: e_n(x) = f^{(n+1)}(ξ) (x - x_0)(x - x_1)⋯(x - x_n) / (n+1)!, with ξ lying between the smallest and largest of the points x, x_0, ..., x_n. One possibility is to use all the data available to construct a high-order interpolating polynomial. This can lead to over-fitting. A different application is to interpolate a known function using low-order polynomials, but with varying datapoints --- a sliding window of such datapoints. A question is how many datapoints one needs to obtain a desired accuracy. We worked an example. |
06 | 11.September | Today, we discussed numerical root-finding, that is, solving equations of the form f(x) = 0 for x, with f no longer linear. We motivated that discussion with a very brief mention of robot motion planning. We discussed the following root-finding methods: Bisection with bracketing, Regula Falsi, the Secant method, Newton's method, and Müller's method. See the notes on root-finding for most of this material and pages 11-16 in these notes for Müller's method. We showed that in general Newton's method has order two (aka 'quadratic') convergence, or better. We did so by writing a Taylor expansion for the error function, then observing that the constant and linear terms in this error function disappear, assuming f′(ξ) is nonzero, with ξ being the relevant root of f(x). (You will explore in the homework the situation in which f′(ξ) = 0.) In this case, the quadratic term in the Taylor series is proportional to f″(ξ)/f′(ξ). If the second derivative f″(ξ) at the root is also nonzero, then Newton's method converges quadratically. If the second derivative is zero, then Newton's method converges more quickly than quadratically. Near the end of lecture, we wrote down a linear system of equations that implements Newton's method for finding simultaneous roots in higher dimensions. We sketched a two-dimensional example. (A small Newton sketch, in one and two dimensions, appears after this table.) Please see pages 20-25 here for further details. |
07 | 16.September | Today we discussed the method of resultants. (The method is also known as parameter elimination.) The method is useful for deciding whether two polynomials have a common root. We illustrated the method using simple quadratics in three settings: deciding whether two univariate polynomials share a root, deciding whether the zero contours of two bivariate polynomials have a common intersection, and implicitizing a 2D curve parameterized by polynomials. See pages 1-3 and 11-24 of these notes. (A small resultant sketch, using the Sylvester matrix of two quadratics, appears after this table.) |
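
Sketch for Lecture 01: a minimal numerical illustration of the two-link Jacobian and the virtual-work relation between end-effector forces and joint torques. The link lengths, joint angles, and force below are made-up illustration values, not the example used in lecture.

```python
import numpy as np

def two_link_jacobian(q1, q2, l1=1.0, l2=0.7):
    """Jacobian of the end-effector position (x, y) of a planar two-link arm
    with respect to the joint angles (q1, q2)."""
    # Forward kinematics: x = l1 cos(q1) + l2 cos(q1+q2), y = l1 sin(q1) + l2 sin(q1+q2).
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

J = two_link_jacobian(0.3, 0.8)

dq = np.array([0.01, -0.02])   # small joint motion
dx = J @ dq                    # resulting differential end-effector motion

F = np.array([0.0, -5.0])      # desired Cartesian force at the end-effector
tau = J.T @ F                  # joint torques that produce it (virtual work)

print("dx =", dx)
print("tau =", tau)
```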
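
Sketch for Lecture 02: solving Ax = b through a PA = LDU factorization. This uses scipy's `lu`, whose convention is A = P L U (so P.T plays the role of the lecture's permutation); the matrix and right-hand side are made-up illustration values.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
b = np.array([1., 2., 3.])

P, L, U = lu(A)           # scipy convention: A = P @ L @ U
D = np.diag(U)            # the pivots: the diagonal of D in PA = LDU
U_unit = U / D[:, None]   # unit upper-triangular factor

# P.T A = L D U_unit, so Ax = b becomes three easy solves:
z = solve_triangular(L, P.T @ b, lower=True)    # forward substitution: L z = P^T b
y = z / D                                       # diagonal solve:        D y = z
x = solve_triangular(U_unit, y, lower=False)    # back substitution:     U x = y

assert np.allclose(A @ x, b)
print("x =", x)
```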
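
Sketch for Lecture 03, first part: solving Ax = b for a symmetric matrix by diagonalizing with an orthonormal eigenbasis. The matrix below is a made-up symmetric positive-definite example.

```python
import numpy as np

# A made-up symmetric (positive-definite) matrix: A = S Lam S^T with S orthogonal.
A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
b = np.array([1., 0., 2.])

lam, S = np.linalg.eigh(A)          # eigh is for symmetric/Hermitian matrices
assert np.allclose(S @ np.diag(lam) @ S.T, A)

# Solve Ax = b in eigen-coordinates: Lam y = S^T b, then x = S y.
y = (S.T @ b) / lam
x = S @ y

assert np.allclose(A @ x, b)
print("x =", x)
```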
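
Sketch for Lecture 03, second part: the SVD solution x = V Σ⁺ Uᵀ b for a rank-deficient matrix, checked against numpy's minimum-norm least-squares solver. The matrix and right-hand side are made up.

```python
import numpy as np

# A rank-deficient matrix: Ax = b can have many minimizers of ||Ax - b||;
# the SVD solution x = V Sigma^+ U^T b is the one of minimum norm.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])        # rank 2: row 2 is twice row 1
b = np.array([1., 2., 1.])

U, s, Vt = np.linalg.svd(A)
tol = 1e-12 * s[0]
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])  # invert only nonzero singular values

x = Vt.T @ (s_plus * (U.T @ b))     # x = V Sigma^+ U^T b

# numpy's least-squares solver returns the same minimum-norm solution:
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
print("x =", x, " residual =", np.linalg.norm(A @ x - b))
```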
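
Sketch for Lecture 04: a quick numerical check that the pseudo-inverse solution coincides with the SVD (minimum-norm least-squares) solution, plus the trace identity Tr(AB) = Tr(BA). Random matrices are used only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)

# The "pseudo-inverse" solution is the SVD solution from Lecture 03:
x_pinv = np.linalg.pinv(A) @ b
x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_pinv, x_svd)

# Tr(AB) = Tr(BA) for square matrices A, B of the same dimension:
M = rng.standard_normal((5, 5))
N = rng.standard_normal((5, 5))
assert np.isclose(np.trace(M @ N), np.trace(N @ M))
print("pseudo-inverse and SVD solutions agree; trace identity holds")
```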
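
Sketch for Lecture 05: a small divided-differences implementation of Newton-form interpolation, illustrating the incremental property (adding a datapoint only appends a higher-degree term). The data values are made up.

```python
import numpy as np

def divided_differences(xs, fs):
    """Newton divided-difference coefficients f[x0], f[x0,x1], ..., f[x0,...,xn]."""
    xs = np.asarray(xs, dtype=float)
    coef = np.array(fs, dtype=float)
    for j in range(1, len(xs)):
        # After pass j, coef[i] holds f[x_{i-j}, ..., x_i] for i >= j.
        coef[j:] = (coef[j:] - coef[j-1:-1]) / (xs[j:] - xs[:-j])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton-form interpolant at x using a Horner-like scheme."""
    p = coef[-1]
    for c, xi in zip(coef[-2::-1], xs[-2::-1]):
        p = p * (x - xi) + c
    return p

xs = [0.0, 1.0, 2.0, 4.0]           # made-up datapoints
fs = [1.0, 3.0, 2.0, 5.0]
coef = divided_differences(xs, fs)

# The interpolant reproduces the data exactly at the nodes:
assert all(np.isclose(newton_eval(xs, coef, xi), fi) for xi, fi in zip(xs, fs))

# Adding a new datapoint only appends one higher-degree term:
# the earlier coefficients are unchanged.
coef_new = divided_differences(xs + [5.0], fs + [0.0])
assert np.allclose(coef_new[:len(coef)], coef)
print("coefficients:", coef)
```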
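
Sketch for Lecture 06: Newton's method in one dimension (the error is roughly squared at each step near a simple root) and the higher-dimensional version that solves J(x) Δx = -f(x) at each step. The 2D system (a circle intersected with a line) is a made-up example, not the one sketched in lecture.

```python
import numpy as np

def newton_1d(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method in 1D; returns all iterates so convergence can be inspected."""
    xs = [x0]
    for _ in range(max_iter):
        step = f(xs[-1]) / fprime(xs[-1])
        xs.append(xs[-1] - step)
        if abs(step) < tol:
            break
    return xs

# f(x) = x^2 - 2 has a simple root at sqrt(2); the error is roughly squared each step.
iters = newton_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=2.0)
print([abs(x - np.sqrt(2.0)) for x in iters])

def newton_nd(F, J, x0, tol=1e-12, max_iter=50):
    """Newton's method in several dimensions: solve J(x) dx = -F(x), then update x."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Made-up 2D system: the circle x^2 + y^2 = 4 intersected with the line y = x.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[1] - v[0]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [-1.0, 1.0]])
print(newton_nd(F, J, [1.0, 0.5]))   # converges to (sqrt(2), sqrt(2))
```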
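
Sketch for Lecture 07: testing whether two univariate quadratics share a root by computing the determinant of their Sylvester matrix (the resultant). The particular quadratics are made up.

```python
import numpy as np

def sylvester_2x2(p, q):
    """Sylvester matrix of two quadratics p(x) = p2 x^2 + p1 x + p0 and
    q(x) = q2 x^2 + q1 x + q0 (coefficients listed highest degree first)."""
    p2, p1, p0 = p
    q2, q1, q0 = q
    return np.array([[p2, p1, p0, 0.0],
                     [0.0, p2, p1, p0],
                     [q2, q1, q0, 0.0],
                     [0.0, q2, q1, q0]])

# p(x) = (x - 1)(x - 2) and q(x) = (x - 2)(x + 3) share the root x = 2,
# so their resultant (the determinant of the Sylvester matrix) is zero.
p = (1.0, -3.0, 2.0)
q = (1.0, 1.0, -6.0)
print(np.linalg.det(sylvester_2x2(p, q)))            # ~0 up to rounding

# Perturb q so the common root disappears; the resultant becomes nonzero.
q_perturbed = (1.0, 1.0, -5.5)
print(np.linalg.det(sylvester_2x2(p, q_perturbed)))  # nonzero
```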