16-811: Math Fundamentals for Robotics, Fall 2025

Summaries of Lectures




Num Date Summary
01 26.August

We discussed course mechanics for about half the lecture.

We looked at the Jacobian of a simple 2-link manipulator. The Jacobian is a matrix that relates differential joint motions to differential motions of the arm's end-effector. Using a virtual work argument we showed how the transpose of this matrix relates desired Cartesian forces imparted by the end-effector to the joint torques necessary to generate those forces. A summary of our discussion appears here.
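
As a small illustration (not from lecture), here is a Python sketch of that Jacobian for a planar 2-link arm, with hypothetical link lengths and joint angles, together with the virtual-work relation τ = J^T F:

    import numpy as np

    def jacobian_2link(theta1, theta2, l1=1.0, l2=1.0):
        # Jacobian of the planar 2-link arm's end-effector position
        # with respect to the joint angles (theta1, theta2).
        s1, c1 = np.sin(theta1), np.cos(theta1)
        s12, c12 = np.sin(theta1 + theta2), np.cos(theta1 + theta2)
        return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                         [ l1 * c1 + l2 * c12,  l2 * c12]])

    J = jacobian_2link(0.3, 0.7)
    dq = np.array([0.01, -0.02])   # small joint motion
    dx = J @ dq                    # resulting differential end-effector motion
    F = np.array([0.0, 1.0])       # desired Cartesian force at the end-effector
    tau = J.T @ F                  # joint torques that generate that force
    print(dx, tau)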

We further considered the case of multiple fingers gripping an object (or multiple legs standing on some surface) and wrote down a linear equation of the form

-F = Wc

to describe the contact forces c needed to oppose a generalized force F (force and torque) acting on some object. Here W is the so-called wrench matrix that models forces and torques induced by contact forces. A summary of that discussion appears here.

We discussed the importance of bases in representing linear transformations, and in particular, how the matrix used to represent a given transformation changes as one changes bases.

A basis for a vector space is a set of linearly independent vectors that spans the vector space. This means that every vector in the vector space can be written in a unique way as a finite linear combination of basis vectors.

The representation of a linear function by a matrix depends on two bases: one for the input and one for the output. As we will see in future lectures, choosing these bases well allows one to represent the linear function by a diagonal matrix. Diagonal matrices describe decoupled linear equations, which are very easy to solve.

For next time, please go back to your linear algebra course and remind yourself of the method one uses to generate linear coordinate transformations, represented as matrices: The columns of such a matrix are the vectors of one basis expressed as linear combinations of the other basis's vectors. This example illustrates some of the relevant points.
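
As a small numpy illustration (with a hypothetical basis and map, not the linked example): the columns of the matrix B below hold the new basis vectors written in the standard basis; solving with B converts standard coordinates into coordinates relative to the new basis, and B also converts the matrix of a linear map from one basis representation to the other.

    import numpy as np

    # Columns of B: a (hypothetical) new basis for R^2, written in the standard basis.
    B = np.array([[1.0,  1.0],
                  [1.0, -1.0]])

    v = np.array([3.0, 1.0])
    c = np.linalg.solve(B, v)        # coordinates of v relative to the new basis
    print(c, B @ c)                  # B @ c recovers v in standard coordinates

    # A linear map with matrix A in the standard basis is represented by
    # inv(B) @ A @ B relative to the new basis (same basis for input and output).
    A = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    A_new = np.linalg.solve(B, A @ B)
    print(A_new)                     # diagonal here, previewing the diagonalization idea above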

Briefly, at the end of the lecture, we reviewed some facts from linear algebra. We defined the column space, row space, and null space of a matrix (and linear functions more generally). As an example, we considered the matrix linked here.

02 28.August

We started the lecture with a review of some facts from last time. Subsequently, we discussed some conditions under which a square matrix is not invertible.

During much of the lecture we discussed Gaussian Elimination. We computed the PA = LDU decomposition for some matrices. Such a decomposition exists whenever the columns of A are linearly independent (and may exist more generally for other matrices).

For an invertible matrix A, the PA = LDU decomposition makes solving Ax = b simple:

  • First, solve Ly = Pb for y. This is easy because L is lower-triangular, so one can basically just "read off" the components of y using forward substitution.

  • Second, solve Ux = D^(-1) y for x. This is easy because U is upper-triangular, so one can again "read off" the components of x, now using backward substitution. Also note that D^(-1) stands for the inverse of the matrix D. This matrix is easy to compute: Its entries are all 0, except on the diagonal. Where D has entry d on the diagonal, D^(-1) has entry 1/d. (A small computational sketch of these two solve steps appears just after this list.)
  • Here are the examples we considered in lecture.
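
    As an illustration (not the lecture examples), here is a small Python sketch of these solve steps using scipy's LU factorization; the matrix and right-hand side are hypothetical, and scipy's convention A = PLU differs from the notes' PA = LDU only in which side the permutation sits on.

        import numpy as np
        from scipy.linalg import lu, solve_triangular

        # A small invertible matrix and right-hand side (hypothetical values).
        A = np.array([[2.0, 1.0, 1.0],
                      [4.0, 3.0, 3.0],
                      [8.0, 7.0, 9.0]])
        b = np.array([1.0, 2.0, 3.0])

        # scipy factors A = P @ L @ U, i.e. P^T A = L U in the lecture's convention.
        P, L, U = lu(A)
        d = np.diag(U)                   # the pivots, i.e. the diagonal matrix D
        U_unit = U / d[:, None]          # so P^T A = L D U_unit, with unit diagonals

        y = solve_triangular(L, P.T @ b, lower=True)       # forward substitution
        x = solve_triangular(U_unit, y / d, lower=False)   # D^(-1) y, then back substitution
        print(x, A @ x - b)              # residual should be ~ 0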

    Near the end of lecture, we started our discussion of diagonalization based on eigenvectors. We will work through an example next week. This method serves as a springboard for Singular Value Decomposition (SVD), which we will also discuss next week.

    03 2.September

    Today, we first looked at diagonalization based on eigenvectors, then used that as springboard for Singular Value Decomposition.

    Here is a summary of the eigenvector discussion.

    The reason one wants to diagonalize a matrix is that solving

    Ax = b

    is simple if A is diagonal. For then one has a set of independent linear systems that are solved independently as x_i = b_i / a_{ii}.

    For many physical systems, one can obtain a basis of eigenvectors that permits diagonalization of the matrix. And in some cases, the basis may even be "nice", by which we mean that the vectors have unit length and are pairwise perpendicular. This isn't always possible, but if the matrix A has real entries and satisfies A^T A = A A^T, then it is possible. In that case,

    A = S Λ S^(-1),

    with the columns of S being the eigenvectors of A and with Λ a diagonal matrix consisting of the corresponding eigenvalues.

    Some cases are particularly nice. For instance, if A is symmetric, then the eigenvalues are real and S is orthogonal. That means S^T S = I = S S^T, so the inverse of S is very easy to compute: S^(-1) = S^T.

    (This idea generalizes to the complex setting; look up "unitary" matrix.)

    So, solving Ax = b for x now amounts to (1) solving Λy = S^T b for y, which is easy since Λ is diagonal, then (2) converting back to x-coordinates, by x = Sy. Or, on one line:  x = S Λ^(-1) S^(-1) b.
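
    As a small numpy illustration (with a hypothetical symmetric matrix), the two steps look like this:

        import numpy as np

        # A small symmetric matrix, so its eigenvectors can be chosen orthonormal:
        # A = S diag(lam) S^T.
        A = np.array([[2.0, 1.0],
                      [1.0, 2.0]])
        b = np.array([1.0, 0.0])

        lam, S = np.linalg.eigh(A)   # eigenvalues and orthonormal eigenvectors
        y = (S.T @ b) / lam          # step (1): solve the decoupled system Λ y = S^T b
        x = S @ y                    # step (2): convert back, x = S Λ^(-1) S^T b
        print(x, A @ x - b)          # residual ~ 0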


    In the second part of lecture we began our exploration of Singular Value Decomposition (SVD):

    A = U Σ V^T.

    The main idea is that one can employ two possibly different coordinate transformations for the input and output spaces (domain and range), even if they are the same space, to obtain a simple diagonal representation of any matrix A (in particular, the matrix need not be square). (We considered this introductory example.)

    Each coordinate transformation is given by an orthogonal matrix, thus amounting to a rotation (possibly with a reflection). SVD chooses these coordinates so that the first k columns of the output coordinate transformation U constitute an orthonormal basis for the column space of A and all but the first k columns of the input coordinate transformation V constitute an orthonormal basis for the null space of A. Here k is the rank of A. The diagonal matrix Σ has k nonzero entries, all positive. (Aside: If some of these are nearly zero, it can be convenient to artificially set them to be zero, since the corresponding row space vectors are almost vectors in the null space of A.)

    The SVD decomposition facilitates solving linear equations. We wrote down a formal solution, which we refer to as the SVD solution:

    x  =  V (1/Σ) U^T b.

    A key insight is to realize that pre-multiplying a vector by an orthogonal matrix amounts to taking dot products of that vector with elements of an orthonormal basis.

    With that intuition, one can see that x lies in the row space of A and minimizes ‖Ax - b‖ (minimized over all possible x). In particular, if b is in the column space of A, then x is an exact solution satisfying Ax = b. There could be more than one x that minimizes ‖Ax - b‖ (in particular, there could be more than one exact solution to Ax = b). This occurs when A has a non-trivial null space; the SVD solution x is the solution with minimum norm, i.e., the solution closest to the origin.
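
    As a small numpy illustration (with a hypothetical rank-deficient matrix), the SVD solution can be formed directly from U, Σ, and V, keeping only the nonzero singular values, and it agrees with the pseudo-inverse solution:

        import numpy as np

        A = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0],
                      [0.0, 0.0, 0.0]])    # rank 2: column space misses the third coordinate
        b = np.array([1.0, 1.0, 1.0])      # not in the column space of A

        U, s, Vt = np.linalg.svd(A)
        k = np.sum(s > 1e-12)              # numerical rank
        # SVD solution x = V (1/Σ) U^T b, using only the k nonzero singular values.
        x = Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])
        print(x)
        print(np.allclose(x, np.linalg.pinv(A) @ b))   # same as the pseudo-inverse solution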

    Here is a summary of one path to these insights.

    We will discuss SVD further in the next lecture.

    04 4.September

    We reviewed some of the material from the previous lecture, including some additional details posted previously. We worked through an example.

    We sketched the internals of the SVD algorithm, without detail, except to say that the algorithm uses Householder reflections.
    We defined Householder reflections (see also page 22 of the linear algebra notes). We worked out some properties.
    This example illustrates the use of Householder reflections as a method for zeroing-out entries when diagonalizing a matrix (see also page 21 of the linear algebra notes for the encompassing context).
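
    As a small numpy sketch (not the lecture example) of the basic building block, here is a Householder reflection constructed to zero out all entries of a hypothetical vector below its first component:

        import numpy as np

        def householder(a):
            # Reflection H = I - 2 v v^T / (v^T v) chosen so that H a is a multiple of e1,
            # i.e. all entries of a below the first are zeroed out.
            alpha = -np.copysign(np.linalg.norm(a), a[0])   # sign chosen to avoid cancellation
            v = a.astype(float).copy()
            v[0] -= alpha
            return np.eye(len(a)) - 2.0 * np.outer(v, v) / (v @ v)

        a = np.array([3.0, 1.0, 2.0])
        H = householder(a)
        print(np.round(H @ a, 10))                                  # ~ [-3.742, 0, 0]
        print(np.allclose(H, H.T), np.allclose(H @ H, np.eye(3)))   # symmetric, and its own inverse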

    At the end of lecture, we mentioned that the "pseudo-inverse" solution one sees in some settings is the same as the SVD solution. Here is a brief summary.

    We very briefly mentioned the trace operator Tr for square matrices, along with the fact that Tr(AB) = Tr(BA) for any pair of square matrices A and B of the same dimension.

    05 9.Sept

    Today, we discussed polynomial interpolation. Given n+1 datapoints of the form (x_0, f_0), …, (x_n, f_n), with the x_i all distinct, there is a unique polynomial p_n(x) of degree at most n such that p_n(x_i) = f_i for all i.

    One speaks of "interpolation" since one can think of the polynomial p_n(x) as giving an approximation to some underlying f(x) based on the measured datapoints. (The polynomial p_n(x) matches f(x) exactly at the datapoints {x_i}, but not necessarily at other x.)

    Comment: If one does not say "degree at most n" then there can be infinitely many different polynomials that pass through the given datapoints. To avoid that, one constrains the degree. If the datapoints are degenerate, then the interpolating polynomial models that degeneracy correctly. For instance, if one asks for at most a quadratic that passes through three points, and the three points happen to lie on a straight line, then the resulting quadratic will in fact also be a line.

    In lecture, we discussed the method of Divided Differences for computing interpolating polynomials. This method is useful when a new datapoint arrives, since one can construct a new interpolating polynomial from the old one by adding one term of higher degree. See the polynomial interpolation notes for details. (Those notes also cover the Lagrange method.)
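
    As an illustration (not from lecture), here is a small Python sketch of divided differences with hypothetical data; note that appending a new datapoint would only add one new coefficient, i.e., one new higher-degree term.

        import numpy as np

        def divided_differences(xs, fs):
            """Return the Newton divided-difference coefficients f[x0], f[x0,x1], ..."""
            coef = np.array(fs, dtype=float)
            xs = np.array(xs, dtype=float)
            n = len(xs)
            for j in range(1, n):
                coef[j:] = (coef[j:] - coef[j-1:-1]) / (xs[j:] - xs[:n-j])
            return coef

        def newton_eval(xs, coef, x):
            """Evaluate the interpolating polynomial in Newton form at x (Horner-style)."""
            result = coef[-1]
            for c, xk in zip(coef[-2::-1], xs[-2::-1]):
                result = result * (x - xk) + c
            return result

        xs = [0.0, 1.0, 2.0, 4.0]
        fs = [1.0, 3.0, 2.0, 5.0]        # hypothetical datapoints
        coef = divided_differences(xs, fs)
        print([newton_eval(xs, coef, xi) for xi in xs])   # reproduces fs at the nodes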

    We defined the error in approximating a function f(x) by an interpolating polynomial p_n(x) of degree n (or less) to be e_n(x) = f(x) - p_n(x). It turns out that this error is given by the "next" term one would write down when constructing an interpolating polynomial of degree n+1, much like the situation one sees with Taylor approximations. If the function f(x) has sufficiently many derivatives, that error can then be expressed in terms of the (n+1)st derivative of f(x) at some (generally unknown) intermediate point ξ. Specifically:

    e_n(x)  =  f^(n+1)(ξ) (x - x_0) ⋯ (x - x_n) / (n+1)!

    with ξ lying between the smallest and largest of the x-coordinates x_0, ..., x_n.

    One possibility is to use all the data available to construct a high-order interpolating polynomial. This can lead to over-fitting. A different application is to interpolate a known function using low order polynomials, but with varying datapoints --- a sliding window of such datapoints. A question is how many datapoints one needs to obtain a desired accuracy. We worked an example.

    06 11.Sept

    Today, we discussed numerical root-finding, that is, solving equations of the form f(x) = 0 for x, with f no longer linear. We motivated that discussion with a very brief mention of robot motion planning. We discussed the following root-finding methods: Bisection with bracketing, Regula Falsi, Secant method, Newton's method, and Müller's method. See the notes on root-finding for most of this material and pages 11-16 in these notes for Müller's method.

    We showed that in general Newton's method has order two (aka 'quadratic') convergence, or better. We did so by writing a Taylor expansion for the error function, then observing that the constant and linear terms in this error function disappear, assuming f′(ξ) is nonzero, with ξ being the relevant root of f(x). (You will explore in the homework the situation in which  f′(ξ) = 0.)  In this case, the quadratic term in the Taylor series is proportional to f″(ξ)/f′(ξ). If the second derivative f″(ξ) at the root is also nonzero, then Newton's method converges quadratically. If the second derivative is zero, then Newton's method converges more quickly than quadratically.
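
    As a quick numerical illustration (not from lecture), here is Newton's method in Python on f(x) = x^2 - 2, whose root √2 has nonzero first derivative; the error roughly squares at each step, as the argument above predicts.

        import numpy as np

        def newton(f, fprime, x0, steps=6):
            xs = [x0]
            for _ in range(steps):
                xs.append(xs[-1] - f(xs[-1]) / fprime(xs[-1]))
            return xs

        # f(x) = x^2 - 2 has a simple root at sqrt(2); f'(root) != 0, so we expect
        # the error roughly to square at each step (quadratic convergence).
        xs = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
        errors = [abs(x - np.sqrt(2.0)) for x in xs]
        print(errors)    # roughly: 4e-1, 8e-2, 2e-3, 2e-6, 2e-12, ...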

    Near the end of lecture, we wrote a linear system of equations that implements Newton's method for finding simultaneous roots in higher dimensions. We sketched a two-dimensional example. Please see pages 20-25 here for further details.
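
    As a sketch (not the lecture example), here is higher-dimensional Newton in Python for a hypothetical pair of equations; each step solves the linearized system J Δ = -F and updates the current guess.

        import numpy as np

        def F(v):
            x, y = v
            return np.array([x**2 + y**2 - 4.0,   # a circle
                             x * y - 1.0])        # a hyperbola

        def J(v):
            x, y = v
            return np.array([[2.0 * x, 2.0 * y],
                             [y,       x      ]])

        v = np.array([2.0, 1.0])                   # initial guess
        for _ in range(8):
            v = v + np.linalg.solve(J(v), -F(v))   # Newton step: solve the linearized system
        print(v, F(v))                             # a common root of the two equations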

    07 16.Sept

    Today we discussed the method of resultants. (The method is also known as parameter elimination.) The method is useful for deciding whether two polynomials have a common root. We illustrated the method using simple quadratics in three settings: deciding whether two univariate polynomials share a root, deciding whether the zero contours of two bivariate polynomials have a common intersection, and implicitizing a 2D curve parameterized by polynomials. See pages 1-3 and 11-24 of these notes.
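
    As an illustration (not from lecture), here is a small sympy sketch; sympy provides a resultant function (mathematically, the determinant of the Sylvester matrix), which vanishes exactly when two polynomials with nonzero leading coefficients share a common root.

        import sympy as sp

        t, x, y = sp.symbols('t x y')

        # Two univariate quadratics share a root exactly when their resultant vanishes:
        print(sp.resultant(t**2 - 3*t + 2, t**2 - 1, t))   # 0: both vanish at t = 1

        # Implicitize the parametric curve x = t^2, y = t^3 by eliminating t:
        print(sp.resultant(x - t**2, y - t**3, t))         # x**3 - y**2 (up to sign)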

    08 18.Sept

    First, we discussed metrics and looked at the iso-contours of various p-norms. Then, we discussed Best Uniform Approximation.   Specifically, we looked at the problem of approximating known functions using polynomials of order n (or less) so as to minimize the maximum error (i.e., minimize worst-case error). A key tool is the Chebyshev Equioscillation Theorem. See page 3 of the notes on approximation. For a more detailed analysis along with proofs, see this article.

    We also looked at Chebyshev polynomials more generally, and showed how they sometimes are useful for constructing best uniform approximations.
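
    As a small numerical illustration (not from lecture) of the Chebyshev connection: on [-1, 1] the best uniform approximation to x^3 by polynomials of degree at most 2 is (3/4)x, whose error is the scaled Chebyshev polynomial (1/4)T_3(x) and equioscillates, as the Equioscillation Theorem requires.

        import numpy as np

        xs = np.linspace(-1.0, 1.0, 2001)
        err = xs**3 - 0.75 * xs          # error of the best degree-<=2 approximation (3/4)x to x^3;
                                         # it equals (1/4) T_3(x), a scaled Chebyshev polynomial
        print(err.max(), err.min())      # ~ +0.25 and -0.25: the worst-case error is 1/4

        # The error equioscillates: it attains +/- 1/4 with alternating signs at the
        # four points cos(k*pi/3), k = 0, 1, 2, 3.
        pts = np.cos(np.arange(4) * np.pi / 3)
        print(np.round(pts**3 - 0.75 * pts, 6))   # [0.25, -0.25, 0.25, -0.25]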

    09 23.Sept

    Least-Squares Approximation (of sampled functions):   In lecture, we discussed the normal equations for approximating data with functions that are linear combinations of "basis" functions φ_1(x),...,φ_k(x). (One chooses these functions in an application-dependent way. We considered an example with φ_1(x) = 1, φ_2(x) = x, and φ_3(x) = x^2.)
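
    As an illustration (not from lecture), here is a minimal numpy sketch of the normal equations for that example basis, applied to hypothetical noisy data:

        import numpy as np

        # Hypothetical noisy samples of an underlying function.
        xs = np.linspace(0.0, 2.0, 21)
        fs = 1.0 + 2.0 * xs - 0.5 * xs**2 + 0.05 * np.random.default_rng(0).normal(size=xs.size)

        # Design matrix whose columns are the basis functions evaluated at the samples:
        # phi_1(x) = 1, phi_2(x) = x, phi_3(x) = x^2.
        Phi = np.column_stack([np.ones_like(xs), xs, xs**2])

        # Normal equations: (Phi^T Phi) c = Phi^T f.
        c = np.linalg.solve(Phi.T @ Phi, Phi.T @ fs)
        print(c)                         # roughly (1, 2, -0.5)
        # (np.linalg.lstsq solves the same least-squares problem more stably.)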

    We discussed inner products. We showed how to define an inner product on function spaces using an integral.

    Least-Squares Approximation (of known functions):   We began our discussion of orthogonal approximation in function spaces, based on inner products. "Least squares" means we are trying to select a function p(x) from some class of functions (such as all polynomials of a certain maximal degree) to approximate a known function f(x) so as to minimize a cost of the form ∫_a^b (e(x))^2 w(x) dx. Here e(x) = f(x) - p(x) is the error of the approximation and w(x) is some strictly positive weighting function on [a, b] (perhaps simply the constant 1). Such a cost is the inner product of e(x) with itself, where the inner product is defined as the integral <f,g> = ∫_a^b f(x) g(x) w(x) dx.

    Given an inner product, one can construct an orthogonal set of functions {p0(x), p1(x), …} that spans a desired space of interest, for instance, all cubic polynomials.

    One then approximates f(x) by projecting orthogonally onto this subspace. Orthogonal projection ensures that one is minimizing distance, i.e., the least squares error. The key step is expressing the projection as a linear sum of the pi(x) by computing the inner product of f(x) with each pi(x):

    p(x) = Σ_i d_i p_i(x),   with d_i = <f, p_i> / <p_i, p_i>.

    We will work through an example in the next lecture.

    10 25.Sep

    Least-Squares Approximation (of known functions):   We finished our discussion of Least-Squares Approximation, in the context of function spaces. As an example, we constructed the Legendre polynomials, then used those to approximate the exponential function.
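
    As a small numerical sketch of that example (not the lecture's exact computation; the integrals here are approximated on a fine grid), one can project e^x onto the first three Legendre polynomials using the coefficient formula d_i = <f, P_i> / <P_i, P_i> from last lecture:

        import numpy as np

        # Project f(x) = e^x onto polynomials of degree <= 2 on [-1, 1], using the
        # Legendre polynomials P_0, P_1, P_2 (orthogonal for weight w = 1) and a
        # discrete approximation of the inner product on a fine grid.
        xs = np.linspace(-1.0, 1.0, 20001)
        f = np.exp(xs)
        legendre = [np.ones_like(xs), xs, 1.5 * xs**2 - 0.5]   # P_0, P_1, P_2

        p = np.zeros_like(xs)
        for P_i in legendre:
            d_i = (f @ P_i) / (P_i @ P_i)   # d_i = <f, P_i> / <P_i, P_i>, approximated discretely
            p += d_i * P_i
        print(np.max(np.abs(f - p)))        # worst-case error of the quadratic approximation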

    Least-Squares Approximation (of periodic functions):   We briefly discussed Fourier series approximations to periodic functions, using exponential functions of the form e^(ik(2π/τ)x), with τ the period of the function being approximated, i=√-1, and k varying over all integers. (Such complex exponentials effectively amount to cosines and sines). Once again we took the perspective that these functions constitute an orthonormal basis for a vector space, now extended to infinite sums. The Fourier coefficients {f_k} of a given function f are the inner products of f with each of these basis functions: f_k = <f, e^(ik(2π/τ)x)>. In other words, they are projections onto "perpendicular coordinate axes" in function space. (Each basis function e^(ik(2π/τ)x) gives one such coordinate axis.)

    Each Fourier coefficient f_k is a number computed via an integral. Computing the integral numerically amounts to sampling the function, meaning one is really computing a discrete inner product, leading to aliasing. We mentioned the Nyquist Sampling Theorem.

    11 30.Sep

    We covered various parts of the material contained on pages 1 to 26 of the notes on differential equations. Specifically:

    We discussed first-order and second-order differential equations. We discussed the analytic homogeneous solutions one obtains when those equations are linear.

    We considered some numerical methods for solving first-order initial-value problems, including Euler's method, two of the Runge-Kutta methods, and two of the Adams-Bashforth methods. We discussed local error, noting that Euler's method has a local error proportional to h^2, with h being the step size, while Runge-Kutta 4 has a local error proportional to h^5, as does the Adams-Bashforth method of order 4.
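
    As an illustration (not from lecture), here is a minimal Python sketch of Euler's method and classical Runge-Kutta 4 on the test problem y' = -y, y(0) = 1; halving the step size should roughly halve Euler's global error while cutting RK4's by about a factor of 16.

        import numpy as np

        def euler_step(f, t, y, h):
            return y + h * f(t, y)

        def rk4_step(f, t, y, h):
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h * k1 / 2)
            k3 = f(t + h / 2, y + h * k2 / 2)
            k4 = f(t + h, y + h * k3)
            return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

        # Test problem y' = -y, y(0) = 1, whose exact solution at t = 1 is e^(-1).
        f = lambda t, y: -y
        for h in (0.1, 0.05):
            y_e, y_r, t = 1.0, 1.0, 0.0
            while t < 1.0 - 1e-12:
                y_e, y_r, t = euler_step(f, t, y_e, h), rk4_step(f, t, y_r, h), t + h
            print(h, abs(y_e - np.exp(-1)), abs(y_r - np.exp(-1)))
        # Halving h roughly halves Euler's global error (local error ~ h^2, global ~ h)
        # but cuts RK4's global error by ~16x (local error ~ h^5, global ~ h^4).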

    12 2.Oct

    Today we began our discussion of optimization. Throughout, we were working with real-valued functions defined on Euclidean space (or possibly some convex subdomain of Euclidean space). We assumed that all our functions had continuous second partial derivatives. We defined a function's gradient ∇f and Hessian [∇^2 f]. The gradient is the vector (∂f/∂x_i) of first partial derivatives of f. The Hessian is the matrix [∂^2 f / ∂x_i ∂x_j] of second partial derivatives of f. The Hessian is symmetric since f has continuous second partials.

    We also defined the critical points of a function as the set of points where the function's gradient is zero. Much as in one-dimensional high-school calculus, if a function f has a local minimum at a point x*, then the gradient ∇f(x*) at that point must be zero and the Hessian [∇^2 f](x*) at that point must be symmetric positive semi-definite. Conversely, if the gradient at x* is zero and the Hessian there is symmetric positive definite, then f has a local minimum at x*; in fact, one can even conclude that the local minimum is a strict local minimum.

    We looked at a 2D example: f(x,y) = xy. This function has a single critical point, namely the origin. The shape of the function there is what is known as a saddle point, meaning f(x,y) looks to have a local minimum along one direction and a local maximum along another direction. We could see this by looking at the directions of the gradient of f near the critical point, or by looking at the Hessian, in particular its eigenvalues. Here is a rough picture (the black vectors are perpendicular to the surface):
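
    As a quick numerical check (not the picture from lecture), the Hessian of f(x,y) = xy is constant, and its eigenvalues exhibit the saddle:

        import numpy as np

        # f(x, y) = x y: the gradient (y, x) vanishes only at the origin,
        # and the Hessian is constant.
        H = np.array([[0.0, 1.0],
                      [1.0, 0.0]])       # [[f_xx, f_xy], [f_yx, f_yy]]
        evals, evecs = np.linalg.eigh(H)
        print(evals)                     # one negative and one positive eigenvalue: a saddle
        print(evecs)                     # columns ~ (1,-1)/sqrt(2) and (1,1)/sqrt(2): the
                                         # directions of local-max and local-min behavior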

    We reviewed Golden Section Search for 1D optimization, as a generalization of bisection for root finding. (That technique winds up being useful for line optimizations in arbitrary dimensions.)

    We discussed the significance of positive semi-definite matrices, observing that these characterize convex functions. We proved that every local minimum of a convex function is a global minimum and we proved that the set of all points at which the function attains this minimum is a convex set.

    We discussed Gradient Descent (also known as Steepest Descent) for finding local minima. We noted the potential for zigzag behavior in ellipsoidal troughs of iso-contours. We observed that those troughs depend on the eigenvalues of the Hessian. This suggested turning ellipsoidal troughs into circles or spheres, something we will investigate further next time.

    We mentioned Newton's Method, in the context of optimization.

    13 7.Oct

    Last time we saw that Gradient Descent has the potential for zigzag behavior in ellipsoidal troughs of iso-contours. The shape of the troughs depends on the eigenvalues of the Hessian of the function being optimized. Today we thought about choosing descent directions more carefully, so as to reduce such zigzagging.

    Near a local optimum a function looks much like a quadratic (by Taylor's Theorem), so it is convenient to focus on purely quadratic functions as models in developing an algorithm that mitigates zigzagging. The algorithm we derived is called the Conjugate Gradient Algorithm. It chooses descent directions in a manner that effectively turns ellipsoidal troughs into circles or spheres (for the purely quadratic case). Recall that a purely quadratic function is of the form

    f(x) = c + b^T x + (1/2) x^T Q x,

    with c a real number, both b and x real n-dimensional vectors, and Q a real symmetric positive-definite n×n matrix.

    There are several magical steps in deriving the Conjugate Gradient Algorithm, meaning that doing something local has a global effect. A first piece of magic is the simple fact that we can view a global optimization as a collection of local projections, using the inner product defined by the second derivative matrix Q. A second instance of magic occurs in the Expanding Subspace Theorem: Doing a line optimization in that particular setting actually ensures that the function attains an optimum over an entire affine subspace. This magic is a consequence of function convexity and Q-orthogonality of the descent directions. A third instance of magic appears in the Conjugate Gradient Theorem, in defining the descent directions. Even though we pick the next direction merely to be Q-orthogonal to a single previous direction, it actually turns out to be Q-orthogonal to all (other) directions.

    Details:

    Recall (see notes for a proof) that a set of nonzero pairwise Q-orthogonal vectors is linearly independent, and so can form a basis for its span. That idea appears throughout the development of the Conjugate Gradient Algorithm. We used this independence to express the vector x* - x_0 in terms of projections onto a Q-orthogonal basis {d_0, ..., d_{n-1}}. Here x* is the location of function f's global optimum and x_0 is the (arbitrary) starting point of the optimization process. Specifically:

    x* - x_0  =  Σ_i α_i d_i,

    with    α_i  =  <x* - x_0, d_i> / <d_i, d_i>  =  d_i^T Q (x* - x_0) / d_i^T Q d_i.

    In an actual optimization, x* is unknown. We need to replace x* with some computable quantity. For the pure quadratic, we know that x* satisfies Q x* = -b. That means we can simplify the numerator of the projection coefficients α_i from d_i^T Q (x* - x_0) to -d_i^T (b + Q x_0). Observe that b + Q x_0 is the gradient of f evaluated at the starting point x_0.

    In fact, in the formula for α_i, one can replace b + Q x_0 with b + Q x_i, the gradient of f evaluated at the point x_i. Here x_i = x_{i-1} + α_{i-1} d_{i-1}, starting from x_0. That idea is useful in generalizing the algorithm from pure quadratics to other functions. So:

    α_i  =  - d_i^T g_i / d_i^T Q d_i,   with  g_i = b + Q x_i  being the gradient of f at x_i.

    (Although the proof of this latest formula for α_i is straightforward, it demonstrates a key purpose for using Q-orthogonality in the optimization algorithm, namely as a lever that turns certain local properties into global properties.)

    Observe how these equations let us view coordinate projections as steps in an optimization: A typical step is from x_i to x_{i+1}, by moving along direction d_i with a scale factor of α_i. Formally, we saw this in the Expanding Subspace Theorem, which states that each of the α_i may be computed via a line optimization, and that doing so actually optimizes f over an entire affine subspace spanned by all descent directions encountered thus far, not merely over a line. This was an example of local-to-global leveraging. (Another example occurs in choosing the descent directions. One computes d_{i+1} = -g_{i+1} + β_i d_i, with β_i chosen so that <d_i, d_{i+1}> = 0. Although a local computation, one magically obtains <d_i, d_j> = 0 for all distinct i and j.)

    Consequently, for a pure quadratic function, one finds the global optimum after n steps, with n the dimension of the space. For general functions, the Conjugate Gradient algorithm repeatedly performs packets of such n-step descents. Each descent step is computed by doing a one-dimensional line optimization.
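
    As an illustration (not the lecture example), here is a small numpy sketch of the algorithm just described for a pure quadratic, following the formulas above; the matrix Q and vector b are hypothetical.

        import numpy as np

        def conjugate_gradient(Q, b, x0, steps=None):
            """Minimize f(x) = b^T x + 0.5 x^T Q x for symmetric positive-definite Q."""
            x = x0.astype(float)
            g = b + Q @ x                      # gradient at x
            d = -g                             # first descent direction: steepest descent
            for _ in range(steps or len(b)):
                alpha = -(d @ g) / (d @ (Q @ d))       # line optimization along d
                x = x + alpha * d
                g = b + Q @ x                          # new gradient
                beta = (d @ (Q @ g)) / (d @ (Q @ d))   # chosen so the new direction is Q-orthogonal to d
                d = -g + beta * d
            return x

        Q = np.array([[4.0, 1.0, 0.0],
                      [1.0, 3.0, 1.0],
                      [0.0, 1.0, 2.0]])
        b = np.array([1.0, -2.0, 0.5])
        x = conjugate_gradient(Q, b, np.zeros(3))      # n = 3 steps reach the optimum
        print(x, Q @ x + b)                            # gradient at the result is ~ 0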

    At the end of lecture, we briefly looked at an example in which the conjugate gradient algorithm converged faster than one might have expected based on the discussion above. The reason had to do with a symmetry in the Hessian of the function.

    14 9.Oct

    We discussed the first- and second-order necessary conditions for optimality in constrained optimization, using Lagrange Multipliers. We worked a simple example.

    Subsequently, we started our discussion of the Calculus of Variations. We derived the basic Euler-Lagrange Equation. As an example, we showed that the shortest curve between two points in the plane is a straight line. We considered a constrained version of this problem in which one seeks a shortest curve with a given area below the curve. The curve, when it exists, is now a circular arc.

    We solved this problem by extending the Calculus of Variations to include Lagrange multipliers. We further observed that the dual problem -- in which one seeks to maximize the area below a curve of fixed length -- gives rise to essentially the same Euler-Lagrange Equation as the original constrained problem, and thus has the same type of optimizing curve, when an optimum exists.

    In the context of Calculus of Variations, we observed that some cost functions do not have optimal solutions that are twice differentiable. In the constrained problem mentioned above this can happen when the endpoint and area (or length) conditions are incompatible. In that situation, one may be able to interpret the optimal curve as a circular arc with jump discontinuities at the endpoints.

    15 21.Oct We surveyed various generalizations of the so-called Simplest Problem in the Calculus of Variations that we had discussed last time. In particular, we considered optimization with free boundaries, surface integrals, multiple dependent variables, and higher order derivatives. We sketched examples for two of these generalizations, as follows:

    As an example of a problem with a free boundary, we worked through key parts of a brachistochrone example: finding the shape of a ski slope that minimizes horizontal traversal time for a skier whose motion is determined by gravity.

    We mentioned that when the cost integrand F(x,y,y') does not depend directly on x, it can be helpful to replace the Euler-Lagrange Equation with the following differential equation, called the Beltrami identity (see page 26 of the Calculus of Variations notes for a derivation):

    y' F_{y'} - F = constant

    We also considered the case of a free boundary point specified merely to lie on a curve given implicitly by an equation of the form g(x,y)=0. In that case the relevant endpoint condition becomes

    F_{y'} - (g_y F)/(g_x + g_y y') = 0 (at the endpoint).

    The key steps in a derivation of this condition appear here.

    As an example of a problem involving a surface integral, we found the shape of a soap film hanging from a ring, in which the potential energy of stretching counteracts the potential energy of gravity. The film seeks an equilibrium shape at which the net potential energy is a minimum (or, more precisely, at which the potential energy is stationary, meaning differential perturbations in the soap film's shape do not change the potential energy). The basic technique is very similar to what we did originally when computing a cost as an integral over a one-dimensional interval (as for the shortest path problem), except now we compute a cost by integrating over a two-dimensional region. In order to obtain an appropriate Euler-Lagrange Equation, one needs a generalization of integration by parts to higher dimensions. That generalization is Stokes Theorem, which we mentioned but did not discuss in detail. The notes contain a description of a two-dimensional instance of Stokes Theorem known also as Green's Theorem (see pages 33 and 34).

    16 23.Oct

    Today, we began our discussion of Mechanics with a brief history of the Principle of Least Action. We mentioned generalized coordinates, generalized force, generalized velocity, and generalized momentum. (The phrase "generalized coordinate" is an older traditional term. Roboticists might simply speak of configuration parameters or state variables.)

    We introduced phase space, described in terms of (generalized) coordinates and momenta. (Here it is convenient to view velocities simply as time derivatives of coordinates, independent of momenta.) We also found it convenient to think in terms of two different coordinate systems for energy: One coordinate system has kinetic energy and potential energy as dimensions. The second coordinate system is obtained from the first by a 45 degree rotation. In this coordinate system, one dimension (called L) measures the difference between kinetic and potential energy while the second dimension (called H) measures the total energy in the system. The principle of least action says that a physical system moves so that variations in the integral  ∫ L dt  are zero, with variations being taken over system trajectories that are coterminous in space and time (that is, they share the same endpoints at the same times). The different trajectories need not have the same energy nor do they need to conserve energy. Allowing such a multitude of variations means one may apply the Calculus of Variations to derive differential equations describing the system's behavior.

    There are two common formulations for deriving these differential equations, obtained by writing the Lagrangian cost function L in two different ways. In one formulation, one uses generalized coordinates and their time derivatives. In the other formulation, one uses a combination of generalized coordinates (and their time derivatives) and generalized momenta. Via these two formulations, we derived the equations for Lagrangian and Hamiltonian dynamics.

    We explored the meaning of Lagrangian and Hamiltonian dynamics in the 1D setting of a point mass hanging from a spring under the influence of gravity. Lagrangian Dynamics recapitulates Newton's F = ma, leading to a second-order differential equation that describes an undamped harmonic oscillator with a constant external force (gravity). The solution of that equation is a time-parameterized curve that traces out an ellipse in phase space. Hamiltonian Dynamics gives us two first-order differential equations. One equation expresses Newtonian dynamics as the time derivative of momentum, i.e., as F = dp/dt. The other equation provides a definition of momentum p. Plugging the second equation into the first produces the same second-order differential equation as did Lagrangian dynamics. Alternatively, we observed that Hamiltonian dynamics (for systems in which the Lagrangian has no explicit time-dependence) amounts to an assertion that the gradient of H is perpendicular to the system's motion in phase space, meaning H is constant. One may therefore view the Hamiltonian H directly as an algebraic equation for the solution ellipse obtained earlier from Lagrangian dynamics.
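
    As a small numerical sketch (not from lecture, with hypothetical mass, spring constant, and step size), one can integrate Hamilton's two first-order equations for the spring-mass system and watch the total energy H stay (nearly) constant while the trajectory traces an ellipse in phase space:

        import numpy as np

        m, k, g = 1.0, 4.0, 9.8          # hypothetical mass, spring constant, gravity
        H = lambda q, p: p**2 / (2 * m) + 0.5 * k * q**2 - m * g * q

        q, p = 0.0, 0.0                  # start at the unstretched spring position, at rest
        dt = 1e-3
        qs, ps = [], []
        for _ in range(20000):
            p += dt * (-k * q + m * g)   # dp/dt = -dH/dq  (Newton: spring force plus gravity)
            q += dt * (p / m)            # dq/dt =  dH/dp  (the definition of momentum)
            qs.append(q)
            ps.append(p)

        # H is (nearly) conserved along the trajectory, which orbits the
        # equilibrium q = m g / k as an ellipse in (q, p) phase space.
        Hs = [H(qi, pi) for qi, pi in zip(qs, ps)]
        print(min(Hs), max(Hs))          # stays within a small band around H(0, 0) = 0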

    Today's lecture material appears on pages 54, 55, and 57-63 of the Notes on the Calculus of Variations. We also discussed Euler's Theorem in lecture, which appears here.

    17 28.Oct

    Today, we worked out the Lagrangian dynamics of a two-link manipulator, based on the Principle of Least Action.
    We examined the meaning of some of the terms in the equations describing those dynamics.
    Here is a summary of the derivation.

    18 30.Oct

    Today we finished our discussion of the Principle of Least Action with a derivation of the wave equation and its solution using separation of variables. There are three interesting aspects to this process: (i) The Principle of Least Action can produce partial differential equations over spaces that contain both spatial and time coordinates, (ii) Some apparently continuous parameters can appear only in quantized form (in this case modeling spatial harmonics), (iii) solution of the differential equation produces a collection of orthonormal functions which one then uses to satisfy boundary conditions.

    19 6.Nov

    We started our discussion of Computational Geometry today.

    We spent about half of the lecture on a configuration space algorithm for planning shortest 2D translational motions of a polygonal robot moving among polygonal obstacles. (Our core step involved pairs of convex polygons. The algorithm generalizes by thinking of an arbitrary polygon as the union of convex polygons.)

    Several computational geometry and computer science tools are required to implement that algorithm. We began our discussion of those today:

    We discussed edge-edge intersections. We mentioned degeneracies. (In order to reduce worry about degeneracies, one sometimes assumes that the input to a geometric algorithm is in "general position", meaning a generic situation that will occur with probability 1, relative to some geometric probability. In practice, one may wish to randomize input, perturbing coordinates by tiny amounts.)

    The class came up with two algorithms for deciding whether a point q lies within a (possibly nonconvex) polygon. One algorithm entails summing (signed) changes in angle of the vector from the point q to vertices of the polygon, as one moves around the polygon. This sum will be 0 if q lies outside the polygon and ±2π if q lies within the polygon. Another algorithm entailed counting the number of intersections that a semi-infinite ray anchored at q makes with edges of the polygon. This number will be even if q lies outside the polygon and odd if q lies within the polygon. (Again, there are degeneracies which one needs to consider or ignore by a general position argument.)
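
    As a sketch of the second algorithm (not the lecture's code), here is a small Python implementation of the ray-crossing test; the square and query points are hypothetical, and degenerate cases (the ray passing through a vertex, or running along an edge) are ignored, in the spirit of a general position assumption.

        def point_in_polygon(q, poly):
            # Cast a ray from q in the +x direction and count how many polygon
            # edges it crosses; an odd count means q is inside.
            x, y = q
            inside = False
            n = len(poly)
            for i in range(n):
                (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
                if (y1 > y) != (y2 > y):                       # edge straddles the ray's height
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross > x:                            # crossing lies to the right of q
                        inside = not inside
            return inside

        square = [(0, 0), (2, 0), (2, 2), (0, 2)]
        print(point_in_polygon((1.0, 1.0), square))   # True
        print(point_in_polygon((3.0, 1.0), square))   # False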

    Next, we considered the point-in-polygon problem for the specific case of a convex polygon. For the multiple-query setting (in which the convex polygon remains fixed while the query points vary), we observed that it is useful to first preprocess the polygon in order to represent the polygon as a collection of triangular wedges, meeting at an interior point. The natural sort of the wedges by angle around this interior point means that one can use binary search to find the key supporting line of the convex polygon against which to test a given point q (see "Additional Details" below). One may therefore perform each point-in-polygon test in O(log(n)) time rather than O(n) time. (Here n is the number of vertices or edges in the polygon.)

    Additional Details: Fix some point interior to a convex polygon (perhaps the centroid of the polygon's vertices). Then each edge of the polygon defines a triangle when combined with this interior point. If we extend these triangular wedges out to infinity, then generally exactly one wedge will include a given query point q (for some degenerate cases, two touching wedges may contain q, but that presents no difficulties). The "key supporting line" is the line containing the edge of the polygon within that wedge. If q is on one side of the key supporting line, then one may conclude that q lies within the polygon; if q is on the other side of the line, then q lies outside the polygon; and if q is right on the line, then it is on the boundary of the polygon. Performing such a point versus line test entails plugging point q's (x,y) coordinates into the expression ax+by+c, with ax+by+c=0 being the relevant line equation. The sign of the value ax+by+c decides the point's location. For instance, if one chooses the vector (a,b) to point away from the polygon's interior, then nonzero positive sign means point (x,y) lies outside the polygon.

    Motivated by binary search, we discussed an efficient multiple-query algorithm for determining the location of points within regions of a given planar subdivision. We preprocessed the subdivision into horizontal "slabs", chosen so as to contain no vertices in their interiors. We could then use binary search on the y-coordinate of a query point q to determine the slab containing q. Given the slab, we could then use another binary search to find the region containing q within the slab, and thus the overall region in the plane. This second binary search again performed comparisons by plugging q's coordinates into line equations (namely some of the lines within the relevant slab).

    We mentioned line-sweep as a tool for generating the slab decomposition described above.

    We mentioned the Euler characteristic for planar subdivisions.

    20 11.Nov

    Today, we first reviewed some basic facts about convex sets in arbitrary finite dimensions. We also introduced the ideas of an extreme point and a supporting hyperplane. (Here is a summary.)

    Then we discussed numerous algorithms for computing convex hull in two dimensions. Specifically:

    We developed an (inefficient) algorithm for computing the convex hull of a finite set of points in two-dimensional Euclidean space, based on the idea of finding extreme points. Given a finite set of points S, the algorithm first determines the extreme points of the convex hull of S via triangle tests, then creates S's convex hull by sorting these points about their centroid. The algorithm has time complexity O(n^4), with n the number of points in S. We then devised an efficient algorithm by essentially reversing the order of these steps. The algorithm sorts all the points of S about some interior point, then uses local convexity tests to retain only the extreme points. This efficient algorithm is called Graham's Scan. The algorithm has time complexity O(n log(n)), which is asymptotically optimal. (Here is a description of some of the steps in the algorithm for a particular example.)
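
    Here is a compact Python sketch of Graham's Scan (not the lecture's code); it sorts the remaining points by angle about the lowest point and uses cross-product turn tests to discard non-extreme points. The point set is hypothetical, and degenerate configurations are only partially handled.

        import math

        def graham_scan(points):
            # Convex hull of 2D points, returned in counterclockwise order.
            pts = sorted(set(points))
            if len(pts) <= 2:
                return pts
            anchor = min(pts, key=lambda p: (p[1], p[0]))      # lowest, then leftmost point
            def cross(o, a, b):                                # > 0 for a left (ccw) turn
                return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])
            rest = sorted((p for p in pts if p != anchor),
                          key=lambda p: (math.atan2(p[1]-anchor[1], p[0]-anchor[0]),
                                         (p[0]-anchor[0])**2 + (p[1]-anchor[1])**2))
            hull = [anchor]
            for p in rest:
                while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                    hull.pop()                                 # drop non-extreme points
                hull.append(p)
            return hull

        pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0), (2, 1)]
        print(graham_scan(pts))          # [(0, 0), (2, 0), (2, 2), (0, 2)]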

    Subsequently, we discussed Jarvis's March, which is an algorithm that constructs the convex hull of a finite set of 2D points by finding the boundary edges of the hull, much like wrapping a string around the set of points. Again, we first discussed a very inefficient process for finding such edges, then a more efficient version.

    We observed that computing the convex hull of finitely many 2D points is equivalent to sorting finitely many numbers, if one is permitted linear time work to convert data for one problem into data for the other problem. The optimal complexity of comparison-based sorting is O(n log(n)), so this equivalence tells us that computing the convex hull of finitely many 2D points has an optimal complexity of O(n log(n)), with n being the number of points.

    We briefly mentioned Mergehull, inspired by a similar algorithm for sorting numbers. The key step in Mergehull is a linear time algorithm for computing the convex hull of the union of two convex polygons. We also briefly mentioned Dynamic Hull, an algorithm that can efficiently create a convex hull from an existing convex polygon and a new point. The algorithm can perform such an update in O(log(n)) time. (Here is a description of some of the steps in that algorithm for a particular example.)

    21 13.Nov

    We began our discussion of Markov chains today.

    We defined closed and irreducible subsets of states. A closed subset is a subset of states such that no state outside the subset is reachable from any state inside the subset via the given Markov chain transitions. An irreducible subset is a closed set that contains no proper nonempty closed subsets. These definitions therefore allow us to define corresponding closed and irreducible subchains. In an irreducible subchain every state is reachable from every other state (not necessarily with probability 1 if the chain is infinite).

    We defined the semantics of the stochastic matrix P associated with a Markov chain. We observed that P always has eigenvalue 1. (In fact, the column vector consisting of 1s is a right eigenvector for eigenvalue 1.) For a finite chain, the multiplicity of eigenvalue 1 is equal to the number of recurrent classes in the Markov chain. (A recurrent class is an irreducible subchain in which every state is eventually reachable with probability 1 from every other state.)

    We classified states as periodic or aperiodic and as transient or persistent. For persistent states we further classified states into those with finite mean recurrence times and those with infinite mean recurrence times. (A persistent state with infinite recurrence time is called a null state. An aperiodic persistent state with finite mean recurrence time is called ergodic.)

    We stated that all states in an irreducible chain have the same type. We observed that in a finite chain not all states can be transient and that no persistent state can be a null state.

    A finite Markov chain that is irreducible and aperiodic has a stationary distribution, which we wrote as a row vector π. This vector is a left eigenvector of P, corresponding to eigenvalue 1: π = πP. The mean recurrence time μ_i of any state i in such a Markov chain is the inverse of its steady-state probability π_i. In other words, μ_i = 1/π_i.
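
    As a small numpy illustration (with hypothetical transition probabilities), one can recover the stationary distribution as a left eigenvector of P for eigenvalue 1 and then read off the mean recurrence times:

        import numpy as np

        # A small irreducible, aperiodic chain.  Row i holds the probabilities of
        # moving from state i to each state.
        P = np.array([[0.5, 0.5, 0.0],
                      [0.2, 0.5, 0.3],
                      [0.0, 0.4, 0.6]])

        # The stationary distribution is a left eigenvector for eigenvalue 1: pi = pi P.
        evals, evecs = np.linalg.eig(P.T)
        pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
        pi = pi / pi.sum()
        print(pi)                        # stationary probabilities
        print(pi @ P - pi)               # ~ 0, confirming pi = pi P
        print(1.0 / pi)                  # mean recurrence times mu_i = 1 / pi_i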

    For a finite but reducible Markov chain, there may be several dimensions of such left eigenvectors, one for each recurrent class. The left eigenvector associated with a recurrent class describes the stationary distribution for that recurrent class. The stochastic matrix P of such a Markov chain also has a right eigenvector (column vector) for each occurrence of the eigenvalue 1, i.e., for each recurrent class. The eigenvector may be so chosen that its ith component is the probability of eventually entering the recurrent class from state i.

    Finally, very quickly, we considered a finite chain with two periodic recurrent classes. We observed that the eigenvalues of the stochastic matrix contain roots of unity, measuring the periodicities of the recurrent classes.

    (Here is a summary of some key definitions and properties.)

    22 18.Nov

    We applied our discussion of Markov chains to some gambling problems.

    First we looked at the Gambler's Ruin Problem. We computed ruin probabilities and expected game durations. We considered the consequences of halving stakes and of playing against an infinitely rich adversary.

    We discussed card shuffling and how perfect riffle shuffles create periodic subchains, without actually randomizing the cards. We observed that the stochastic matrix for any shuffling algorithm is doubly stochastic, so long as the shuffling is based simply on physical rearrangements. Consequently, the uniform distribution is a stationary distribution. If riffle shuffles have stochastic errors in them, then shuffling will converge to the uniform distribution in roughly 8 shuffles, for a 52 card deck. On the other hand, 8 perfect riffle shuffles will simply reproduce the cards in their initial order.

    23 20.Nov

    Today we considered some techniques from Differential Geometry for analyzing smooth curves in 3D Euclidean space. We defined the Frenet Frame Field, [T,N,B], and derived the Frenet Formulas, focusing on unit-speed curves. The Frenet Formulas describe the derivatives of the vectors T, N, and B in terms of themselves, as one moves along the curve. The resulting matrix differential equation has an anti-symmetric structure, with two key parameters, the curvature and torsion of the curve. Curvature describes the extent to which a curve fails to be a straight line, while torsion describes the extent to which a curve with curvature is nonplanar. Two unit-speed curves with nonvanishing curvatures are congruent (meaning one can be superimposed on the other using a 3D rotation and translation, possibly with a reflection) precisely when they have matching curvatures and torsions (possibly with a sign switch), as functions of arclength. We used a helix as an example to illustrate the ideas.
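
    As a numerical illustration (not from lecture), here is a small numpy sketch computing the curvature and torsion of a helix with hypothetical radius and pitch; since the standard parameterization is not unit-speed, the sketch uses the general formulas κ = |α'×α''| / |α'|^3 and τ = (α'×α'')·α''' / |α'×α''|^2, which agree with the unit-speed definitions after reparameterization by arclength.

        import numpy as np

        a, b, t = 2.0, 1.0, 0.7          # helix (a cos t, a sin t, b t), evaluated at parameter t

        # Derivatives of the helix with respect to t (not arclength).
        d1 = np.array([-a * np.sin(t),  a * np.cos(t), b])
        d2 = np.array([-a * np.cos(t), -a * np.sin(t), 0.0])
        d3 = np.array([ a * np.sin(t), -a * np.cos(t), 0.0])

        # General (non-unit-speed) formulas for curvature and torsion.
        cr = np.cross(d1, d2)
        kappa = np.linalg.norm(cr) / np.linalg.norm(d1)**3
        tau = np.dot(cr, d3) / np.linalg.norm(cr)**2
        print(kappa, a / (a**2 + b**2))  # both 0.4: the helix has constant curvature
        print(tau,   b / (a**2 + b**2))  # both 0.2: and constant torsion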

    24 25.Nov

    We started the lecture by defining Covariant Derivatives in Euclidean Space. We defined Lie Brackets. We showed how these demonstrate the ability of a car to parallel park.

    Subsequently, we considered a variation of covariant derivatives for embedded manifolds, in particular for surfaces embedded in 3D Euclidean space. We defined the shape operator for a surface, and observed that, at each point of a surface, the shape operator is a linear operator on the tangent space at that point. We stated that the operator is symmetric. Consequently, one can represent the shape operator by a 2x2 symmetric matrix (at each point of the surface). The eigenvalues of this matrix are called the principal curvatures of the surface (at the given point) and the corresponding eigenvectors are the (orthogonal) principal directions of curvature.

    Very quickly near the end of lecture, we defined the Gaussian curvature as the determinant of the shape operator, then stated the Gauss-Bonnet theorem regarding the relationship between the total Gaussian Curvature of a surface and that surface's Euler characteristic. We derived the area of a sphere as a simple consequence of this theorem. Some of these ideas have been used in the past in Robotics for vision research.

    25 2.Dec

    We started our discussion of Algebraic Topology today, focusing on homology as a geometric/algebraic method for detecting holes of various dimensions in geometric objects. We computed the homology of the sphere, the torus, and the Klein bottle. We introduced homology groups and we discussed their connection to the Euler Characteristic.

    Here are some of the slides from today's lecture.

    26 4.Dec

    Today we considered some topics in Combinatorial Topology. We defined abstract simplicial complex, homotopy equivalence, deformation retraction, and elementary collapse.

    We considered the simplicial complex associated with a strongly connected directed graph. We gave examples of nondeterministic graphs and defined strategy complexes. We stated a theorem that a nondeterministic graph is fully controllable if and only if its strategy complex is homotopy equivalent to a sphere of dimension two less than the number of states in the graph.

    Here are some of the slides from today's lecture.




