Here's my list, more or less in bibtex format. (Note: my copy of Keane94 doesn't show what journal it was taken from.) -- Scott -------------------------------------------------------------------------- @article{Daniel76, author="J. W. Daniel", title="Splines and Efficiency in Dynamic Programming", journal="Journal of Mathematical Analysis and Applications", volume=54, pages="403--407", year=1976, note="Suppose you want to represent your (continuous-state space) value function as a linear combination of fixed functions. Splines can give you the same approximation accuracy as (state-space-wide) polynomials for the same storage requirements. However, since splines can be represented as sums of B-splines which only have non-zero values over small regions of the state space, it's more computationally efficient to use B-splines. Roughly speaking, B-splines are better by a factor of M^d, where d is the dimension of the state space and M is the order of the polynomial / grid size used for the splines."} @article{Johnson93, author="S. A. Johnson and J. R. Stedinger and C. A. Shoemaker and Y. Li and Jose Alberto Tejada-Guibert", title="Numerical Solution of Continuous-State Dynamic Programs Using Linear and Spline Interpolation", journal="Operations Research", volume=41, number=3, pages="484--500", year=1993, note="This paper argues for the use of cubic piecewise polynomials for interpolation rather than ``tensor product linear interpolants,'' better known to us as ``multilinear interpolation.'' The two main reasons why cubic polynomials are supposedly better: (1) better approximation accuracy lets you get away with fewer grid squares, and (2) because the resulting functions have second-degree continuity, you can use efficient Newton-type or quasi-Newton algorithms to solve the Bellman equation. Splines can be represented in ``pp-form,'' in which you keep one cubic polynomial per rectangular state-space volume, or in B-form, which is a sum of products of one-D B-splines. While B-splines are easier to calculate the coefficients for, they're not quite as efficient when you use them for interpolation, which is a more frequent operation. The paper also mentions cubic Hermite polynomials, which, like pp-form splines, are defined only within the confines of a single box. Differences: when using Hermite polynomials, you have to know the gradients at all the corners as well as the values; also, you don't get second-degree continuity. (pp-form splines are required to have matching first and second derivatives at the box boundaries.) Main disadvantage is supposedly that having to calculate the gradients ``requires a more elaborate computer code.'' Universal downside with these cubics: they all take O(4^d) time for each interpolation. This is a factor of O(2^d) worse than multilinear interpolation. ``Quasi-Newton'': no, the paper doesn't explain what this is. According to NRIC, however, it's similar to conjugate gradient methods in some respects. Potential snag: this paper isn't terribly explicit about when this method will and will not work. The fact that it mentions a ``concavity checker'' to ``identify situations likely to pose a problem'' seems to suggest that this method doesn't necessarily work well on arbitrary domains. Experimental results with a 4-reservoir water management problem suggest that using splines with quasi-Newton twiddling can result in speedup factors of about 250 over using multilinear interpolation (with a polytope routine, I believe).
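Quick aside, since ``multilinear interpolation'' keeps coming up: in d dimensions it's just a weighted average of the 2^d corners of the grid cell containing the query point, which is where the O(2^d)-per-lookup figure comes from. A minimal sketch (mine, not the paper's; assumes a regular grid with unit spacing and in-bounds queries):

    import itertools
    import numpy as np

    def multilinear_interp(values, x):
        # values: d-dimensional array of grid values; x: query point in grid coordinates
        lo = np.floor(x).astype(int)            # lower corner of the enclosing cell
        frac = x - lo                           # fractional position within the cell
        result = 0.0
        for corner in itertools.product((0, 1), repeat=len(x)):   # all 2^d corners
            corner = np.array(corner)
            weight = np.prod(np.where(corner == 1, frac, 1.0 - frac))
            result += weight * values[tuple(lo + corner)]
        return result

The cubic schemes discussed in the paper touch 4 grid points per dimension instead of 2, hence the O(4^d).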
Most of this speedup is supposedly from the ``increased accuracy'' (and therefore having to deal with fewer grid points for the same accuracy), but a lot of it is supposedly from using the quasi-Newton method as well. This speedup was for a nice, smooth domain, however. With a slightly more complicated domain in which a quadratic penalty function is added, the speedup goes down to a factor of 40, and the accuracy of all the methods gets cut in half."} @article{Schweitzer85, author="P. J. Schweitzer and A. Seidmann", title="Generalized Polynomial Approximations in Markovian Decision Processes", journal="Journal of Mathematical Analysis and Applications", volume=110, pages="568--582", year=1985, note="Shows how to use linear programming, policy iteration, and gradient descent on the squared residual Bellman error to solve DP problems when using a linear combination of M fixed functions to represent the value function. Discretizes continuous states; discusses infinite-horizon discounted and undiscounted cases, using average gain for the undiscounted case (with ``unichain'' assumption). Does not discuss relative merits of the algorithms. Is rather hand-wavy, and assumes a fair amount of background knowledge."} @article{Keane94, author="M. P. Keane and K. I. Wolpin", title="The Solution and Estimation of Discrete Choice Dynamic Programming Models By Simulation and Interpolation: Monte Carlo Evidence", journal="???", volume="???", pages="648--672", year=1994, note="Suppose you have a finite-horizon, finite-state problem in which once you enter a state, you observe some previously unknown random variables e1...eK that affect the immediate rewards you get for taking each of actions 1...K in the current state. In order to solve for the optimal sequence of actions in this sort of scenario, you wind up having to calculate K-dimensional integrals over (maximum of possible rewards) * the marginal probability of (e1...eK). Computationally expensive! This paper suggests (1) just averaging over a randomly sampled set of (e1...eK)s to get approximations for some of the states, and then (2) using regression to relate even simpler-to-calculate quantities like Q-values to these approximations, so that approximations for other states can be calculated by just interpolating. The particular functions chosen for this regression are seriously ad-hoc. Test domain: people choose between working, going to school, or lounging around when faced with rewards that don't become known for sure until right before the relevant choices."} @article{Gal87, author="S. Gal", title="The Parameter Iteration Method in Dynamic Programming", journal="Management Science", volume=35, number=6, pages="675--684", year=1987, note="For finite-horizon, optionally discounted problems with continuous state spaces. General idea: suppose you have a parametric approximator H(x, A_t) to use for your value function, where x is the state variable and A_t is the set of parameters at time t. We figure out our time-dependent policies as follows: Suppose we start off with a ``reasonable policy.'' We simulate J trajectories using the current policy P. Now, letting t go backwards from T (the final time) to 1, find a set of parameters A_t such that the error between H(x_t, A_t) and max(cost + discount * expected fut. rew.) is small over the points that were sampled during the J trajectories. Use the time-dependent value function parameters A_t to determine a new time-dependent policy. Repeat. No, the paper doesn't give any general algorithms for fitting A_t.
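If it helps, here's roughly what one iteration looks like in pseudocode (my reconstruction, not Gal's; H, fit, simulate, reward, expected_next, and actions are stand-ins for whatever the domain and approximator actually provide):

    def parameter_iteration_sweep(policy, H, fit, simulate, reward,
                                  expected_next, actions, T, J, gamma=1.0):
        trajs = [simulate(policy, T) for _ in range(J)]    # J rollouts under the current policy
        A = {T: fit([(traj[T], 0.0) for traj in trajs])}   # terminal values (zero here)
        for t in range(T - 1, 0, -1):                      # backward from T-1 to 1
            targets = []
            for traj in trajs:
                x = traj[t]
                best = max(reward(x, u, t) + gamma *
                           expected_next(x, u, t, lambda y: H(y, A[t + 1]))
                           for u in actions(x, t))         # one-step lookahead against H(., A_{t+1})
                targets.append((x, best))
            A[t] = fit(targets)                            # fit H(., A_t) to the sampled points
        return A   # A defines the new time-dependent policy; re-simulate and repeat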
Note that this is not really ``policy iteration'' -- here, our intermediate policies are just being used to tell us where we want to put our data points. (If there weren't any function approximation involved, and we had a manageably small number of discrete states, there wouldn't be any ``iterations''; we'd just solve for the values backwards from t=T and be done with it, and we wouldn't need any ``reasonable policy'' to start with.) Test domain: parts replacement. Assume we have a bunch of items, each of which can be in K states, and which, when left to their own devices, will change states probabilistically. We want to decide which items to ``replace'', i.e., put back in state 1, given that there are rewards/costs associated with having an item in state k, replacing an item, etc. The paper mumbles a bit about the special case in which the value function is roughly linear in the number of items in each state."} @article{Luus93, author="R. Luus", title="Application of Dynamic Programming to Differential-Algebraic Process Systems", journal="Computers & Chemical Engineering", volume=17, number=4, year=1993, pages="373--377", note="Test domain: a fed-batch reactor with 4 continuous state variables and one continuous control (the feed rate). Paper apparently uses some sort of policy iteration on discretized versions of the problem (for a few different discretization resolutions). (The algorithm actually used is, alas, described elsewhere, but it doesn't seem like it's anything special.) Constraint on one of the state variables is finessed via adding a ``penalty factor'' to the final reward that penalizes violated constraints."} @article{Lamond95, author="B. F. Lamond and S. L. Monroe and M. J. Sobel", title="A Reservoir Hydroelectric System: Exactly and Approximately Optimal Policies", journal="European Journal of Operational Research", volume=81, year=1995, pages="535--542", note="A single-reservoir water management problem. Modelled as discrete-time, discrete-state, discrete-action, finite-horizon. Makes a lot of simplifying assumptions -- e.g., assumes that the hydroelectric generators' efficiency is independent of the amount of water in the reservoir; completely running out of water doesn't incur any massive penalties; etc. With these drastic simplifications, it is shown that the optimal policy can be determined and expressed in terms of two or three easily calculated parameters. The paper then discusses finding a two-piece piecewise-linear approximation of the value function for the infinite-horizon case by using linear programming. It's not clear to me that any of these hacks are applicable to any domains we're going to care about."} @article{Semmler95, author="W. Semmler", title="Solving Nonlinear Dynamic Models by Iterative Dynamic Programming", journal="Computational Economics", volume=8, pages="127--154", year=1995, note="Algorithmically, this paper has little new to offer -- it basically just uses value iteration on a grid. There are three different domains which might be interesting to use: Optimal extraction of interacting resources: two continuous state variables and one continuous control variable. (There's something stuck in one of the equations that looks like a second control variable, but supposedly isn't; the paper's exposition and notation are quite abominable in places.) Basically, you're trying to maximize profit by harvesting predators in a system that evolves according to classic predator / prey dynamics.
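(For flavor, the textbook harvested predator / prey setup looks something like dx/dt = x(a - b y) - q_1 u x, dy/dt = y(c x - e) - q_2 u y, with x the prey population, y the predator population, and u the harvesting effort; that's my guess at the general shape, not necessarily the paper's exact equations.)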
The state variables are the populations of the predator and prey, and the control variable changes the rate at which the predator is harvested (and at which the prey is also harvested; perhaps they're using nuclear bombs for their harvesting). Industry regulation by an optimal tax rate: two or three continuous state variables -- one for the amount left of some resource; one for the effort spent by the extractive industry; and, optionally, one for the debt of firms in the industry. One continuous control variable: the tax rate on the goods produced. The industry effort spent is NOT a control variable here; we assume that, left to its own devices, the industry will blindly gorge itself and get bigger and bigger until there are no resources left, and then it will take a while for the industry to reduce its extraction expenditures. To be maximized: total income of industry (before taxes). Optimal Growth: the stochastic case is treated as a discrete-time problem with three state variables: capital, a stock of designs/knowledge, and another variable that does a random walk and affects the rate at which the other variables change. Two control variables: the flow of consumption goods, and the fraction of human resources to allocate to R&D instead of immediate production. Beware: the description of this domain is written rather poorly. The paper claims that the optimal behavior for discount factors greater than 0 tends to exhibit cycles similar to those produced when the discount factor is 0 (i.e., when the system behaves greedily). As far as I've seen, it does not bother to show what this really buys you or what it means quantitatively. (The latter is presumably explained in one of the papers referenced.)"} @article{Storen95, author="Sverre Storen and Terje Hertzberg", title="The Sequential Linear Quadratic Programming Algorithm for Solving Dynamic Optimization Problems -- a Review", journal="Computers & Chemical Engineering", volume="19, Suppl.", pages="S495--S500", year=1995, note="This paper is clearly a ``review'' rather than an introduction; if you're not already familiar with optimal control theory and other nonlinear optimization techniques, this paper isn't likely to help. The paper presents a method in which the problem is iteratively approximated as a linear-quadratic model to which optimality conditions are applied; a new solution is found based on these conditions, and then a new approximate model is fitted around the new solution. Unfortunately, it sheds rather little light on how the uninitiated might actually go about applying it."} @article{Pantoja91, author="J. F. A. De O. Pantoja and D. Q. Mayne", title="Sequential Quadratic Programming Algorithm for Discrete Optimal Control Problems with Control Inequality Constraints", journal="International Journal of Control", volume=53, number=4, pages="823--836", year=1991, note="Conventional sequential quadratic programming algorithms require a solution of quadratic programs of dimension N*m during every iteration, where N is the number of time steps and m is the dimension of the control vector. This paper presents an algorithm to solve the same problem but by solving N quadratic sub-problems of size m instead. Not for the faint of heart."} @article{Cuthrell87, author="J. E. Cuthrell and L. T. Biegler", title="Simultaneous Optimization and Solution Methods for Batch Reactor Control Profiles", journal="Tech Report: EDRC-6-40-87", note="Uses successive quadratic programming and orthogonal collocation on finite elements to solve fed-batch problems.
Presumes familiarity with SQP, orthogonal collocation, Galerkin's method, and a zoo of other techniques and concepts for nonlinear optimization."} Date: Thu, 11 Jul 96 14:57:29 EDT From: Jeff.Schneider@J.GP.CS.CMU.EDU To: awm@CS.CMU.EDU, cga@CC.GATECH.EDU, dilello@CS.CMU.EDU, gboone@CC.GATECH.EDU, gboone@CS.CMU.EDU, schneide@CS.CMU.EDU, scottd@CS.CMU.EDU Subject: Re: Paper reading Here's my list in bibtex format. It exists in ~schneide/l/doc/dp.bib. Jeff. ------------------------------------------------------------------------ @article{BenAri86, author="Y. Ben-Ari and S. Gal", title="Optimal Replacement Policy for Multicomponent Systems: An Application to a Dairy Herd", journal="European Journal of Operational Research", volume=23, pages="213--221", year=1986, note="Stochastic discrete state for a fixed number of time steps. Basic technique is to use a parametric function to represent the value function and fit its parameters for each time step recursively, working backward. In their case, the parametric function is linear. An important issue is what states should be used to do the fits. They build a model of the system that they can simulate. This allows them to iterate back and forth between simulating the system under the current policy to generate data and using the data to update the policy by refitting the value function. The application is somewhat interesting. Assume you have a herd and you are limited by the total number of animals you can keep. This includes all the cows and calves. At each time step, each animal falls into a certain age/production/weight class and moves stochastically (except for age presumably) between them. Death is also a state that is reached with some probability, as is birth of a new member. At each time step, you must decide which, if any, animals should be sold."} @article{Bean87, author="J. Bean and J. Birge and R. Smith", title="Aggregation in Dynamic Programming", journal="Operations Research", volume=35, number=2, year=1987, note="State aggregation in deterministic discrete problems on DAGs. Their algorithm combines the ``solve the aggregated problem'' and the ``disaggregate the solution'' phases. They show error bounds from the aggregation. No insight into what states to aggregate though."} @article{Aldhah91, author="R. Aldhaheri and H. Khalil", title="Aggregation of the Policy Iteration Method for Nearly Completely Decomposable Markov Chains", journal="IEEE Transactions on Automatic Control", volume=36, number=2, year=1991, note="Stochastic discrete state, infinite horizon. Assume the transition matrix is almost block diagonal (all the elements outside the blocks are near zero). Since the problem is infinite time horizon, the paper concentrates on determining the final steady-state probability distribution. Most of the paper focuses on methods of solving for that without any control. At the end, it discusses policy iteration and how solution of the no-control problem helps it. Again, the paper disclaims early on any guidance about how to choose the aggregation."} @article{Dadebo95, author="S. Dadebo and K. McAuley", title="Dynamic Optimization of Constrained Chemical Engineering Problems using Dynamic Programming", journal="Computers & Chemical Engineering", volume=19, number=5, pages="513--525", year=1995, note="Deterministic continuous state, fixed horizon, with terminal state constraint. This paper does the usual discretize-the-state-space method with an interesting exception in the way the state space is discretized.
Start with a nominal trajectory (and in all future iterations use the optimal trajectory from the last iteration). Generate a grid over the state space moving forward through time by starting at a state along the given nominal trajectory. Then integrate forward in time using the nominal control and some number of controls within a region of the nominal control. Record the states that result as the grid points at the next time step. After a full forward pass along the nominal trajectory to generate a grid, do a backward DP pass to generate a new nominal/optimal trajectory and repeat. At each iteration of the algorithm, reduce the region about the nominal control used to generate the state space gridding. This automatically starts with a coarse grid covering the entire state space and changes to a fine grid about the optimal trajectory. Secondly, the paper gives four example problems specified as differential equations. The last has six state variables and two control variables. Some of the problems are annotated with a publication history of the best solutions to the problems so far and the amount of computation time required to obtain them."} @article{Rosen89, author="O. Rosen and R. Luus", title="Sensitivity of Optimal Control to Final State Specification by a Combined Continuation and Nonlinear Programming Approach", journal="Chemical Engineering Science", volume=44, number=11, pages="2527--2534", year=1989, note="Deterministic continuous state with terminal state constraint. Continuation is the idea that you have some parameter to your system that you can slowly vary to change the system from an easy-to-solve approximation to the real problem. Then you solve the approximate/easy system first and use the trajectory from that to initialize the solution of successively harder problems until you've solved the one you're originally interested in. It's not clear how this particular thing plays into the rest of the paper. The DP problem is solved through Non-linear Programming. At that point, the observation is made that the NLP solution also gives you the values of the Lagrange multipliers on each constraint. The magnitude of the multiplier is related to the derivative of the cost function at that constraint. So you can look at the results and easily determine whether relaxation of any of the constraints would significantly reduce the total cost in the solution. For some of their nonlinear examples, the answer is yes."} @article{Yakowi82, author="S. Yakowitz", title="Dynamic Programming Applications in Water Resources", journal="Water Resources Research", volume=18, number=4, pages="673--696", year=1982, note="The title says it all. It's a big review of various water resource management problems, and of DP solutions."} @article{Li90, author="C. Li and R. Yan and J. Zhou", title="Stochastic Optimization of Interconnected Multireservoir Power Systems", journal="IEEE Transactions on Power Systems", volume=5, number=4, year=1990, note="Stochastic continuous state with constraints. The problem is to schedule the power production from a set of interconnected hydroelectric and thermal generation facilities. The interconnection is only in the fact that the sum of power generated must equal demand. Their solution is to make use of this and solve the smaller problems of how to run each generation facility independently. Then the solutions are combined and checked for global consistency (i.e., was the right amount of power generated?).
If not, the global aspects of the solution (amount of power generated) are used to modify the coefficients of the smaller problems, which are then solved again, and this process iterates. The paper is interesting in that there are very few examples of stochastic continuous state problems with any significant number of variables being solved, but the solution is specific to these types of {\em barely coupled} problems."} @article{Gal79, author="S. Gal", title="Optimal Management of a Multireservoir Water Supply System", journal="Water Resources Research", volume=15, number=4, year=1979, note="Stochastic continuous state. Their example has 3 state variables and one control. They approximate the value function at each time step with a quadratic. Iteratively, they compute the value function, compute a policy, generate M Monte Carlo trajectories using the policy (where M is the number of coefficients in their quadratic model), then use the M trajectories to refit the value function approximations."} Subject: Re: Paper reading Mine is below. I put it into a bibtex format to facilitate citation. It's easy to do in emacs: just type M-x bibtex-mode and look at the 'describe mode' command in the help menu. Otherwise, just cut and paste. This file is also in /afs/cs.cmu.edu/user/gboone/notes/papers.txt :) Gary ----------------------- Gary N. Boone gboone@cc.gatech.edu Georgia Tech http://www.cc.gatech.edu/grads/b/Gary.N.Boone/Gary.N.Boone.html @InProceedings{Hanaoka, author = "T. Hanaoka", title = "A Unifying Approach by the Combined Algorithm for Optimal Control Problems of Flight Systems", editor = "B. Liu and J. M. Blosseville", pages = "675--680", booktitle = "Transportation Systems: Theory and Application of Advanced Technology", year = 1994, organization = "International Federation of Automatic Control", publisher = "Pergamon", month = "August", note = "Discretizes a continuous problem so that a shortest-path graph algorithm can provide a minimum-time trajectory. The state space is divided into 'blocks,' each of which has a representative point that defines the minimum cost to arrive at the block. Trajectories which enter a block with larger costs are rejected. How the block extents are determined is not specified. A branch and bound algorithm is used to search the graph. Total energy is used as the lower bound in the branch and bound algorithm." } @Article{Alekseev, author = "O. G. Alekseev and I. F. Volodos", title = "Combined Use of Dynamic Programming and Branch-and-Bound Methods in Discrete-Programming Problems", journal = "Automation and Remote Control", year = 1976, volume = 37, number = 1, pages = "557--565", note = "[Translated from Avtomatika i Telemekhanika, No. 4, pp. 92--100, April 1976.] Applies branch and bound to DP by transforming the problem and bounding the costs. The set of admissible values can be represented by a nonincreasing sequence from which 'dominated' terms are removed. Then steepest descent is used to determine a bound by extending the slope calculated between adjacent values. The approximation given by removing 'dominated' terms gives the convex hull of the original set. Therefore, the slopes are nonincreasing, and the subsequent value must lie above the value given by extending the slope, providing a bound. The paper shows how the processes of approximating and determining bounds can be combined and extended to multidimensional problems." } @Article{D'Eleutrio, author = "G. M. T. D'Eleutrio and C.
J, Damaren", title = "The Relationship Between Recursive Multibody Dynamics and Discrete-Time Optimal Control", journal = "IEEE Transactions on Robotics and Control", year = 1991, volume = 7, number = 6, pages = "743--749", month = "dec", note = "Develops a Newton-Euler-based recursive algorithm for forward dynamics of n-link rigid body chains. An analogy is made between this formulation and a discrete-time optimal control problem of the form Sum (1/2 x'Mx + x'h - u't) subject to the linear state equation x_{k+1} = Ax + Bu. Note that this is not exactly a linear quadratic problem because the cost is linear in u. By introducing the Lagrange multiplier, the matrix Riccati equation is derived. Now note the analogy to recursive dynamics: The Riccati equation is solved backwards in time, then the optimal control is calculated forward in time using the state equation. The recursive dynamics equations solve the velocities and accelerations forward from the base, then the joint forces and torques backward from the end effector. A set of substitutions make the analogy explicit. Thus, rigid body dynamics can be viewed as the minimization of Sum (1/2 a'Ma + f'a + f_c'Jv_dot) subject to the kinematic motion constraints" } @Article{Luus, author = "R. Luus and S. Smith", title = "Application of Dynamic Programming to High-Dimensional Systems Described by Difference Equations", OPTcrossref = "", OPTkey = "", OPTjournal = "Chemical Engineering Technology", OPTyear = "1991", OPTvolume = "14", OPTnumber = "", OPTpages = "122--126", OPTmonth = "", OPTnote = " In this somewhat confused paper, Luus describes a technique which aids the application of DP in finding optimal policies for continuous, discrete-time systems. He gradually reduces the grid resolution while pruning the search to only accessible grid points. However, the algorithm described in the paper does not change the state grid, which is a fixed parameter set in step one. He observes that more ``stages,'' ie longer trajectories, result in greater computation. This is obvious; we know that the number of trajectories for P step trajectory with n possible actions at each step is n^P. He suggests that larger action descretizations (fewer actions) lead to longer trajectories. This is true if the goal is easy to find. To see this, consider navigating to the bottom of a large basin. If I cannot turn to directly orient toward the minima, I will require more steps to reach it. However, it is not generally true because many problems may not be solvable with fewer action descretizations. The algorithm reduces the action descretization after each iteration, which he calls 'domain contraction.' The algorithm carries out a P-step DP optimization, then reduces the number of allowed actions by a factor epsilon, then repeats. However, the algorithm centers the new descretization at the best previous value. Thus, the new searches are unlikely to improve, although they should converge faster as epsilon is increased. (The algorithm seeks a trajectory and not a policy.) These observations are confirmed by the comments at the top of page 125. OPTannote = "" } @InCollection{Pierson, author = "B. Pierson", title = "Sequential Quadratic Programming and Its Use in Optimal Control Model Comparisons", booktitle = "Optimal Control Theory and Economic Analysis 3", publisher = "Elsevier", year = 1988, editor = "G. 
Feichtinger", pages = "175--193", address = "North-Holland", note = "Sequential Quadratic Programming obtains an approximate solution to discrete optimal control problems by solving sequence of related quadratic programming problems. A quadratic programming problem has a quadratic objective function and linear constraints. The paper discusses the relationship of SQP with the standard Newton method and provides several insufficiently detailed examples." } @Article{Rockafellar, author = "R. T. Rockafellar", title = "Linear-Quadratic Programming and Optimal Control", journal = "SIAM Journal on Control and Optimisation", year = 1987, volume = 25, number = 3, pages = "781--814", month = "may", note = "This paper generalizes linear and quadratic programming problems to include their dual problems, related problems whose maximization provides the solution to the original minimization. Explicit representation of the dual allows additional constraints to be utilized. Each programming iteration then becomes the solution of a saddle point, the intersection of the dual problems. This paper is mathematically rigorous, impractical, and not insightful." } @Article{Judd, author = "K. Judd", title = "Projection Methods for Solving Aggregate Growth Models", journal = "Journal of Economic Theory", year = 1992, volume = 58, pages = "410--452", note = "Projection methods define a function whose zero is the solution of the optimization. Specifically, they choose a set of bases functions and a distance metric: the coefficients are a point which the algorithm chooses so that the distance of the weighted bases to the true function is zero. It is called projection because the bases are projected onto the zero of the function. There are several issues to be considered in implementing this approach, including choice of bases, evaluation conditions, and solution finding method. Galerkin's method is presented and demonstrated on several example problems from economics." }