CS294-3: Algorithms in the Real World (Guy Blelloch, Fall 97)


A.D. Wyner and J. Ziv. The sliding-window Lempel-Ziv algorithm is asymptotically optimal. Proceedings of the IEEE, 82(6) June 1994, pp. 872-7.

The sliding-window version of the Lempel-Ziv data-compression algorithm (sometimes called LZ '77) has been thrust into prominence recently. A version of this algorithm is used in the highly successful "Stacker" program for personal computers. If is also incorporated into Microsoft's new MS-DOS-6. Although other versions of the Lempel-Ziv algorithm are known to he optimal in the sense that they compress a data source to its entropy, optimality in this sense has never been demonstrated for this version. In this self-contained paper, we will describe the algorithm, and show that as the "window size," a quantity which is related to the memory and complexity of the procedure, goes to infinity, the compression rate approaches the source entropy. The proof is surprisingly general, applying to all finite-alphabet stationary ergodic sources.

E. Fiala and D. Greene. Data compression with finite windows. Communications of the ACM, 32(4), April 1989, pp. 490-505

Several methods are presented for adaptive, invertible data compression in the style of Lempel's and Ziv's (1977) first textual substitution proposal. For the first two methods, the article describes modifications of McCreight's (1976) suffix tree data structure that support cyclic maintenance of a window on the most recent source characters. A percolating update is used to keep node positions within the window, and the updating process is shown to have constant amortized cost. Other methods explore the tradeoffs between compression time, expansion time, data structure size, and amount of compression achieved. The article includes a graph-theoretic analysis of the compression penalty incurred by the codeword selection policy in comparison with an optimal policy, and it includes empirical studies of the performance of various adaptive compressors from the literature.

D. J. Zawack and G. L. Thompson. A dynamic space-time network flow model for city traffic congestion. Transportation Science, 21(3), Aug, 1987, pp. 153-62

A space-time network is developed that represents traffic flows over time for a capacitated road transportation system having one-way and two-way streets. Traffic signal lights are explicitly incorporated into the network structure so that total travel time is a piecewise linear convex function of the number of units traveling on the streets. Hence congestion effects are explicitly considered while maintaining the linear nature of the model. The first example presented has one source and one sink. There is a unimodal buildup of traffic at the source (say a factory) which enters the street network as quickly as its capacity permits and proceeds through the network, stopping at red lights when necessary, toward the sink (a residential area). Two efficient solution methods are used: a network flow solution suitable for a multiple-source single- destination network, and a shortest path solution suitable only for a single-source single-destination network. Computations show that the arrival rate has multiple peaks which are induced by the stop lights. The second example has multiple sources and one sink and gives similar results, except that the arrival rate has a single board peak which is due to the extreme symmetry of the constraints of the problem.

J. Abara. Applying integer linear programming to the fleet assignment problem. Interfaces, 19(4), July-Aug. 1989, pp. 20-8.

The fleet assignment problem is formulated and solved as an integer linear programming model, permitting assignment of aircraft from one, two or more fleets to a flight schedule simultaneously. The objective function can take a variety of forms including profit maximization, cost minimization, and the optimal utilization of a particular fleet type. Several departments at American Airlines use the model to assist in fleet planning and schedule development. It will become one of 10 key decision modules for the next generation scheduling system currently being developed by American Airlines Decision Technologies.

R. Subramanian, R. P. Scheff, Jr., J. D. Quillinan, D. S. Wiper, and R. E. Marsten. Coldstart: Fleet assignment at Delta Air Lines. Interfaces, 24(1), Jan.-Feb. 1994, pp. 104-20.

Delta Air Lines flies over 2,500 domestic flight legs every day, using about 450 aircraft from 10 different fleets. The fleet assignment problem is to match aircraft to flight legs so that seats are filled with paying passengers. Recent advances in mathematical programming algorithms and computer hardware make it possible to solve optimization problems of this scope for the first time. Delta is the first airline to solve to completion one of the largest and most difficult problems in this industry. Use of the Coldstart model is expected to save Delta Air Lines $300 million over the next three years.

G. L. Nemhauser. The age of optimization: solving large-scale real-world problems. Operations Research, 42(1), Jan.-Feb. 1994, pp. 5-13.

In the last decade, new advances in algorithms have been as important as the impressive advances in computer technology. Using the new interior-point algorithms and advanced implementations of simplex methods, we can now solve linear programs with more than one million variables and thousands of constraints. Preprocessing and polyhedral theory have yielded at least an order of magnitude improvement in branch-and-bound algorithms for solving mixed integer programs. Moreover, these algorithmic advances have been incorporated in commercially inexpensive software that is readily available, easily portable, and supported by a variety of systems that make it possible for unsophisticated users to input and check their models and obtain understandable outputs. This paper begins with some of the modern history of optimization, then surveys some recent developments (illustrating them with an application in the airline industry), and closes with some remarks about the future.

J. T. Kohl. The evolution of the KERBEROS authentication service. Proceedings of the Spring 1991 EurOpen Conference, May 1991, pp. 295-313.

The KERBEROS Authentication Service, developed at MIT, has been widely adopted by other organizations to eliminate the trusted- host problem in open networks. While a step up from traditional security in networked systems, KERBEROS version 4 is not sufficiently flexible for some environments. These inflexibilities and the remedies introduced with the KERBEROS version 5 are described.

A. Moffat. Implementing the PPM data compression scheme. IEEE Transactions on Communications, 38(11), Nov 1990, pp. 1917-21.

The prediction by partial matching (PPM) data compression algorithm developed by J. Cleary and I. Witten (1984) is capable of very high compression rates, encoding English text in as little as 2.2 b/character. It is shown that the estimates made by Cleary and Witten of the resources required to implement the scheme can be revised to allow for a tractable and useful implementation. In particular, a variant is described that encodes and decodes at over 4 kB/s on a small workstation and operates within a few hundred kilobytes of data space, but still obtains compression of about 2.4 b/character for English text.

E. L. Johnson and G. L. Nemhauser. Recent developments and future directions in mathematical programming. IBM Systems Journal, vol.31, no.1; 1992; pp. 79-93.

Advances in mathematical programming methodology have included: development of interior methods competing with the simplex method, improved simplex codes, vastly improved performance for mixed-integer programming using strong linear programming formulations, and a renewed interest in decomposition. In addition, use of vector and parallel processing has improved performance and influenced algorithmic developments. One sees the acceleration of better methods and improved codes moving together with faster, lower-cost, and more interesting hardware into a variety of application areas, thereby opening up new demands for greater function of optimization codes. These new functions might include, for example, more powerful nonlinear codes, decomposition techniques taking advantage of network and other problem-dependent structures, and mixed-integer capability in quadratic and general nonlinear problems. Stochastic scenario programming and multitime-period problems are becoming solvable and open up applications and algorithmic challenges. The IBM Optimization Subroutine Library has helped to accelerate these changes but will have to continue to change and expand in ways that are touched upon in this paper.

I. J. Lustig, R. E. Marsten and D. F. Shanno. Interior point methods for linear programming: computational state of the art. ORSA Journal on Computing, vol.6, no.1; Winter 1994; pp. 1-14.

A survey of the significant developments in the field of interior point methods for linear programming is presented, beginning with Karmarkar's projective algorithm and concentrating on the many variants that can be derived from logarithmic barrier methods. Full implementation details of the primal-dual predictor-corrector code OB1 are given, including preprocessing, matrix orderings, and matrix factorization techniques. A computational comparison of 0B1 with a state-of- the-art simplex code using eight large models is given. In addition, computational results are presented where OB1 is used to solve two very large models that have never been solved by any simplex code.

R. E. Bixby. Progress in linear programming. ORSA Journal on Computing, vol.6, no.1; Winter 1994; pp. 1-14.

There is little doubt that barrier methods are now indispensable tools in the solution of large-scale linear programming problems. However, it is our opinion that the results of Lustig, Marsten, and Shanno (hereafter LMS) somewhat overstate the performance of these methods relative to the simplex method. We present a slightly different view of progress in linear programming one in which barrier methods do not dominate in the solution of large-scale problems.

J. J. H. Forrest and J. A. Tomlin. Implementing the simplex method for the Optimization Subroutine Library. IBM Systems Journal, vol.31, no.1, 1992, pp. 11-25.

Describes the simplex algorithm and briefly discusses the interaction of the detailed implementation of the algorithm with changes in computer hardware. The authors give one example of the design changes needed to implement the method efficiently for the IBM 3090 vector architecture. For the later RISC System/6000 implementation, it was necessary to rethink this yet again. Finally they discuss the issue of robustness and the steps that were taken to give maximum reliability in the simplex algorithm in the IBM Optimization Subroutine Library

L. Greengard. The numerical solution of the N-body problem. Computers in Physics, vol.4, no.2, March-April 1990, pp. 142-52.

The article refers to the determination of all pairwise forces (or potentials) for a fixed configuration as the numerical N- body problem. In the last few years, a group of algorithms has been developed in the astrophysics community which have come to be known as tree codes or hierarchical codes. In addition in the field of fluid dynamics a scheme has been proposed for 2D calculations which uses asymptotic analysis. A closely related scheme is the fast multiple method (FMM). Before describing these algorithms in detail, the article illustrates the ubiquity of numerical N-body calculations by giving several examples of actual applications.

R. Anbil, R. Tanga, E. L. Johnson. A global approach to crew-pairing optimization. IBM Systems Journal, vol.31, no.1, 1992, pp. 71-8.

The problem addressed in this paper is crew-pairing optimization in airline flight planning: finding tours of duty (pairings) that are legal and cover every flight leg at the least cost. The legal rules and cost of a pairing are determined by complex Federal Aviation Agency and contractual requirements. In addition, the problem is made more difficult by the hub-and-spoke system used by airlines that multiplies the possible ways a pairing can link flight legs. The state-of- the-art crew-pairing TRIP system of American Airlines uses subproblem optimization and, as is true for other crew- scheduling systems, may not be able to improve a solution even though a better one exists. It reports on the methodology developed during a joint study by IBM and American Airlines Decision Technologies to use the IBM Optimization Subroutine Library in conjunction with TRIP to improve on crew-pairing solutions by taking a global approach. The resulting improvements have been a reduction of 5 to 11 percent in excess crew cost. Estimated total savings are five million dollars per year.

J. K. Salmon, M. S. Warren, G. S. Winckelmans. Fast parallel tree codes for gravitational and fluid dynamical N-body problems International Journal of Supercomputer Applications, vol.8, no.2; Summer 1994; pp. 129-42.

We discuss two physical systems from different disciplines that make use of the same algorithmic and mathematical structures as a way of reducing the number of operations necessary to complete a realistic simulation. In the gravitational N-body problem, the acceleration of an object is given by the familiar Newtonian laws of motion and gravitation. The computational load is reduced by treating groups of bodies as single multiple sources rather than as individual bodies. Both types of simulations are carried out on the Intel Touchstone Delta, a parallel MIMD computer with 512 processors.

L. Greengard. Fast algorithms for classical physics. Science, vol. 265, no. 5174, 12 Aug. 1994, pp. 909-14.

Some of the recently developed fast summation methods that have arisen in scientific computing are described. These methods require an amount of work proportional to N or N log N to evaluate all pairwise interactions in an ensemble of N particles. Traditional methods, by contrast, require an amount of work proportional to N/sup 2/. As a result, large-scale simulations can be carried out using only modest computer resources. In combination with supercomputers, it is possible to address questions that were previously out of reach. Problems from diffusion, gravitation, and wave propagation are considered.

J. A. Board, Z. S. Hakura, W. D. Elliot, and W. T. Rankin. Scalable variants of multipole-based algorithms for molecular dynamics applications Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, Feb. 1995, pp. 295-300.

Multipole-based algorithms allow for reduction in the effort required to solve the N-body problem of electrostatics or gravitation from O(N/sup 2/) to O(N log N) or even O(N) in the number of interacting bodies N. We consider serial and parallel implementations of variants of the algorithms of Barnes and Hut (1986), and of Greengard and Rokhlin (1987), as well as a hybrid algorithm. Coupled with other optimizations such as a Fourier-domain computation of multipole series translations, we find the hybrid algorithm to be most efficient over a wide range of accuracy and system sizes relevant to molecular dynamics. The algorithms have been demonstrated to scale to 64 processors, and we expect them to be valid out to at least 512 processors on large (1 million particle) problems

Guy Blelloch, blelloch@cs.cmu.edu.