Journal of Artificial Intelligence Research 4 (1996) 365-396 Submitted 4/95; published 5/96 E 1996 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved. Adaptive ProblemSolving for LargeScale Scheduling Problems: A Case Study Jonathan Gratch University of Southern California, Information Sciences Institute 4676 Admiralty Way, Marina del Rey, CA 90292, USA Steve Chien Jet Propulsion Laboratory, California Institute of Technology 4800 Oak Grove Drive, M/S 525-3660, Pasadena, CA, 91109-8099 GRATCH@ISI.EDU STEVE.CHIEN@JPL.NASA.GOV Abstract Although most scheduling problems are NPhard, domain specific techniques perform well in practice but are quite expensive to construct. In adaptive problemsolving, domain specific knowledge is acquired automatically for a general problem solver with a flexible control architecture. In this approach, a learning system explores a space of possible heuristic methods for one wellsuited to the eccentricities of the given domain and problem distribution. In this article, we discuss an application of the approach to scheduling satellite communications. Using problem distributions based on actual mission requirements, our approach identifies strategies that not only decrease the amount of CPU time required to produce schedules, but also increase the percentage of problems that are solvable within computational resource limitations. Introduction With the maturation of automated problemsolving research has come grudging abandonment of the search for the" domainindependent problem solver. General problemsolving tasks like planning and scheduling are provably intractable. Although heuristic methods are effective in many practical situations, an ever growing body of work demonstrates the narrowness of specific heuristic strategies (e.g., Baker, 1994, Frost & Dechter, 1994, Kambhampati, Knoblock & Yang, 1995, Stone, Veloso & Blythe, 1994, Yang & Murray, 1994). Studies repeatedly show that a strategy that excels on one task can perform abysmally on others. These negative results do not entirely discredit domainindependent approaches, but suggest that considerable effort and expertise is required to find an acceptable combination of heuristic methods, a conjecture that is generally by published accounts of realworld implementations (e.g., Wilkins, 1988). The specificity of heuristic methods is especially troubling when we consider that problemsolving tasks frequently change over time. Thus, a heuristic problem solver may require expensive tuneups" as the character of the application changes. Adaptive problem solving is a general method for reducing the cost of developing and maintain ing effective heuristic problem solvers. Rather than forcing a developer to choose a specific heuristic strategy, an adaptive problem solver adjusts itself to the idiosyncrasies of an application. This can be seen as a natural extension of the principle of least commitment (Sacerdoti, 1977). When solving a problem, one should not commit to a particular solution path until one has information to distinguish that path from the alternatives. Likewise, when faced with an entire distribution of problems, it makes sense to avoid committing to a particular heuristic strategy until one can make an informed decision on which strategy performs better on the distribution. An adaptive problem solver embodies a space of heuristic methods, and only settles on a particular combination of these methods after a period of adaptation, during which the system automatically acquires information about the particu lar distribution of problems associated with the intended application. In previous articles, Gratch and DeJong have presented a formal characterization of adaptive problem solving and developed a general method for transforming a standard problem solver into an adaptive one (Gratch & DeJong, 1992, Gratch & DeJong, 1996). The primary purpose of this article is twofold: to illustrate the efficacy of learning approaches for solving realworld problem solving tasks, and to build empirical support for the the specific learning approach we advocate. After re viewing the basic method, we describe its application to the development of a largescale scheduling system for the National Aeronautics and Space Administration (NASA). We applied the adaptive problem solving approach to a scheduling system developed by a separate research group, and with out knowledge of our adaptive techniques. The scheduler included an expertcrafted scheduling strategy to achieve efficient scheduling performance. By automatically adapting this scheduling sys tem to the distribution of scheduling problems, the adaptive approach resulted in a significant im provement in scheduling performance over an expert strategy: the best adaptation found by machine learning exhibited a seventy percent improvement in scheduling performance (the average learned strategy resulted in a fifty percent improvement). .Adaptive Problem Solving An adaptive problem solver defers the selection of a heuristic strategy until some information can be gathered about their performance over the specific distribution of tasks. The need for such an approach is predicated on the claim that it is difficult to identify an effective heuristic strategy a priori. While this claim is by no means proven, there is considerable evidence that, at least for the class of heuristics that have been proposed till now, no one collection of heuristic methods will suffice. For example, Kambhampati, Knoblock, and Yang (1995) illustrate how planning heuristics embody design tradeoffs -- heuristics that reduce the size of search space typically increase the cost at each node, and vice versa -- and that the desired tradeoff varies with different domains. Similar observations have been made in the context of constraint satisfaction problems (Baker, 1994, Frost & Dechter, 1994). This inherent difficulty in recognizing the worth (or lack of worth) of control knowledge has been termed the utility problem (Minton, 1988) and has been studied extensively in the machine learning community (Gratch & DeJong, 1992, Greiner & Jurisca, 1992, Holder, 1992, Subramanian & Hunter, 1992). In our case the utility problem is determining the worth of a heuristic strategy for specific problem distribution. Formulation of Adaptive problem solving Before discussing approaches to adaptive problem solving, we formally state the common definition of the task (as proposed by Gratch & DeJong, 1992, Greiner & Jurisca, 1992, Laird, 1992, Subramanian & Hunter, 1992). Adaptive problem solving requires a flexible problem solver, meaning the problem solver possesses control decisions that may be resolved in alternative ways. Given a flexible problem solver, PS, with several control points, CP1...CPn (where each control point CPi corresponds to a particular control decision), and a set of alternative heuristic methods for each control point, {Mi,1...Mi,k ,}, .Note that a method may consist of smaller elements so that a method may be a set of control rules or a combination of heuristics. a control strategy defines a specific method for every control point (e.g., STRAT = M1,3,M 2,6,M3,1,...>). A control strategy determines the overall behavior of the problem solver. Let PSSTRAT be the problem solver operating under a particular control strategy. Adaptive Problem Solving The quality of a problem solving strategy is defined in terms of the decision theoretic notion of expected utility. Let U(PSSTRAT , d), be a real valued utility function that is a measure of the goodness of the behavior of the problem solver on a specific problem d . More generally, expected utility can be defined formally over a distribution of problems D: The goal of adaptive problem solving can be expressed as: given a problem distribution D, find some control strategy in the space of possible strategies that maximizes the expected utility of the problem solver. For example, in the PRODIGY planning system (Minton, 1988), control points include: how to select an operator to use to achieve the goal; how to select variable bindings to instantiate the operator; etc. A method for the operator choice control point might be a set of control rules to determine which operators to use to achieve various goals. A strategy for PRODIGY would be a set of control rules and default methods for every control point (e.g., one for operator choice, one for binding choice, etc.). Utility might be defined as a function of the time to construct a plan for a given planning problem. Approaches to Adaptive Problem Solving Three potentially complementary approaches to adaptive problem solving have been discussed in the literature. The first, what we call a syntactic approach, is to preprocess a problemsolving domain into a more efficient form, based solely on the domain's syntactic structure. For example, Etzioni's STATIC system analyzes a portion of a planing domain's deductive closure to conjecture a set of search control heuristics (Etzioni, 1990). Dechter and Pearl describe a class of constraint satisfaction techniques that preprocess a general class of problems into a more efficient form (Dechter & Pearl, 1987). More recent work has focused on recognizing those structural properties that influence the effectiveness of different heuristic methods (Frost & Dechter, 1994, Kambhampati, Knoblock & Yang. 1995, Stone, Veloso & Blythe, 1994). The goal of this research is to provide a problem solver with what is essentially a big lookup table, specifying which heuristic strategy to use based on some easily recognizable syntactic features of a domain. While this later approach seems promising, work in this area is still preliminary and has focused primarily on artificial applications. The disadvantage of purely syntactic techniques is that that they ignore a potentially important source of information, the distribution of problems. Furthermore, current syntactic approaches to this problem are specific to a particular, often unarticulated, utility function (usually problems olving cost). For example, allowing the utility function to be a user specified parameter would require a significant and problematic extension of these methods. The second approach, which we call a generative approach, is to generate custommade heuris tics in response to careful, automatic, analysis of past problemsolving attempts. Generative ap proaches consider not only the structure of the domain, but also structures that arise from the problem solver interacting with specific problems from the domain. This approach is exemplified by SOAR (Laird, Rosenbloom & Newell, 1986) and PRODIGY/EBL (Minton, 1988). These techniques analyze past problemsolving traces and conjectures heuristic control rules in response to specific problem solving inefficiencies. Such approaches can effectively exploit the idiosyncratic structure of a do main through this careful analysis. The limitation of such approaches is that they have typically fo cused on generating heuristics in response to particular problems and have not well addressed the issue of adapting to a distribution of problems While generative approaches can be trained on a problem distribution, learning typically occurs only within the context of a single problem. These systems will often learn knowledge which is helpful in a particular problem but decreases utility overall, necessitating the use of utility analysis techniques. Furthermore, as with the syntactic approaches, thus far they have been directed towards a specific utility function. The final approach we call the statistical approach. These techniques explicitly reason about performance of different heuristic strategies across the distribution of problems. These are generally statistical generateandtest approaches that estimated the average performance of different heuris tics from a random set of training examples, and explore an explicit space of heuristics with greedy search techniques. Examples of such systems are COMPOSER (Gratch & DeJong, 1992), PALO (Grein er & Jurisca, 1992), and the statistical component of MULTI TAC (Minton, 1993). Similar approaches have also been investigated in the operations research community (Yakowitz & Lugosi, 1990). These techniques are easy to use, apply to a variety of domains and utility functions, and can provide strong statistical guarantees about their performance. They are limited, however, as they are computational ly expensive, require many training examples to identify a strategy, and face problems with local maxima. Furthermore, they typically leave it to the user to conjecture the space of heuristic methods (see Minton, 1993 for a notable exception). In this article, we adopt the statistical approach to adaptive problem solving due to its generality and ease of use. In particular we use the COMPOSER technique for adaptive problem solving (Gratch & DeJong, 1992, Gratch & DeJong, 1996), which is reviewed in the next section. Our implementa tion incorporates some novel features to address the computational expense of the method. Ideally, however, an adaptive problem solver would incorporate some form of each of these methods. To this end we are investigating how to incorporate other methods of adaptation in our current research. .COMPOSER COMPOSER embodies a statistical approach to adaptive problem solving. To turn a problem solver into an adaptive problem solver, the developer is required to specify a utility function, a representative sample of training problems, and a space of possible heuristic strategies. COMPOSER then adapts the problem solver by exploring the space of heuristics via statistical hillclimbing search. The search space is defined in terms of a transformation generator which takes a strategy and generates a set of transformations to it. For example, one simple transformation generator just returns all single method modifications to a given strategy. Thus a transformation generator defines both a space of possible heuristic strategies and the nondeterministic order in which this space may be searched. COMPOSER's overall approach is one of generate and test hillclimbing. Given an initial problem solver, the transformation generator returns a set of possible transformations to its control strategy. These are statistically evaluated over the expected distribution of problems. A transformation is adopted if it increases the expected performance of solving problems over that distribution. The generator then constructs a set of transformations to this new strategy and so on, climbing the gradient of expected utility values. Formally, COMPOSER takes an initial problem solver, PS0, and identifies a sequence of problem solvers, PS0, PS1 , ... where each subsequent PS has higher expected utility with probability 1-d (where d > 0 is some user-specified constant). The transformation generator, TG, is a function that takes a problem solver and returns a set of candidate changes. Apply(t, PS) is a function that takes a transformation, t Q TG(PS ) and a problem solver and returns a new problem solver that is the result of transforming PS with t. Let U j(PS) denote the utility of PS on problem j. The change in utility that a transformation provides for the jth problem, called the incremental utility of a transformation, is denoted by DUj(t|PS ). This is the difference in utility between solving the problem with and with out the transformation. COMPOSER finds a problem solver with high expected utility by identifying transformations with positive expected incremental utility. The expected incremental utility is esti mated by averaging a sample of randomly drawn incremental utility values. Given a sample of n val ues, the average of that sample is denoted by DU n(t|PS). The likely difference between the average and the true expected incremental utility depends on the variance of the distribution, estimated from a sample by the sample variance , and the size of the sample, n. COMPOSER provides a statisti cal technique for determining when sufficient examples have been gathered to decide, with error d, that the expected incremental utility of a transformation is positive or negative. Because COMPOSER presumes that the relevant distributions are normally distributed, COMPOSER requires at that each esti mate of incremental utility be based on a minimum number of samples n0 to be determined for each application. The algorithm is summarized in Figure . COMPOSER's technique is applicable in cases where the following conditions apply: 1. The control strategy space can be structured to facilitate hillclimbing search. In general, the space of such strategies is so large as to make exhaustive search intractable. COMPOSER requires a transformation generator that structures this space into a sequence of search steps, with relatively few transformations at each step. In Section we discuss some techniques for incorporating domain specific information into the structuring of the control strategy space. 2. There is a large supply of representative training problems so that an adequate sampling of problems can be used to estimate expected utility for various control strategies. 3. Problems can be solved with a sufficiently low cost in resources so that estimating expected utility is feasible. 4. There is sufficient regularity in the domain such that the cost of learning a good strategy can be amortized over the gains in solving many problems. .The Deep Space Network The Deep Space Network (DSN) is a multinational collection of groundbased radio antennas responsible for maintaining communications with research satellites and deep space probes. DSN Operations is responsible for scheduling communications for a large and growing number of spacecraft. This already complex scheduling problem is becoming more challenging each year as budgetary pressures limit the construction of new antennas. As a result, DSN Operations has turned increasingly towards intelligent scheduling techniques as a way of increasing the efficiency of network utilization. As part of this ongoing effort, the Jet Propulsion Laboratory (JPL) has been given the responsibility of automating the scheduling of the 26meter subnet; a collection of 26meter antennas at Goldstone, CA, Canberra, Australia and Madrid, Spain. In this section we discuss the application of adaptive problemsolving techniques to the develop ment of a prototype system for automated scheduling of the 26meter subnet. We first discuss the development of the basic scheduling system and then discuss how adaptive problem solving enhanced the scheduler's effectiveness. The Scheduling Problem Scheduling the DSN 26meter subnet can be viewed as a large constraint satisfaction problem. Each satellite has a set of constraints, called project requirements, that define its communication needs. A typical project specifies three generic requirements: the minimum and maximum number of communication events required in a fixed period of time; the minimum and maximum duration for these communication events; and the minimum and maximum allowable gap between communica tion events. For example, Nimbus7, a meteorological satellite, must have at least four 15minute communication slots per day, and these slots cannot be greater than five hours apart. Project requirements are determined by the project managers and tend to be invariant across the lifetime of the spacecraft. In addition to project requirements, there are constraints associated with the various antennas. First, antennas are a limited resource - two satellites cannot communicate with a given antenna at the same time. Second, a satellite can only communicate with a given antenna at certain times, de pending on when its orbit brings it within view of the antenna. Finally, antennas undergo routine maintenance and cannot communicate with any satellite during these times. Scheduling is done on a weekly basis. A weekly scheduling problem is defined by three ele ments: (1) the set of satellites to be scheduled, (2) the constraints associated with each satellite, and (3) a set of time periods specifying all temporal intervals when a satellite can legally communicate with an antenna for that week. Each time period is a tuple specifying a satellite, a communication time interval, and an antenna, where (1) the time interval must satisfy the communication duration constraints for the satellite, (2) the satellite must be in view of the antenna during this interval. Anten na maintenance is treated as a project with time periods and constraints. Two time periods conflict if they use the same antenna and overlap in temporal extent. A valid schedule specifies a noncon flicting subset of all possible time periods where each project's requirements are satisfied. The automated scheduler must generate schedules quickly as scheduling problems are frequently overconstrained (i.e., the project constraints combined with the allowable time periods produces a set of constraints which is unsatisfiable). When this occurs, DSN Operations must go through a com plex cycle of negotiating with project managers to reduce their requirements. A goal of automated scheduling is to provide a system with relatively quick response time so that a human user may inter act with the scheduler and perform what if" reasoning to assist in this negotiation process. Ultimate ly, the goal is to automate this negotiation process as well, which will place even greater demands on scheduler response time (Chien & Gratch, 1994). For these reasons, the focus of development is upon heuristic techniques that do not necessarily uncover the optimal schedule, but rather produce adequate schedules quickly. The LR26 Scheduler LR26 is a heuristic scheduling approach to DSN scheduling being developed at the Jet Propulsion Laboratory (Bell & Gratch, 1993). LR26 is based on a 0-1 integer linear programming formulation of the scheduling problem (Taha, 1982). Scheduling is cast as the problem of finding an assignment to integer variables that maximizes the value of some objective function subject to a set of linear constraints. In particular, time periods are treated as 01 integer variables: 0 (or OUT) if the time period is excluded from the schedule; 1 (or IN) if it is included. The objective is to maximize the number of time periods in the schedule and the solution must satisfy the project requirements and antenna constraints (expressed as sets of linear inequalities). A typical scheduling problem under this formulation has 700 variables and 1300 constraints. In operations research, integer programs are solved by a variety of techniques including branch andbound search, the gomory method (Kwak & Schniederjans, 1987), and Lagrangian relaxation (Fisher, 1981). In artificial intelligence such problems are generally solved by constraint propagation search techniques (e.g., Dechter, 1992, Mackworth, 1992). To address the complexity of the schedul ing problem LR26 uses a hybrid approach that combines Lagrangian relaxation with constraint propa gation search. Lagrangian relaxation is a divideandconquer method which, given a decomposition of the scheduling problem into a set of easier subproblems, coerces the subproblems to be solved in such a way that they frequently result in a global solution. One specifies a problem decomposition by identifying a subset of problem constraints that, if removed, result in one or more independent and computationally easy subproblems. ', .A problem consists of independent subproblems it the global objective function can be maximized by finding some maximal solution for each subproblem in isolation. These problematic constraints are relaxed," meaning they no longer act as constraints but instead are added to the objective function in such a way that (1) there is incentive to satisfying these relaxed constraints when solving the subproblems and, (2) the best solution to the relaxed problem, if it satisfies all relaxed constraints, is guaranteed to be the best solu tion to the original problem. Furthermore, this relaxed objective function is parameterized by a set of weights (one for each relaxed constraint). By systematically changing these weights (thereby modulating the incentives for satisfying relaxed constraints) a global solution can often be found. Even if this weight search does not produce a global solution, it can make the solution to the subprob lems sufficiently close to a global solution that a global solution can be discovered with substantially reduced constraint propagation search. In the DSN domain, the scheduling problem is decomposed by scheduling each antenna indepen dently. Specifically, the constraints associated with the complete problem can be divided into two groups: those that refer to a single antenna, and those that mention multiple antennas. The later are relaxed and the resulting singleantenna subproblems can be solved in time linear in the number of time periods associated with that antenna (see below). LR 26 solves the complete problem by first trying to coerce a global solution by performing a search in the space of weights and then, if that fails to produce a solution, resorting to constraint propagation search in the space of possible schedules. Schedules We now describe the formalization of the problem. Let P be a set of projects, A a set of antennas, M = {0,..,10080}, and V be an enumeration, V={0, 1, *}, denoting whether a time period is excluded from the schedule (0), included (1), or uncommitted. Note that P, A, and M, are specified in advance and V is to be determined by the scheduler and is initially always uncommitted. Let S a PyAyMyM yV denote the set of possible time periods for a week, where a given time period specifies a project, antenna and the start and end of the communication event, respectively. For a given s Q S, we define project(s), antenna( s), start(s), end( s), and value(s) to denote the corresponding elements of s. We also define length(s) = end( s) - start(s) to simplify some subsequent notation. A ground schedule is an assignment of 0 (excluded) or 1 (included) to each time period in S. This can be seen as the application to S of some function G that maps each element of S to 0 or 1. We denote this by SG. A partial schedule refers to a schedule with only a subset of its time periods committed, which we denote via some mapping function M that maps elements of S to 0, 1, or *. A partial sched ule corresponds to a set of possible ground schedules (i.e., those that result from forcing each uncom mitted time period either in or out of the schedule). We denote this by SM. We define a particular partial schedule S0 to denote the completely uncommitted partial schedule (with all time periods as signed a value of *). Constraints The scheduler must identify some ground schedule that satisfies a set of project and antenna constraints, which we now formalize. Project Requirements. Each project pn Q P has associated with it a set of constraints called project requirements. All constraints are processed and translated into simple linear inequalities over elements of S. The complete set of project requirements, denoted PR, is the union of the requirements from each individual projects. Each requirement can be expressed as integer linear inequality: where ai represents a weighting factor indicating the degree to which the ith time period (if included) contributes to satisfying a particular requirement. For example, the requirement that a project, p, must have at least 100 minutes of communication time in a week is expressed: Where I(project(s)) equals one if s belongs to that project; otherwise zero. Note that time periods with zero weight play no role and are not explicitly mentioned in the actual constraint representation. Constraints on the length of individual time periods are represented similarly: For efficiency, however, time periods which do not satisfy these unary inequalities are simply eliminated from S in a preprocessing step. .Note that this is an inherent limitation in the formalization as the scheduler cannot entertain variable length communication events - communication events must be discretized into a finite set of fixed length intervals. Antenna Constraints. Each of the three antennas has the constraint that no two projects can use the antenna at the same time. This can be translated into a set of linear inequalities ACa,for each antenna a as follows: Problem Formulation The scheduling objective used by LR26 is to find some ground schedule, denoted by S*, that maximizes the number of time periods in the schedule subject to the project and antenna constraints: .This might correspond to a desire to maintain maximum downlink flexibility. Problem:DSN Find: Subject to:AC1| AC2|AC3 |PR where ZG is the value of the objective function for some ground schedule and arg max" denotes the argument that leads to the maximum. With Lagrangian relaxation, certain constraints are folded into the objective function in a stan dardized fashion. The intuition is to add some factor into the objective function that is negative iff the relaxed constraint is unsatisfied. If a constraint is of the form Saisi .b, then u[S aisi-b] is added to the objective function, where u is a nonnegative weighting factor. Likewise, if the constraint is of the form Sais i3b, then u[b- Saisi] is added. In LR26, only project requirements are relaxed: Problem:DSN(u ) Find: () =}}\ {u\ sub\ j }\ {left\ [\ {{b\ sub\ j}\ \X0107\ {sum\ from\ {{s\ sub\ i}\ member\ {S\ sup\ G}}}\ {a\ sub\ {i\ j}}\ cdot\ v\ a\ l\ u\ e\ {{\(\ s}\ sub\ i}\ \)}\ right\ ]} }\ right\ }}) (T15,2,12,,0.8,0.2187858,5,127,5,7,127,1,0,6, S*(u) = Subject to:AC1| AC2|AC3 where Zs(u) is the relaxed objective function and u is a vector of nonnegative weights of length |PR| (one for each relaxed constraint). Note that this defines a space of relaxed solutions that depend on the weight vector u. Let Z* denote the value of the optimal solution of the original problem (Definition ), and let Z*(u) denote value of the optimal solution to the relaxed problem (Definition ) for a particular weight vector u. For any weight vector u, Z*( u) can be shown to be an upper bound on the value of Z*. Thus, if a relaxed solution satisfies all of the original problem constraints, it is guaranteed to be the optimal solution to the original problem. Lagrangian relaxation proceeds by incrementally tightening this upper bound (by adjusting the weight vector) in the hope of identifying a global solution. A global solution cannot always be identified in this manner, so a complete scheduler must combine Lagrangian relaxation with some form of search. Search If a solution cannot be found through weight adjustment, LR-26 resorts to basic refinement search (Kambhampati, Knoblock & Yang, 1995) (or splitandprune search (Dechter & Pearl, 1987)) in the space of partial schedules. In this search paradigm a partial schedule is recursively refined (split) into a set of more specific partial schedules. In the context of the DSN scheduling problem, refinement corresponds to forcing uncommitted time periods in or out of the schedule. A partial schedule would be pruned if all of its ground schedules violate the constraints. The scheduler is applied recursively to each refined partial schedule until some satisfactory ground schedule is found or all schedules are pruned. Each refinement is further refined by propagating the local consequence of new commitment. After a variable is set to a particular value, each individual constraint which references that variable is analyzed to determine which time period would be forced in or out of the schedule as a result of the assignment. LR-26 performs only partial constraint propagation, because complete propagation is computationally expensive. Specifically, if constraint C1 references time periods s2, s4 and s5 , and s2 is assigned a value, LR-26 analyzes C1 to see if the new assignment determines the value of s 4 and/ or s5. If, for example, s4 is constrained to take on a particular value, this triggers analysis of all constraints which contain s4. This can be viewed as performing arc-consistency (Dechter, 1992). During the constraint propagation it may be possible to show that the refinement contains no valid ground schedule. In this case the partial schedule may be pruned from the search. LR26 augments this basic refinement search with Lagrangian relaxation to heuristically reduce the combinatorics of the problem. The difficulty with refinement search is that it may have to perform considerable (and poorly directed) search through a tree of refinements to identify a single satisficing solution. If an optimal solution is sought, every leaf of this search tree must be examined. .Partial schedules may also be pruned, as in branchandbound search, if they can be shown to contain lower value solutions that other partial schedules. In practice L R26 is run in a satisficing mode, meaning that search terminates as soon as a ground schedule is found (not necessarily optimal) that satisfies all of the problem constraints. trast, by searching through the space of relaxed solutions to a partial schedule, one can sometimes identify the best schedule without any refinement search. Even when this is not possible, Lagrangian relaxation heuristically identifies a small set of problematic constraints, focusing the subsequent re finement search. Thus, by performing some search in the space of relaxed solutions at each step, the augmented search method can significantly reduce both the depth and breadth of refinement search. The augmented procedure works to the extent that it can efficiently solve relaxed solutions, ideal ly allowing the algorithm to explore several points in the space of weight vectors in each step of the refinement search. LR26 solves relaxed problems in linear time, O(|AC1| AC2|AC3|). To see this, note that each time period appears on exactly one antenna. Thus, Z s(u) can be broken into the sum of three objective functions, each containing only the time periods associated with a particular anten na. Furthermore, the relaxed objective function can be re-expressed as the weighted sum of each of the time periods on that antenna, and the unrelaxed constraints are simple pair-wise exclusion constraints between individual time periods. Combine this with the fact that time periods are partially ordered by their start time and the problem simplifies to identifying some non-exclusive sequence of time periods with the maximum cumulative weight. This is easily formulated and solved as a dy namic programming problem (see Bell & Gratch, 1993 for more details). The augmented refinement search performed by LR26 is summarized in Figure LR26 Scheduler Figure : The basic LR26 refinement search method. Performance Tradeoffs Perhaps the most difficult decisions in constructing the scheduler involve how to flesh out the details of steps 1,2, 3, and 4. The constraint satisfaction and operations research literatures have proposed many heuristic methods for these steps. Unfortunately, due to their heuristic nature, it is not clear what combination of methods best suits this scheduling problem. The power of a heuristic method depends on subtle factors that are difficult to assess in advance. Additionally, when considering multiple methods, one has to consider interactions between methods. In LR26 a key interaction arises in the tradeoff between the amount of weight vector search vs. refinement search performed by the scheduler (as determined by Step 2). At each step in the refine ment search, the scheduler has the opportunity to search in the space of relaxed solutions. Spending more effort in this weight search can reduce the amount of subsequent refinement search. But at some point the savings in reduced refinement search may be overwhelmed by the cost of performing the weight search. This is a classic example of the utility problem, and it is difficult to see how best to resolve the tradeoff without intimate knowledge of the form and distribution of scheduling problems. Another important issue for improving scheduling efficiency is the choice of heuristic methods for controlling the direction of refinement search (as determined by steps 1, 3, and 4). Often these methods are stated as general principles (e.g., first instantiate variables that maximally constrain the rest of the search space", Dechter, 1992, p. 277) and there may be many ways to realize them in a particular scheduler and domain. Furthermore, there are almost certainly interactions between methods used at different control points that make it difficult to construct a good overall strategy. These tradeoffs conspire to make manual development and evaluation of heuristics a tedious, un certain, and time consuming task that requires significant knowledge about the domain and schedul er. In the case of LR26, its initial control strategy was identified by hand, requiring a significant cycle of trialanderror evaluation by the developer over a small number of artificial problems. Even with this effort, the resulting scheduler is still expensive to use, motivating us to try adaptive techniques. Rather than committing on a particular combination of heuristic strategies, Adaptive Select some partial schedule The first decision in the refinement search is to choose some partial schedule from the agenda. This selection policy defines the character of the search. Maintaining the agenda as a stack implements depthfirst search. Sorting the agenda by some value function implements a bestfirst search. In Adaptive LR26 we restrict the space of methods to variants of depthfirst search. Each time a set of refinements is created (Decision 4), they are added to the front of the agenda. Search always proceeds by expanding the first partial schedule on the agenda. Heuristics act by ordering refinements before they are added to the agenda. The grammar specifies several ordering heuristics, sometimes called value ordering heuristics, or look-ahead schemes in the constraint propagation literature (Dechter, 1992, Mackworth, 1992). As these methods are entertained during refinement construction, their detailed description is delayed until that section. Lookahead schemes decide how to refine partial schedules. Lookback schemes handle the re verse decision of what to do whenever the scheduler encounters a dead end and must backtrack to another partial schedule. Standard depthfirst search performs chronological backtracking, backing up to the most recent decision. The constraint satisfaction literature has explored several heuristic alternatives to this simple strategy, including backjumping (Gaschnig, 1979), backmarking (Haralick & Elliott, 1980), dynamic backtracking (Ginsberg, 1993), and dependencyd irected backtracking (Stallman & Sussman, 1977) (see Backer & Baker, 1994, and Frost and Dechter, 1994, for a recent evaluation of these methods). We are currently investigating lookback schemes for the control grammar but they will not be discussed in this article. Search for some relaxed solution The next dimension of flexibility is in weightadjusting methods to search the space of possible relaxed solutions for a given partial schedule. The general goal of the weight search is to find a relaxed solution that is closest to the true solution in the sense that as many constraints are satisfied as possible. This can be achieved by minimizing the value of Z *(u) with respect to u. The most popular method of searching this space is called subgradient optimization (Fisher, 1981). This is a standard optimization method that repeatedly changes the current u in the direction that most decreases Z*(u). Thus at step i , ui+1 = ui + tidi where ti is a step size and di is a directional vector in the weight space. The method is expensive but it is guaranteed to converge to the minimum Z*(u) under certain conditions (Held & Karp, 1970). A less expensive technique, but without the convergence guarantee, is to consider only one weight at a time when finding an improving direction. Thus ui+1 = ui + tidi where di is a directional vector with zeroes in all but one location. This method is called dualdescent. In both of these methods, weights are adjusted until there is no change in the relaxed solution: S*(ui) = S*(ui+1). While better relaxed solutions will create greater reduction in the amount of subsequent refine ment search, it is unclear just where the tradeoff between these two search spaces lies. Perhaps it is unnecessary to spend much time improving relaxed schedules. Thus a more radical, and extremely efficient, approach is to settle for the first relaxed solution found. We call this the firstsolution meth od. A more moderate approach is to perform careful weight search at the beginning of the refinement search (where there is much to be gained by reducing the subsequent refinement search) and to per form the more restricted firstsolution search when deeper in the refinement search tree. The trun cateddualdescent method performs dualdescent at the initial refinement search node and then uses the firstsolution method for the rest of the refinement search. The control grammar includes four methods for performing weight space search (Figure ). 2a:Subgradientoptimization 2c:Truncateddualdescent 2b:Dualdescent 2d:Firstsolution ) (T15,2,12,,1.6666667,0.6666667,5,127,5,7,127,1,0,2, Figure : Weight Search Methods ) (E16,0,0,,5,1,1,0.0533333,1,15,0,0,1,0,0,0,1,5,127,7,0,0,7,0,1,1,0.0666667,0.06 66667,6,6,0,0.0666667,6))> Select some constraint If the scheduler cannot find a relaxed solution that solves the original problem, it must break the current partial schedule into a set of refinements and explore them nondeterministically. In Adaptive LR26, the task of creating refinements is broken into two decisions: selecting an unsatisfied constraint (Decision 3), and creating refinements that make progress towards satisfying the selected constraint (Decision 4). Lagrangian relaxation simplifies the first decision by identifying a small subset of constraints that appear problematic. However, this still leaves the problem of choosing one constraint in this subset on which to base the subsequent refinement. The common wisdom in the search community is to choose a constraint that maximally constrains the rest of the search space, the idea being to minimize the size of the subsequent refine ment search and to allow rapid pruning if the partial schedule is unsatisfiable. Therefore, our control grammar incorporates several alternative heuristic methods for locally assessing this factor. Given that the common wisdom is only a heuristic, we include a small number of methods that violate this intuition. All of these methods are functions that look at the local constraint graph topology and re turn a value for each constraint. Constraints can then be ranked by their value and the highest value constraint chosen. The control grammar implements both a primary and secondary sort for constraints. Constraints that have the same primary value are ordered by their secondary value. For the sake of simplicity we only discuss measures for constraints of the form Sas . b. (Analo gous measures are defined for other forms.) We first define measures on time periods. Measures on constraints are functions of the measures of the time periods that participate in the constraint. Measures on Time Periods. An unforced time period is one that is neither in or out of the schedule (value(s)=*). The conflictedness of an unforced time period s (with respect to a current partial schedule) is the number of other unforced time periods that will be forced out if s is forced into the schedule (because they participate in an antenna constraint with s ). If a time period is already forced out of the current partial schedule, it does not count toward s 's conflictedness. Forcing a time period with high conflictedness into the schedule will result in many constraint propagations, which reduces the number of ground schedules in the refinement. The gain of an unforced time period s (with respect to a current partial schedule) is the number of unsatisfied project constraints that s participates in. Preferring time periods with high gain will make progress towards satisfying many project constraints simultaneously. The loss of an unforced time period s (with respect to a current partial schedule) is a combination of gain and conflictedness. Loss is the sum of the gain of each unforced time period that will be forced out if s is forced into the schedule. Time period with high loss are best avoided as they prevent prog ress towards satisfying many project constraints. To illustrate these measures, consider the simplified scheduling problem in Figure . Figure : A simplified DSN scheduling problem based on four time periods. There are two project constraints, and two antenna constraints. For example, P1 signifies that at least two of the first three time periods must appear in the schedule, and A1 signifies that either s1 or s3 may appear in the schedule, but not both. In the solution, only s2 and s3 appear in the schedule. With respect to the initial partial schedule (with none of the time periods forced either in or out) the conflictedness of s2 is one, because it appears in just one antenna constraint (A2). If subsequently, s4 is forced out, then the conflictedness of s2 drops to zero, as conflictedness is only computed over unforced time periods. The initial gain of s2 is two, as it appears in both project constraints. Its gain drops to one if s3 and s4 are then forced into the schedule, as P2 becomes satisfied. The initial loss of s2 is the sum of the gain of all time periods conflicting with it (s4). The gain of s4 is one (it appears in P2) so that the loss of s2 is one. Measures on Constraints. Constraint measures (with respect to a partial schedule) can be defined as functions of the measures of the unforced time periods that participate in a constraint. The functions max, min, and total have been defined. Thus, totalconflictedness is the sum of the conflictedness of all unforced time periods mentioned in a constraint, while maxgain is the maximum of the gains of the unforced time periods. Thus, for the constraints defined above, the initial totalconflictedness of P1 is the conflictedness of s1, s2 and s 3, 1 + 1 + 1 = 3. The initial max-gain of constraint P1 is the maximum of the gains of s1, s2, and s3 or max{1,2,2} = 2. We also define two other constraint measures. The unforcedp eriods of a constraint (with respect to a partial schedule) is simply the number of unforced time periods that are mentioned in the constraint. Preferring a constraint with a small number of unforced time periods restricts the number of refinements that must be considered, as refinements consider combinations of time periods to force into the schedule in order to satisfy the constraint. Thus, the initial unforcedperiods of P1 is three (s1, s2, and s3). The satisfactiondistance of a constraint (with respect to a partial schedule) is a heuristic measure the number of time periods that must be forced in order to satisfy the constraint. The measure is heu ristic because it does not account for the dependencies between time periods imposed by antenna constraints. The initial satisfactiondistance of P1 is two because two time periods must be forced in before the constraint can be satisfied. Given these constraint measures, constraints can be ordered by some measure of their worth. For example we may prefer constraints with high total conflictedness, denoted as prefertotalconflicted ness. Not all possible combinations seem meaningful so the control grammar for Adaptive LR26 im plements nine constraint ordering heuristics (Figure ). 3a:Prefermaxgain 3f:Penalizetotalconflictedness 3b:Prefertotalgain 3g:Preferminconflictedness 3c:Penalizemaxloss 3h:Penalizeunforcedperiods 3d:Penalizemaxconflictedness 3i:Penalizesatisfactiondistance 3e:Prefertotalconflictedness ) (T15,2,12,,1.56,1.1333333,5,127,5,7,127,1,0,2, Figure : Constraint Selection Methods ) (E16,0,0,,5,1,1,0.0533333,1,15,0,0,1,0,0,0,1,5,127,7,0,0,7,0,1,1,0.0666667,0.06 66667,6,6,0,0.0666667,6))> Refine partial schedule Given a selected constraint, the scheduler must create a set of refinements that make progress towards satisfying it. If the constraint is of the form Sas . b then some time periods on the lefthand side must be forced into the schedule if the constraint is to be satisfied. Thus, refinements are constructed by identifying a set of ways to force time periods in or out of the partial schedule s such that the refinements form a spanning set: |{Si } = S. These refinements are then ordered and added to the agenda. Again, for simplicity we restrict discussion to constraints of form Sas . b. The Basic Refinement Method. The basic method for refining a partial schedule is to take each unforced time period mentioned in the constraint and create a refinement with the time period vj forced into the schedule. Thus, for the constraints defined above, there would be three refinements to constraint P1, one with s1 forced in: one with s2 forced in, and one with s3 forced in. Each refinement is further refined by performing constraint propagation (arc consistency) to de termine some local consequences of this new restriction. Thus, every time period that conflicts with vj is forced out of the refined partial schedule, which in turn may force other time periods to be in cluded, and so forth. By this process, some refinements may be recognized as inconsistent (contain no ground solutions) and are pruned from the search space (for efficiency, constraint propagation is only performed when partial schedules are removed from the agenda). Once the set of refinements has been created, they are ordered by a value ordering heuristic before being placed on the agenda. As with constraint ordering heuristics, there is a common wisdom for creating value ordering heuristics: prefer refinements that maximized the number of future options available for future assignments (Dechter & Pearl, 1987, Haralick & Elliott, 1980). The control grammar implements several heuristic methods using measures on the time periods that created the refinement. For example, one way to keep options available is to prefer forcing in a time period with minimal conflictedness. As the common wisdom is only heuristic, we also incorporate a method that violates it. The control grammar includes five value ordering heuristics that are derived from the measures on time periods (Figure ), where the last method, arbitrary, just uses the ordering of the time periods as they appear in the constraint. 1a:Prefergain 1d:Preferconflictedness 1b:Penalizeloss 1e:Arbitrary 1c: Penalizeconflictedness ) (T15,2,12,,1.8266667,0.7333333,5,127,5,7,127,1,0,2, Figure : Value Ordering Methods ) (E16,0,0,,5,1,1,0.0533333,1,15,0,0,1,0,0,0,1,5,127,7,0,0,7,0,1,1,0.0666667,0.06 66667,6,6,0,0.0666667,6))> The Systematic Refinement Method. The basic refinement method has one unfortunate property that may limit its effectiveness. The search resulting from this refinement method is unsystematic in the sense of McAllester and Rosenblitt (1991). This means that there is some redundancy in the set of refinements: Si\S jpj. Unsystematic search is inefficient in that the total size of the refinement search space will be greater than if a systematic (nonredundant) refinement method is used. This may or may not be a disadvantage in practice as scheduling complexity is driven by the size of the search space actually explored (the effective search space) rather than its total size. Nevertheless, there is good reason to suspect that a systematic method will lead to smaller effective search spaces. A systematic refinement method chooses a time period that helps satisfy the selected constraint and then forms a spanning set of two refinements: one with the time period forced in and one with the time period forced out. These refinements are guaranteed to be nonoverlapping. The systematic method incorporated in the control grammar uses the value ordering heuristic to choose which un forced time period to use. The two refinements are ordered based on which makes immediate prog ress towards satisfying the constraint (e.g., s=1 is first for constraints of form Sas . b). The control grammar includes both the basic and systematic refinement methods (Figure ). 4a:BasicRefinement 4b:SystematicRefinement ) (T15,2,12,,2.0266667,0.4666667,5,127,5,7,127,1,0,2, Figure : Refinement Methods ) (E16,0,0,,5,1,1,0.0533333,1,15,0,0,1,0,0,0,1,5,127,7,0,0,7,0,1,1,0.0666667,0.06 66667,6,6,0,0.0666667,6))> For the problem specified in Figure , when systematically refining constraint P1, one would use the value ordering method to select among time periods s1, s2, and s3. If s2 were selected, two refine ments would be proposed, one with s2 forced in and one with s2 forced out. The control grammar is summarized in Figure . The original expert control strategy developed for LR26 is a particular point in the control space defined by the grammar: the value ordering method is arbitrary (1e); the weight search is by dualdescent (2b); the primary constraint ordering is penal izeunforcedperiods (3h); there is no secondary constraint ordering, thus this is the same as the pri mary ordering; and the basic refinement method is used (4a). Metacontrol Knowledge The constraint grammar defines a space of close to three thousand possible control strategies. The quality of a strategy must be assessed with respect to a distribution of problems, therefore it is prohibitively expensive to exhaustively explore the control space: taking a significant number of examples (say fifty) on each of the strategies at a cost of 5 CPU minutes per problem would require approximately 450 CPU days of effort. COMPOSER requires a transformation generator to specify alternative strategies, which are ex plored via hillclimbing search. In this case, the obvious way to proceed is to consider all single meth od changes to a given control strategy. However the cost of searching the strategy space and quality of the final solution depend to a large extent on how hillclimbing proceeds, and the obvious way need not be the best. In Adaptive LR26, we augment the control grammar with some domainspecific knowledge to help organize the search. Such knowledge includes, for example, our prior expectation that certain control decisions would interact, and the likely importance of the different control deci sions. The intent of this metacontrol knowledge" is to reduce the branching factor in the control strategy search and improve the expected utility of the locally optimal solution found. This approach led to a layered search through the strategy space. Each control decision is assigned to a level. The control grammar is search by evaluating all combinations of methods at a single level, adopting the best combinations, and then moving onto the next level. The organization is shown below: Level 0:{Weight search method} Level 1:{Refinement method} Level 2:{Secondary constraint ordering, Value ordering} Level 3:{Primary constraint ordering} The weight search and refinement control points are separate, as they seem relatively independent from the other control points, in terms of their effect on the overall strategy. While there is clearly some interaction between weight search, refinement construction, and the other control points, a good selection of methods for pricing and alternative construction should perform well across all ordering heuristics. The primary constraint ordering method is relegated to the last level because some effort was made in optimizing this decision in the expert strategy for LR26, and we believed that it was unlikely the default strategy could be improved. Given this transformation generator, Adaptive LR26 performs hillclimbing across these levels. It first entertains weight adjustment methods, then alternative construction methods, then combina tions of secondary constraint sort and child sort methods, and finally primary constraint sort methods. Each choice is made given the previously adopted methods. This layered search can be viewed as the consequence of asserting certain types of relations be tween control points. Independence relations indicate cases in which the utility of methods for one control point is roughly independent of the methods used at other control points. Dominance rela tions indicate that the changes in utility from changing methods for one control point are much larger than the changes in utility for another control point. Finally, inconsistency relations indicate when a method M1 for control point X is inconsistent with method M2 for control point Y. This means that any strategy using these methods for these control points need not be considered. EXPECTED UTILITY As previously mentioned, a chief design requirement for LR 26 is that the scheduler produce solutions (or prove that none exist) efficiently. This behavioral preference can be expressed by a utility function related to the computational effort required to solve a problem. As the effort to produce a schedule increases, the utility of the scheduler on that problem should decrease. In this paper, we characterize this preference by defining utility as the negative of the CPU time required by the scheduler on a problem. Thus, Adaptive LR26 tunes itself to strategies that minimize the average time to generate a schedule (or prove that one does not exist). Other utility functions could be entertained. In fact, more recent research has focused on measures of schedule quality (Chien & Gratch, 1994). Problem Distribution Adaptive LR26 needs a representative sample of training examples for its adaptation phase. Unfortunately, DSN Operations has only recently begun to maintain a database of scheduling problems in a machine readable format. While this will ultimately allow the scheduler to tune itself to the actual problem distribution, only a small body of actual problems was available at the time of this evaluation. Therefore, we resorted to other means to create a reasonable problem distribution. We constructed an augmented set of training problems by syntactic manipulation of the set of real problems. Recall that each scheduling problem is composed of two components: a set of project re quirements, and a set of time periods. Only the time periods change across scheduling problems, so we can organize the real problems into a set of tuples, one for each project, containing the weekly blocks of time periods associated with it (one entry for each week the project is scheduled). The set of augmented scheduling problems is constructed by taking the cross product of these tuples. Thus, a weekly scheduling problem is defined by combining one weeks worth of time periods from each project (time periods for different projects may be drawn from different weeks), as well as the project requirements for each. This simple procedure defines set of 6600 potential scheduling problems. Two concerns led us to use only a subset of these augmented problems. First, a significant per centage of augmented problems appeared much harder to solve (or prove unsatisfiable) than any of the real problems (on almost half of the constructed problems the scheduler did not terminate, even with large resource bounds). That such hard" problems exist is not unexpected as scheduling is NP hard, however, their frequency in the augmented sample seems disproportionately high. Second, the existence of these hard problems raises a secondary issue of how best to terminate search. The stan dard approach is to impose some arbitrary resource bound and to declare a problem unsatisfiable if no solution is found within this bound. Unfortunately this raises the issue of what sized bound is most reasonable. We could have resolved this by adding the resource bound to the control grammar, how ever, at this point in the project we settled for a simpler approach. We address this and the previous concern by excluding from the augmented problem distribution those problems that seem funda mentally intractable." What this means in practice is that we exclude problems that could not be solved by any of a large set of heuristic methods within a five minute resource bound, the determina tion of which is discussed in Appendix A. This results in a reduced set of about three thousand sched uling problems. The use of a resource bound can be problematic for evaluating the power of a learning technique. As noted by Segre, Elkan, and Russell (1991), a learning system that greatly improves problem solv ing performance under a given resource bound may perform quite differently under a different re source bound. Some researchers suggest statistical analysis methods for assessing the significance of this factor (e.g., see Etzioni and Etzioni, 1994). In this study, however, we do not address the issue of how results might change given different resource bounds. We note that COMPOSER's statistical properties suggest that problem solving performance should be no worse after learning, whatever the resource bound, but the performance improvement many vary considerably. To give at least some A third evaluation assesses the relative quality of the strategies identified by Adaptive LR26 when compared with other strategies in the strategy space. This is inferred by comparing the expected util ity of the learned strategies with several strategies drawn randomly from the space. This also pro vides an opportunity to assess the quality of the expert strategy, and thus give a sense of how challeng ing it is to improve it. COMPOSER, the statistical component of the adaptive scheduler, has two parameters that govern its behavior. The parameter d specifies the acceptable level of statistical error (this is the chance that the technique will adopt a bad transformation or reject a good one). In Adaptive LR26, this is set to a standard value of 5%. COMPOSER bases each statistical inferences on a minimum of n0 example s. In Adaptive LR26, n0 is set to the empirically determined value of fifteen. Overall Results -- DSN DISTRIBUTION Figure summarizes The results support the two primary claims. First, the system learned search control strategies that yielded a significant improvement in performance. For the weight search, all of the learned strategies used the firstsolution method (2d). It seems that, at least in this domain and problem distribution, the reduction in refinement search space that results from better relaxed solutions is more than offset by the additional cost of the weight search. The scheduler did, however, benefit from the reduction in size that results from a systematic refine ment method. More interestingly, Adaptive LR26 seems to have rediscovered" the common wisdom in heuris tic constraintsatisfaction search. When exploring new refinements, it is often suggested to chose the least constrained value of the most constrained constraint. The best learned strategies follow this advice while the worst strategies violate it. In the best strategy, the time period with lowest con flictedness is least constraining (in the sense that it will tend to produce the least constraint propaga tions) and thus produces the least commitments on the resulting partial schedule. By this same argu ment, the constraint with the highest total conflicted will tend to be the hardest to satisfy. Overall Results -- FULL AUGMENTED DISTRIBUTION Figure summarizes the results for the augmented distribution. As expected, this distribution One interesting result of this evaluation is that, unlike the previous evaluation, the best learned strategies use truncateddualdescent as their weight search method (the strategies were similar along other control dimensions). This illustrates how even modest changes to the distribution of problems can influence the design tradeoffs associated with a problem solver: in this case, changing the tradeoff between weight and refinement search. Quality of Learned strategies The third claim is that, in practice, COMPOSER can identify strategies that rank highly when judged with respect to the whole strategy space. A secondary question is how well does the expert strategy perform. The improvements of Adaptive LR26 are of little significance if the expert strategy performs worse than most strategies in the space. Alternatively, if the expert strategy is extremely good, its improvement is compelling. As a way of assessing these claims we estimate the probability of selecting a high utility strategy given that we choose it randomly from one of three strategy spaces: the space of all possible strategies (expressible in the transformation grammar), the space of strategies produced by Adaptive LR26, and the trivial space containing only the expert strategy. This corresponds to the problem of estimating a probability density function (p.d.f.) for each space: a p.d.f., f(x), associated with a random variable gives the probability that an instance of the variable has value x . More specifically we want to esti mate the density functions, fs(u ), which is the probability of randomly selecting a strategy from space s that has expected utility u. We use a nonparametric density estimation technique called the kernel method to estimate fs(u ) (as in Smyth, 1993). To estimate the density function of the whole space, we randomly selected and tested thirty strategies. All of the learned strategies are used to estimate the density of the learned space. (In both cases, five percent of the data was withheld to estimate the bandwidth parameter used by the kernel method.) The p.d.f. associated with the single expert strategy is estimated using a nor mal model fit to the 1000 test examples from the previous evaluation. DSN Distribution Figure illustrates the results for the DSN distribution. In this evaluation the learned strategies significantly outperformed the randomly selected strategies. Thus, one would have to select and test many strategies at random before finding one of comparable expected utility to one found by Adaptive LR26. The results also indicate that the expert strategy is already a good strategy (as indicated by the relative positions of the peaks for the expert and random strategy distributions), indicating that the improvement due to Adaptive LR26 is significant and nontrivial. Learned Strategies ) (v7,4,0,,2.8000016,1.162889,3.0666667,1.0295557,7,0,1,0) (T15,5,12,,3.0666667,0.8962223,7,127,5,7,127,1,0,7, Expert Strategy ) (v7,6,0,,4.04,2.3362223,4.24,2.1362223,7,0,1,0) (T15,7,12,,4.3333333,2.0962223,7,127,5,7,127,1,0,7, Random Strategies ) (T15,8,12,,0.3333333,3.1333333,7,127,5,7,127,1,0,2, Figure : The DSN Distribution. The graph shows the probability of obtaining a strategy of a particular utility, given that it is chosen from (1) the set of all strategies, (2) the set of learned strategies, or (3) the expert strategy. ) (T15,9,12,,4.1333333,2.8962223,7,127,5,7,127,1,0,6, Negative Expected Utility Improved Performance ))> The results provide additional insight into Adaptive LR26 's learning behavior. That the p.d.f for the learned strategies contains several peaks, graphically illustrates that different local maxima exist for this problem. Thus, there may be benefit in running the system multiple times and choosing the best strategy. It also suggests that techniques designed to avoid local maxima would be beneficial. Full Augmented Distribution Figure illustrates the results for the full augmented distribution. The results are similar to the DSN distribution: the learned strategies again outperformed the expert strategy which in turn again outperformed the randomly selected strategies. The data shows that the expert strategy is significantly better than randomly selected strategies. Together, these two evaluations support the claim that Adaptive LR26 is selecting high performance strategies. Even though the expert strategy is quite good when compared with the complete strategy space, the adaptive algorithm is able to improve the expected problem solving performance. Learned Strategies ) (T15,3,12,,3.9333333,0.8962223,7,127,5,7,127,1,0,7, Expert Strategy ) (T15,4,12,,4.04,2.002889,7,127,5,7,127,1,0,7, Random Strategies ) (T15,5,12,,0.2666667,3.3333333,7,127,5,7,127,1,0,2, Figure : The Full augmented distribution. The graph shows the probability of ob taining a strategy of a particular utility, given that it is chosen from (1) the set of all strategies, (2) the set of learned strategies, or (3) the expert strategy. ) (T15,6,12,,4,3.0295557,7,127,5,7,127,1,0,6, Negative Expected Utility ) (t14,7,0,,0.1333333,1.1333333,0,7,0,300.0000305,,wst:timsps10b,Probability) (v7,8,0,,2.9133317,1.8295549,3.1733317,2.0028898,7,0,1,0) (v7,9,0,,3.9333333,1.0295557,3.6400024,1.202889,7,0,1,0) (v7,10,0,,4.4666667,2.2666667,4.3066667,2.44,7,0,1,0) (g9,11,8, (v7,11,0,,2.7333333,0.2666667,3.7666667,0.2666667,7,0,1,0) (g9,13,1025, (p8,13,0,,5,7,0 (g9,13,0, (g9,13,0, (v7,13,0,,3.0424316,0.1838436,3.0424316,0.3494897,7,0,1,0) (v7,14,0,,3.0424316,0.3494897,2.7333333,0.2666667,7,0,1,0) (v7,15,0,,2.7333333,0.2666667,3.0424316,0.1838436,7,0,1,0)))) (v7,16,25167905,,2.7333333,0.2666667,2.7333333,0.2666667,7,127,1,0))) (T15,17,12,,3.8666667,0.2187858,7,127,5,7,127,1,0,6, Improved Performance ))> .Future Work The results of applying an adaptive approach to deep space network scheduling are very promising. We hope to build on this success in a number of ways. We discuss these directions as they relate to Syntactic Approaches Syntactic approaches attempt to identify control strategies by analyzing the structure of the domain and problem solver. In LR26, our use of metacontrol knowledge can be seen as a syntactic approach; although unlike most syntactic approaches that attempt to identify a specific combination of heuristic methods, the metaknowledge (dominance and indifference relations) acts as constraints that only partially determine a strategy. An advantage of this weakening of the syntactic approach is that it lends itself to a natural and complementary interaction with statistical approaches: structural information restricts the space of reasonable strategies, which is then explored by statistical techniques. An important question concerning such knowledge is to what extent does it contribute to the success of our evaluations, and, more interestingly, how could such information be derived automatically from a structural analysis of the domain and problem solver. We are currently performing a series of experiments to address the former question. A step towards the resolving the second question would be to evaluate in the context of LR2 6 some of the structural relationships suggested by recent work in this area (Frost & Dechter, 1994, Stone, Veloso & Blythe, 1994). Generative Approaches Adaptive LR26 uses a nongenerative approach to conjecturing heuristics. Our experience in the scheduling domain indicates that the performance of adaptive problem solving is inextricably tied to the transformations it is given and the expense of processing examples. Just as an inductive learning technique relies on good attributes, if COMPOSER is to be effective, there must exist good methods for the control points that make up a strategy. Generative approaches could improve the effectiveness of Adaptive LR26. Generative approaches dynamically construct heuristic methods in response to observed problemsolving inefficiencies. The advantage of waiting until inefficiencies are observed is twofold. First, the exploration of the strategy space can be much more focused by only conjecturing heuristics relevant to the observed complications. Second, the conjectured heuristics can be tailored much more specifically to the characteristics of these observed complications. Our previous application of COMPOSER achieved greater performance improvements than Adap tive LR26, in part because it exploited a generative technique to construct heuristics (Gratch & De Jong, 1992). Ongoing research is directed towards incorporating generative methods into Adaptive LR26. Some preliminary work analyzes problemso lving traces to induce good heuristic methods. The constraint and value ordering metrics discussed in Section are used to characterize each search node. This information is then fed to a decisiontree algorithm, which tries to induce effective heuristic methods. These generated methods can then be evaluated statistically. Statistical Approaches Finally there are directions of future work devoted towards enhancing the power of the basic statistical approach, both for Adaptive LR26 in particular, and for statistical approaches in general. For the scheduler, there are two important considerations: enhancing the control grammar and exploring a wider class of utility functions. Several methods could be added to the control grammar. For example, an informal analysis of the empirical evaluations suggests that the scheduler could benefit from a lookback scheme such as backjumping (Gaschnig, 1979) or backmarking (Haralick & Elliott, 1980). We would also like to investigate the adaptive problem solving methodology on a richer variety of scheduling approaches, besides integer programming. Among these would be more powerful bottleneck centered techniques (Biefeld & Cooper, 1991), constraintbased techniques (Smith & Cheng, 1993), opportunistic techniques (Sadeh, 1994), reactive techniques (Smith, 1994) and more powerful backtracking techniques (Xiong, Sadeh & Sycara, 1992). The current evaluation of the scheduler focused on problem solving time as a utility metric, but future work will consider how to improve other aspects of the schedulers capabilities. For example, by choosing another utility function we could guide Adaptive LR 26 towards influencing other aspects of LR26's behavior such as: increasing the amount of flexibility in the generated schedules, increas ing the robustness of generated schedules, maximizing the number of satisfied project constraints, or reducing the implementation cost of generated schedules. These alternative utility functions are of great significance in that they provide much greater leverage in impacting actual operations. For example, finding heuristics which will reduce DSN schedule implementation costs by 3% would have a much greater impact than reducing the automated scheduler response time by 3%. Some pre liminary work has focused on improving schedule quality (Chien & Gratch, 1994). More generally, there are several ways to improve the statistical approach embodied by COMPOS ER. Statistical approaches involve two processes, estimating the utility of transformations and ex ploring the space of strategies. The process of estimating expected utilities can be enhanced by more efficient statistical methods (Chien, Gratch & Burl, 1995, Moore & Lee, 1994, Nelson & Matejcik, 1995), alternative statistical decision requirements (Chien, Gratch & Burl, 1995) and more complex statistical models that weaken the assumption of normality (Smyth & Mellstrom, 1992). The process of exploring the strategy space can be improved both in terms of its efficiency and susceptibility to local maxima. Moore and Lee propose a method called schemata search to help reduce the combina torics of the search. Problems with local maxima can be mitigated, albeit expensively, by considering all kwise combinations of heuristics (as in MULTITAC) or level 2 of Adaptive LR 26's search), or by standard numerical optimization approaches such as repeating the hillclimbing search several times from different start points. One final issue is the expense in processing training examples. In the LR26 domain this cost grows linearly with the number of candidates at each hillclimbing step. While this is not bad from a complexity standpoint, it is a pragmatic concern. There have been a few proposals to reduce the expense in gathering statistics. In previous work (Gratch & DeJong, 1992) we exploited properties of the transformations to gather statistics from a single solution attempt. That system required that the heuristic methods only act by pruning refinements that are guaranteed unsatisfiable. Greiner and Jurisica (1992) discuss a similar technique that eliminates this restriction by providing upper and low er bounds on the incremental utility of transformations. Unfortunately, neither of these approaches could be applied to LR26 so devising methods to reduce the processing cost is an important direction for future work. .Conclusions Although many scheduling problems are intractable, for actual sets of constraints and problem distributions, heuristic solutions can provide acceptable performance. A frequent difficulty is that determining appropriate heuristic methods for a given problem class and distribution is a challenging process that draws upon deep knowledge of the domain and the problem solver used. Furthermore, if the problem distribution changes some time in the future, one must manually reevaluate the effectiveness of the heuristics. Acknowledgements Portions of this work were performed by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration and portions at the Beckman Institute, University of Illinois under National Science Foundation Grant NSF-IRI-92-09394. References Baker, A. (1994). The Hazards of Fancy Backtracking. In Proceedings AAAI94. Bell, C., & Gratch, J. (1993). Use of Lagrangian Relaxation and Machine Learning Techniques to Schedule Deep Space Network Data Transmissions. In Proceedings of The 36th Joint National Meeting of the Operations Research Society of America, the Institute of Management Sciences. Biefeld, E., & Cooper, L. (1991). Bottleneck Identification Using Process Chronologies. In Proceed ings IJCAI-91. Chien, S. & Gratch, J. (1994). Producing Satisficing Solutions to Scheduling Problems: An Iterative Constraint Relaxation Approach. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems. Chien, S., Gratch, J. & Burl, M. (1995). On the Efficient Allocation of Resources for Hypothesis Evaluation: A Statistical Approach. Institute of Electrical and Electronics Engineers Transac tions on Pattern Analysis and Machine Intelligence 17(7), 652-665 . Dechter, R. & Pearl, J. (1987). Network-Based Heuristics for Constraint-Satisf action Problems. Artificial Intelligence 34(1) 1-38. Dechter, R. (1992). Constraint Networks. In Encyclopedia of Artificial Intelligence, Stuart C. Shapi ro (ed.). Etzioni, E. (1990). Why Prodigy/EBL Works. In Proceeding s of AAAI-90. Etzioni. O. & Etzioni, R. (1994). Statistical Methods for Analyzing Speedup Learning Experiments. Machine Learning 14(3), 333-347. Fisher, M. (1981). The Lagrangian Relaxation Method for Solving Integer Programming Problems. Management Science 27 (1) 1-18. Frost, D. & Dechter, R. (1994). In Search of the Best Constraint Satisfaction Search. In Proceedings of AAAI-94. Gaschnig, J. (1979). Performance Measurement and Analysis of Certain Search Algorithms. Tech nical Report CMU-CS-79-124, Carnegie-Mellon University. Ginsberg, M. (1993). Dynamic Backtracking. Journal of Artificial Intelligence Research 1, 25-46. Gratch, J. & DeJong, G. (1992). COMPOSER: A Probabilistic Solution to the Utility Problem in Speed- up Learning. In Proceedings of AAAI-92. Gratch, J., Chien, S., & DeJong, G. (1993). Learning Search Control Knowledge for Deep Space Net work Scheduling. In Proceedings of the Ninth International Conference on Machine Learning. Gratch, J. & DeJong, G. (1996). A Decision-theoretic Approach to McAllester, D. & Rosenblitt, D. (1991). Systematic Nonlinear Planning. In Proceedings of AAAI-91. Minton, S. (1988). Learning Search Control Knowledge: An Explanation-Based Approach, Nor well, MA: Kluwer Academic Publishers. Minton, S. (1993). Integrating Heuristics for Constraint Satisfaction Problems: A Case Study. In Proceedings of AAAI-93. Moore, A. & Lee, M. (1994). Efficient Algorithms for Minimizing Cross Validation Error. In Pro ceedings of the Tenth International Conference on Machine Learning. Nelson, B. & Matejcik, F. (1995). Using Common Random Numbers for Indifference-Zone Selec tion and Multiple Comparisions in Simulation. Management Science. Sacerdoti, E. (1977). A Structure for Plans and Behavior. New York: American Elsevier. Sadeh, N. (1994). Micro-opportunistic Scheduling: The Micro-boss Factory Scheduler. In Intelli gent Scheduling. San Mateo, CA: Morgan Kaufman. Segre, A., Elkan, C., & Russell, A. (1991). A Critical Look at Experimental Evaluations of EBL. Machine Learning 6(2). Smith, S. & Cheng, C. (1993). Slack-based Heuristics for Constraint-satisfacti on Scheduling. In Proceedings of AAAI-93. Smith, S. (1994). OPIS: A Methodology and Architecture for Reactive Scheduling. In Intelligent Scheduling. San Mateo, CA: Morgan Kaufman. Smyth, P. & Mellstrom, J. (1992). Detecting Novel Classes with Applications to Fault Diagnosis. In Proceedings of the Ninth International Conference on Machine Learning. Smyth, P. (1993). Probability Density Estimation and Local Basis Function Neural Networks. Com putational Learning Theory and Natural Learning Systems 2. Stallman, R. & Sussman, G. (1977). Forward Reasoning and Dependency-Directed Backtracking in a System for Computer-Aided Circuit Analysis. Artificial Intelligence 9, 2, 135-196. Stone, P., Veloso, M., & Blythe, J. (1994). The Need for Different Domain-Independent Heuristics. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems. Subramanian, D. & Hunter, S. (1992). Measuring Utility and the Design of Provably Good EBL Al gorithms. In Proceedings of the Ninth International Conference on Machine Learning. Taha, H. (1982). Operations Research, an Introduction, Macmillan Publishing Co. Wilkins, D. (1988). Practical Planning: Extending the Classical Artificial Intelligence Planning Paradigm. San Mateo, CA: Morgan Kaufman. Xiong, Y., Sadeh, N., & Sycara, K. (1992). Intelligent Backtracking Techniques for Job Shop Sched uling. In Proceedings of the Third International Conference on Knowledge Representation and Reasoning. Yakowitz, S. & Lugosi, E. (1990). Random Search in the Presence of Noise, with Application to Ma chine Learning. Society for Industrial and Applied Mathematics Journal of Scientific and Sta tistical Computing 11, 4, 702-712. Yang, Q. & Murray, C. (1994). An Evaluation of the Temporal Coherence Heuristic for Partial-Order Planning. Computational Intelligence 10, 2.