MIME-Version: 1.0 Server: CERN/3.0 Date: Tuesday, 07-Jan-97 15:56:22 GMT Content-Type: text/html Content-Length: 32358 Last-Modified: Wednesday, 28-Aug-96 15:56:47 GMT Theory Refinement

Theory Refinement

To view a paper, click on the open book image.

  1. Combining Symbolic and Connectionist Learning Methods to Refine Certainty-Factor Rule-Bases
    J. Jeffrey Mahoney
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, May, 1996.

    This research describes the system RAPTURE, which is designed to revise rule bases expressed in certainty-factor format. Recent studies have shown that learning is facilitated when biased with domain-specific expertise, and have also shown that many real-world domains require some form of probabilistic or uncertain reasoning in order to successfully represent target concepts. RAPTURE was designed to take advantage of both of these results.

    Beginning with a set of certainty-factor rules, along with accurately-labelled training examples, RAPTURE makes use of both symbolic and connectionist learning techniques for revising the rules, in order that they correctly classify all of the training examples. A modified version of backpropagation is used to adjust the certainty factors of the rules, ID3's information-gain heuristic is used to add new rules, and the Upstart algorithm is used to create new hidden terms in the rule base.

    Results on refining four real-world rule bases are presented that demonstrate the effectiveness of this combined approach. Two of these rule bases were designed to identify particular areas in strands of DNA, one is for identifying infectious diseases, and the fourth attempts to diagnose soybean diseases. The results of RAPTURE are compared with those of backpropagation, C4.5, KBANN, and other learning systems. RAPTURE generally produces sets of rules that are more accurate that these other systems, often creating smaller sets of rules and using less training time.

  2. Refinement of Bayesian Networks by Combining Connectionist and Symbolic Techniques
    Sowmya Ramanchandran
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1995.

    Bayesian networks provide a mathematically sound formalism for representing and reasoning with uncertain knowledge and are as such widely used. However, acquiring and capturing knowledge in this framework is difficult. There is a growing interest in formulating techniques for learning Bayesian networks inductively. While the problem of learning a Bayesian network, given complete data, has been explored in some depth, the problem of learning networks with unobserved causes is still open. In this proposal, we view this problem from the perspective of theory revision and present a novel approach which adapts techniques developed for revising theories in symbolic and connectionist representations. Thus, we assume that the learner is given an initial approximate network (usually obtained from a expert). Our technique inductively revises the network to fit the data better. Our proposed system has two components: one component revises the parameters of a Bayesian network of known structure, and the other component revises the structure of the network. The component for parameter revision maps the given Bayesian network into a multi-layer feedforward neural network, with the parameters mapped to weights in the neural network, and uses standard backpropagation techniques to learn the weights. The structure revision component uses qualitative analysis to suggest revisions to the network when it fails to predict the data accurately. The first component has been implemented and we will present results from experiments on real world classification problems which show our technique to be effective. We will also discuss our proposed structure revision algorithm, our plans for experiments to evaluate the system, as well as some extensions to the system.

  3. A Novel Application of Theory Refinement to Student Modeling
    Paul Baffes and Raymond J. Mooney
    Proceedings of the Thirteenth National Conference on Aritificial Intelligence, pp. 403-408, Portland, OR, August, 1996. (AAAI-96)

    Theory refinement systems developed in machine learning automatically modify a knowledge base to render it consistent with a set of classified training examples. We illustrate a novel application of these techniques to the problem of constructing a student model for an intelligent tutoring system (ITS). Our approach is implemented in an ITS authoring system called Assert which uses theory refinement to introduce errors into an initially correct knowledge base so that it models incorrect student behavior. The efficacy of the approach has been demonstrated by evaluating a tutor developed with Assert with 75 students tested on a classification task covering concepts from an introductory course on the C++ programming language. The system produced reasonably accurate models and students who received feedback based on these models performed significantly better on a post test than students who received simple reteaching.

  4. Refinement-Based Student Modeling and Automated Bug Library Construction
    Paul Baffes and Raymond Mooney
    Journal of Artificial Intelligence in Education, 7, 1 (1996), pp. 75-116.

    A critical component of model-based intelligent tutoring sytems is a mechanism for capturing the conceptual state of the student, which enables the system to tailor its feedback to suit individual strengths and weaknesses. To be useful such a modeling technique must be practical, in the sense that models are easy to construct, and effective, in the sense that using the model actually impacts student learning. This research presents a new student modeling technique which can automatically capture novel student errors using only correct domain knowledge, and can automatically compile trends across multiple student models. This approach has been implemented as a computer program, ASSERT, using a machine learning technique called theory refinement, which is a method for automatically revising a knowledge base to be consistent with a set of examples. Using a knowledge base that correctly defines a domain and examples of a student's behavior in that domain, ASSERT models student errors by collecting any refinements to the correct knowledege base which are necessary to account for the student's behavior. The efficacy of the approach has been demonstrated by evaluating ASSERT using 100 students tested on a classification task covering concepts from an introductory course on the C++ programming language. Students who received feedback based on the models automatically generated by ASSERT performed significantly better on a post test than students who received simple teaching.

  5. Revising Bayesian Network Parameters Using Backpropagation
    Sowmya Ramachandran and Raymond J. Mooney
    Proceedings of the International Conference on Neural Networks (ICNN-96), Special Session on Knowledge-Based Artificial Neural Networks, Washington DC, June 1996.

    The problem of learning Bayesian networks with hidden variables is known to be a hard problem. Even the simpler task of learning just the conditional probabilities on a Bayesian network with hidden variables is hard. In this paper, we present an approach that learns the conditional probabilities on a Bayesian network with hidden variables by transforming it into a multi-layer feedforward neural network (ANN). The conditional probabilities are mapped onto weights in the ANN, which are then learned using standard backpropagation techniques. To avoid the problem of exponentially large ANNs, we focus on Bayesian networks with noisy-or and noisy-and nodes. Experiments on real world classification problems demonstrate the effectiveness of our technique.

  6. Automatic Student Modeling and Bug Library Construction using Theory Refinement
    Paul T. Baffes
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, December, 1994.

    The history of computers in education can be characterized by a continuing effort to construct intelligent tutorial programs which can adapt to the individual needs of a student in a one-on-one setting. A critical component of these intelligent tutorials is a mechanism for modeling the conceptual state of the student so that the system is able to tailor its feedback to suit individual strengths and weaknesses. The primary contribution of this research is a new student modeling technique which can automatically capture novel student errors using only correct domain knowledge, and can automatically compile trends across multiple student models into bug libraries. This approach has been implemented as a computer program, ASSERT, using a machine learning technique called theory refinement which is a method for automatically revising a knowledge base to be consistent with a set of examples. Using a knowledge base that correctly defines a domain and examples of a student's behavior in that domain, ASSERT models student errors by collecting any refinements to the correct knowledge base which are necessary to account for the student's behavior. The efficacy of the approach has been demonstrated by evaluating ASSERT using 100 students tested on a classification task using concepts from an introductory course on the C++ programming language. Students who received feedback based on the models automatically generated by ASSERT performed significantly better on a post test than students who received simple reteaching.

  7. Comparing Methods For Refining Certainty Factor Rule-Bases
    J. Jeffrey Mahoney and Raymond J. Mooney
    Proceedings of the Eleventh International Workshop on Machine Learning, pp. 173-180, Rutgers, NJ, July 1994. (ML-94)

    This paper compares two methods for refining uncertain knowledge bases using propositional certainty-factor rules. The first method, implemented in the RAPTURE system, employs neural-network training to refine the certainties of existing rules but uses a symbolic technique to add new rules. The second method, based on the one used in the KBANN system, initially adds a complete set of potential new rules with very low certainty and allows neural-network training to filter and adjust these rules. Experimental results indicate that the former method results in significantly faster training and produces much simpler refined rule bases with slightly greater accuracy.

  8. Modifying Network Architectures For Certainty-Factor Rule-Base Revision
    J. Jeffrey Mahoney and Raymond J. Mooney
    Proceedings of the International Symposium on Integrating Knowledge and Neural Heuristics 1994, pp. 75-85, Pensacola, FL, May 1994. (ISIKNH-94)

    This paper describes RAPTURE --- a system for revising probabilistic rule bases that converts symbolic rules into a connectionist network, which is then trained via connectionist techniques. It uses a modified version of backpropagation to refine the certainty factors of the rule base, and uses ID3's information-gain heuristic (Quinlan) to add new rules. Work is currently under way for finding improved techniques for modifying network architectures that include adding hidden units using the UPSTART algorithm (Frean). A case is made via comparison with fully connected connectionist techniques for keeping the rule base as close to the original as possible, adding new input units only as needed.

  9. Extending Theory Refinement to M-of-N Rules
    Paul T. Baffes and Raymond J. Mooney
    Informatica, 17 (1993), pp. 387-397.

    In recent years, machine learning research has started addressing a problem known as theory refinement. The goal of a theory refinement learner is to modify an incomplete or incorrect rule base, representing a domain theory, to make it consistent with a set of input training examples. This paper presents a major revision of the EITHER propositional theory refinement system. Two issues are discussed. First, we show how run time efficiency can be greatly improved by changing from a exhaustive scheme for computing repairs to an iterative greedy method. Second, we show how to extend EITHER to refine MofN rules. The resulting algorithm, Neither (New EITHER), is more than an order of magnitude faster and produces significantly more accurate results with theories that fit the MofN format. To demonstrate the advantages of NEITHER, we present experimental results from two real-world domains.

  10. Learning to Model Students: Using Theory Refinement to Detect Misconceptions
    Paul T. Baffes
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1993.

    A new student modeling system called ASSERT is described which uses domain independent learning algorithms to model unique student errors and to automatically construct bug libraries. ASSERT consists of two learning phases. The first is an application of theory refinement techniques for constructing student models from a correct theory of the domain being tutored. The second learning cycle automatically constructs the bug library by extracting common refinements from multiple student models which are then used to bias future modeling efforts. Initial experimental data will be presented which suggests that ASSERT is a more effective modeling system than other induction techniques previously explored for student modeling, and that the automatic bug library construction significantly enhances subsequent modeling efforts.

  11. Symbolic Revision of Theories With M-of-N Rules
    Paul T. Baffes and Raymond J. Mooney
    Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1135-1140, Chambery, France, 1993. (IJCAI-93)

    This paper presents a major revision of the EITHER propositional theory refinement system. Two issues are discussed. First, we show how run time efficiency can be greatly improved by changing from a exhaustive scheme for computing repairs to an iterative greedy method. Second, we show how to extend EITHER to refine M-of-N rules. The resulting algorithm, NEITHER (New EITHER), is more than an order of magnitude faster and produces significantly more accurate results with theories that fit the M-of-N format. To demonstrate the advantages of NEITHER, we present preliminary experimental results comparing it to EITHER and various other systems on refining the DNA promoter domain theory.

  12. Combining Connectionist and Symbolic Learning to Refine Certainty-Factor Rule-Bases
    J. Jeffrey Mahoney and Raymond J. Mooney
    Connection Science, 5 (1993), pp. 339-364. (Special issue on Architectures for Integrating Neural and Symbolic Processing)

    This paper describes Rapture --- a system for revising probabilistic knowledge bases that combines connectionist and symbolic learning methods. Rapture uses a modified version of backpropagation to refine the certainty factors of a Mycin-style rule base and it uses ID3's information gain heuristic to add new rules. Results on refining three actual expert knowledge bases demonstrate that this combined approach generally performs better than previous methods.

  13. Refinement of First-Order Horn-Clause Domain Theories
    Bradley L. Richards and Raymond J. Mooney
    Machine Learning 19,2 (1995), pp. 95-131.

    Knowledge acquisition is a difficult and time-consuming task, and as error-prone as any human activity. The task of automatically improving an existing knowledge base using learning methods is addressed by a new class of systems performing theory refinement. Until recently, such systems were limited to propositional theories. This paper presents a system, FORTE (First-Order Revision of Theories from Examples), for refining first-order Horn-clause theories. Moving to a first-order representation opens many new problem areas, such as logic program debugging and qualitative modelling, that are beyond the reach of propositional systems. FORTE uses a hill-climbing approach to revise theories. It identifies possible errors in the theory and calls on a library of operators to develop possible revisions. The best revision is implemented, and the process repeats until no further revisions are possible. Operators are drawn from a variety of sources, including propositional theory refinement, first-order induction, and inverse resolution. FORTE has been tested in several domains including logic programming and qualitative modelling.

  14. Combining Symbolic and Neural Learning to Revise Probabilistic Theories
    J. Jeffrey Mahoney & Raymond J. Mooney
    Proceedings of the 1992 Machine Learning Workshop on Integrated Learning in Real Domains, Aberdeen Scotland, July 1992.

    This paper describes RAPTURE --- a system for revising probabilistic theories that combines symbolic and neural-network learning methods. RAPTURE uses a modified version of backpropagation to refine the certainty factors of a Mycin-style rule-base and it uses ID3's information gain heuristic to add new rules. Results on two real-world domains demonstrate that this combined approach performs as well or better than previous methods.

  15. Using Theory Revision to Model Students and Acquire Stereotypical Errors
    Paul T. Baffes and Raymond J. Mooney
    Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pp. 617-622, Bloomington, IN, July 1992.

    Student modeling has been identified as an important component to the long term development of Intelligent Computer-Aided Instruction (ICAI) systems. Two basic approaches have evolved to model student misconceptions. One uses a static, predefined library of user bugs which contains the misconceptions modeled by the system. The other uses induction to learn student misconceptions from scratch. Here, we present a third approach that uses a machine learning technique called theory revision. Using theory revision allows the system to automatically construct a bug library for use in modeling while retaining the flexibility to address novel errors.

  16. A Preliminary PAC Analysis of Theory Revision
    Raymond J. Mooney
    March 1992
    Computational Learning Theory and Natural Learning Systems, Vol. 3, T. Petsche, S. Judd, and S. Hanson, Eds., MIT Press, 1995, pp. 43-53.

    This paper presents a preliminary analysis of the sample complexity of theory revision within the framework of PAC (Probably Approximately Correct) learnability theory. By formalizing the notion that the initial theory is ``close'' to the correct theory we show that the sample complexity of an optimal propositional Horn-clause theory revision algorithm is $O( ( \ln 1 / \delta + d \ln (s_0 + d + n) ) / \epsilon)$, where $d$ is the {\em syntactic distance} between the initial and correct theories, $s_0$ is the size of initial theory, $n$ is the number of observable features, and $\epsilon$ and $\delta$ are the standard PAC error and probability bounds. The paper also discusses the problems raised by the computational complexity of theory revision.

  17. Automated Debugging of Logic Programs via Theory Revision
    Raymond J. Mooney & Bradley L. Richards
    Proceedings of the Second International Workshop on Inductive Logic Programming, Tokyo, Japan, June 1992.

    This paper presents results on using a theory revision system to automatically debug logic programs. FORTE is a recently developed system for revising function-free Horn-clause theories. Given a theory and a set of training examples, it performs a hill-climbing search in an attempt to minimally modify the theory to correctly classify all of the examples. FORTE makes use of methods from propositional theory revision, Horn-clause induction (FOIL), and inverse resolution. The system has has been successfully used to debug logic programs written by undergraduate students for a programming languages course.

  18. Batch versus Incremental Theory Refinement
    Raymond J. Mooney
    Proceedings of AAAI Spring Symposium on Knowledge Assimilation, Standford, CA, March, 1992.

    Most existing theory refinement systems are not incremental. However, any theory refinement system whose input and output theories are compatible can be used to incrementally assimilate data into an evolving theory. This is done by continually feeding its revised theory back in as its input theory. An incremental batch approach, in which the system assimilates a batch of examples at each step, seems most appropriate for existing theory revision systems. Experimental results with the EITHER theory refinement system demonstrate that this approach frequently increases efficiency without significantly decreasing the accuracy or the simplicity of the resulting theory. However, if the system produces bad initial changes to the theory based on only small amount of data, these bad revisions can ``snowball'' and result in an overall decrease in performance.

  19. A Multistrategy Approach to Theory Refinement
    Raymond J. Mooney & Dirk Ourston
    Machine Learning: A Multistrategy Approach, Vol. IV, R.S. Michalski & G. Teccuci (eds.), pp.141-164, Morgan Kaufman, San Mateo, CA, 1994.

    This chapter describes a multistrategy system that employs independent modules for deductive, abductive, and inductive reasoning to revise an arbitrarily incorrect propositional Horn-clause domain theory to fit a set of preclassified training instances. By combining such diverse methods, EITHER is able to handle a wider range of imperfect theories than other theory revision systems while guaranteeing that the revised theory will be consistent with the training data. EITHER has successfully revised two actual expert theories, one in molecular biology and one in plant pathology. The results confirm the hypothesis that using a multistrategy system to learn from both theory and data gives better results than using either theory or data alone.

  20. Theory Refinement Combining Analytical and Empirical Methods
    Dirk Ourston and Raymond J. Mooney
    Artificial Intelligence, 66 (1994), pp. 311--344.

    This article describes a comprehensive approach to automatic theory revision. Given an imperfect theory, the approach combines explanation attempts for incorrectly classified examples in order to identify the failing portions of the theory. For each theory fault, correlated subsets of the examples are used to inductively generate a correction. Because the corrections are focused, they tend to preserve the structure of the original theory. Because the system starts with an approximate domain theory, in general fewer training examples are required to attain a given level of performance (classification accuracy) compared to a purely empirical system. The approach applies to classification systems employing a propositional Horn-clause theory. The system has been tested in a variety of application domains, and results are presented for problems in the domains of molecular biology and plant disease diagnosis.

  21. Improving Shared Rules in Multiple Category Domain Theories
    Dirk Ourston and Raymond J. Mooney
    Proceedings of the Eighth International Machine Learning Workshop, pp. 534-538, Evanston, IL, June 1991.

    This paper presents an approach to improving the classification performance of a multiple category theory by correcting intermediate rules which are shared among the categories. Using this technique, the performance of a theory in one category can be improved through training in an entirely different category. Examples of the technique are presented and experimental results are given.

  22. Constructive Induction in Theory Refinement
    Raymond J. Mooney and Dirk Ourston
    Proceedings of the Eighth International Machine Learning Workshop, pp. 178-182, Evanston, IL. June 1991.

    This paper presents constructive induction techniques recently added to the EITHER theory refinement system. These additions allow EITHER to handle arbitrary gaps at the ``top,'' ``middle,'' and/or ``bottom'' of an incomplete domain theory. Intermediate concept utilization employs existing rules in the theory to derive higher-level features for use in induction. Intermediate concept creation employs inverse resolution to introduce new intermediate concepts in order to fill gaps in a theory that span multiple levels. These revisions allow EITHER to make use of imperfect domain theories in the ways typical of previous work in both constructive induction and theory refinement. As a result, EITHER is able to handle a wider range of theory imperfections than does any other existing theory refinement system.

  23. Theory Refinement with Noisy Data
    Raymond J. Mooney and Dirk Ourston
    Technical Report AI 91-153, Artificial Intelligence Lab, University of Texas at Austin, March 1991.

    This paper presents a method for revising an approximate domain theory based on noisy data. The basic idea is to avoid making changes to the theory that account for only a small amount of data. This method is implemented in the EITHER propositional Horn-clause theory revision system. The paper presents empirical results on artificially corrupted data to show that this method successfully prevents over-fitting. In other words, when the data is noisy, performance on novel test data is considerably better than revising the theory to completely fit the data. When the data is not noisy, noise processing causes no significant degradation in performance. Finally, noise processing increases efficiency and decreases the complexity of the resulting theory.


    estlin@cs.utexas.edu