Intelligent Information Agents Research at Aberdeen
===================================================

Contact Details:
---------------

Dr Pete Edwards
Department of Computing Science
King's College
University of Aberdeen
Aberdeen, AB24 3UE
United Kingdom

Tel.  +44 (0)1224 272270
Fax.  +44 (0)1224 273422
Email pedwards@csd.abdn.ac.uk

Overview:
--------

This report summarises recent activities in the area of intelligent information agents (I2A) at the Department of Computing Science, University of Aberdeen. The focus of these activities is the Learning Agents & Systems group, led by Dr Pete Edwards. The group is concerned with the development of novel machine learning techniques and their application in the context of intelligent information management.

The group has strong relationships with a number of commercial organisations, including British Telecom Research Laboratories, the Scottish Software Federation, and IEE Publishing & Information Services. Links with British Telecom are particularly strong: two members of the Aberdeen group are (partially) supported by BT.

Recent Activities:
-----------------

* Novel Learning Methods for User-Profile Acquisition

  We have explored a variety of existing learning techniques for automated acquisition of user-profiles, including rule-induction, nearest-neighbour, and Bayesian learning methods. In addition, we have developed our own novel approaches. For example, the IBPL nearest-neighbour learning algorithm employs a set-based representation that is better suited to learning in textual domains. Another area of activity focusses on probabilistic methods and, in particular, enhancements to the naive Bayes algorithm.

* Dimensionality Reduction

  We are currently examining a variety of attribute selection and dimensionality reduction techniques in the context of information management agents. Dimensionality is a particular problem when learning from text, where the domain may contain 20,000 - 100,000 terms.
  A variety of techniques are being explored from the fields of information retrieval and machine learning, including reduction techniques related to correspondence analysis, weighting methods, and filter/wrapper methods. By combining various approaches to identify those attributes/terms that are most relevant to a categorisation task, we hope to improve the accuracy of machine learning techniques used within information agents. On a related note, we are also exploring the role of feature generation methods, which can be applied in the context of information agents to identify term groups that result in enhanced user-profile accuracy. This most recent work is being conducted using a probabilistic profile representation.

* Information Presentation

  Clustering methods provide a means of organising an information space to suit the needs of an individual (or group of individuals) based on their user-profile/information needs. For example, rather than a Web site existing as a static hypertext hierarchy, we can imagine a more dynamic space, which is organised to reflect the information requirements of a specific user. We have explored a variety of clustering techniques inspired by both the machine learning and information retrieval communities in this context, and are presently working on a novel clustering engine which will exploit existing knowledge of users (and perhaps the information space itself) to perform knowledge-guided clustering on a dataset of several thousand Web pages.

* Communication of Inductive Inferences

  With the growth of large-scale distributed data sources, there is a need for agents capable of extracting useful "knuggets" of information from such locations, i.e. agents that can perform distributed data-mining. This raises a number of interesting issues, including: How do such agents share the knowledge they have extracted? How are various "knuggets" integrated? What role does the raw data itself play in the process?
  We are investigating how the Version Space representation can be used to facilitate knowledge sharing in this context, by acting as a common language for data-mining agents. This representation allows agents to integrate their results without requiring them to transmit all or part of the original data.

* Applications

  - General Architecture for Embedded Learning Agents

    A simple, component-based approach which allowed adaptive information agents to be embedded within existing software tools. Resulting applications include the MAGI (electronic mail), IAN (USENET news) and LAW (Web) systems. Each of these systems is able to acquire (and subsequently adapt) a user-profile which can be used to automate filtering/retrieval tasks. These systems were also used as the basis for a study which investigated the performance of different learning/content extraction techniques for user-profile acquisition.

  - Off-line Meta-Search Engine

    The Remora2c meta-search engine (and its variants) supports off-line searching of a variety of commercial search-engines. Remora provides a variety of search modes, including user-profile based searching. Results are clustered before presentation to the user via an email message or Web page.

  - New Technology Information Agent (Scottish Software Federation)

    An agent (based on the Remora2c meta-search engine) which offers members of the SSF various facilities for information gathering, including an "occasional search" mode which employs a user-profile to perform automated searches at a frequency determined by the user: weekly, monthly or quarterly.

  - Agent Support for Scientific Publishing (IEE Publishing & Information Services)

    Scientific publishing involves a great deal of information processing, much of it performed by experienced editorial staff. One crucial step is the selection of appropriate referees to review a manuscript once it has been submitted.
    We are presently exploring a number of agent-based solutions to the problem of (semi-)automating this step.

Selected Publications:
---------------------

T R Payne & P Edwards, Interface Agents that Learn: An Investigation of Learning Issues in a Mail Agent Interface, Applied Artificial Intelligence, 11 (1), 1-32, 1997.

T R Payne, P Edwards & C L Green, Experience with Rule Induction & k-Nearest Neighbour Methods for Interface Agents that Learn, IEEE Transactions on Knowledge & Data Engineering, 9 (2), 329-335, 1997.

P Edwards, D Bayer, C L Green & T R Payne, Experience with Learning Agents which Manage Internet-Based Information, in M A Hearst & H Hirsh (Eds), AAAI 1996 Stanford Spring Symposium on Machine Learning in Information Access, SS-96-05, AAAI Press, 31-40, 1996.

P Edwards, C L Green, P C Lockier & T C Lukins, Exploiting Learning Technologies for World Wide Web Agents, IEE Colloquium on Intelligent World Wide Web Agents, Digest No: 97/118, IEE, Savoy Place, London, 3/1-3/7, 1997.

W H E Davies & P Edwards, The Communication of Inductive Inferences, in G Weiss (Ed), Distributed Artificial Intelligence Meets Machine Learning: Learning in Multi-Agent Environments, Lecture Notes in Artificial Intelligence 1221, Springer-Verlag, Berlin, 223-241, 1997.

Learning Agents & Systems Group Web Pages:
-----------------------------------------

http://www.csd.abdn.ac.uk/~pedwards/res/las.html