From gps0@harvey.gte.com Wed Mar 2 18:19:21 EST 1994
Article: 20911 of comp.ai
Newsgroups: comp.ai
From: gps0@harvey.gte.com (Gregory Piatetsky-Shapiro)
Subject: Data Mining / Knowledge Discovery references
Keywords: Data Mining, Knowledge Discovery, Databases
Organization: GTE Laboratories, Inc.
Date: Wed, 2 Mar 1994 15:05:42 GMT

Several people have recently requested references on Knowledge Discovery /
Data Mining. I enclose a brief list of recent references. You can stay up
to date on these topics by subscribing to the KDD Nuggets list (e-mail to
kdd-request@gte.com).

-- Gregory Piatetsky-Shapiro (gps@gte.com)
================================================================
Gregory Piatetsky-Shapiro, Ph.D.
Principal Member of Technical Staff
GTE Laboratories, MS-45          e-mail: gps@gte.com
40 Sylvan Road                   fax:   (617) 466-2960
Waltham MA 02154-1120 USA        phone: (617) 466-4236
====================================================

--------- Overview Articles -----------

C. Matheus, P. Chan, G. Piatetsky-Shapiro, "Systems for Knowledge
Discovery in Databases," Special Issue on Learning and Discovery in
Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 5,
No. 6, Dec. 1993.

W. Frawley, G. Piatetsky-Shapiro, and C. Matheus, 1992. "Knowledge
Discovery in Databases: An Overview." AI Magazine, Fall 1992. Reprint of
the introductory chapter of the Knowledge Discovery in Databases
collection, AAAI/MIT Press, 1991.

"Data Mining: Intelligent Technology Gets Down to Business," PC AI
(Nov-Dec 1993).

------ Collections and Books ------

IEEE Transactions on Knowledge and Data Engineering, special issue on
Learning and Discovery in Databases, N. Cercone and M.
Tsuchiya, guest editors, Vol. 5, No. 6, Dec. 1993.

Machine Learning Journal, special issue on Machine Discovery, Jan Zytkow,
guest editor, 12(1-3), 1993.

KDD-93: Proceedings of the AAAI-93 Knowledge Discovery in Databases
workshop, G. Piatetsky-Shapiro, editor, AAAI Press technical report
WS-02, July 1993.

K. Parsaye and M. Chignell, 1993. Intelligent Database Tools &
Applications, John Wiley.

Special issue on Knowledge Discovery in Databases and Knowledge Bases,
International Journal of Intelligent Systems, Vol. 7, No. 7, Sep 1992,
G. Piatetsky-Shapiro, guest editor. An edited selection of the best
papers from the KDD-91 workshop.

G. Piatetsky-Shapiro and W. Frawley, 1991. Editors, Knowledge Discovery
in Databases, Cambridge, Mass.: AAAI/MIT Press. A collection of
state-of-the-art research papers.

W. H. Inmon and S. Osterfelt, 1991. Understanding Data Pattern
Processing: The Key to Competitive Advantage. QED Technical Publishing
Group, Wellesley, MA. A business-oriented, nontechnical book.

--
Gregory Piatetsky-Shapiro
GTE Laboratories, MS-45          email: gps0@gte.com
40 Sylvan Road                   fax:   (617) 466-2960
Waltham MA 02254 USA             phone: (617) 466-4236

========================================================================

Article 21055 of comp.ai:
Newsgroups: comp.ai
From: sherry@iti.gov.sg (Long Ai Sin Sherry)
Subject: Summary of Replies for "Help: Data Mining Tools"
Reply-To: sherry@iti.gov.sg
Organization: Information Technology Institute, National Computer Board, S'pore
Date: Fri, 11 Mar 1994 11:16:59 GMT

Hello all,

Due to popular demand, here is a summary of replies for the subject
"Help: Data Mining Tools", which I posted two weeks ago. I would like to
take this opportunity to thank all netters who responded.
I hope you will be able to find some useful information here. Any further
replies on this subject are welcome.

Questions posted:

>Hi!
>
>I'm currently doing some evaluation of data mining tools.
>I would appreciate it very much if anyone could:
>
>1) refer me to a list of data mining tools available on the
>   market; or
>
>2) recommend some good data mining tools; or
>
>3) recommend some data mining tools that are capable of
>   doing unsupervised learning; or
>
>4) provide me with pointers to any tool evaluation/comparison
>   reports.

***********************************************************************
REPLY 1 (consolidated version)
***********************************************************************

Overview report:
  Data Mining - The Search for Knowledge in Databases
  Marcel Holsheimer, Arno Siebes
  Univ. of Amsterdam

available via anonymous ftp:
  ftp ftp.cwi.nl
  cd pub/CWIreports/AA
  get CS.R9406.ps.Z

***********************************************************************
REPLY 2
***********************************************************************

From: tgorb@rrc.chevron.com (Joe Gorberg)
Organization: Chevron, Richmond, California

A couple of suggestions:

1. Contact HNC, Inc. in San Diego. They developed a tool called Database
Mining. I think they even registered the name as a trademark or
something. Anyway, check out what they have to offer.

2. I just purchased and received IDIS from IntelligenceWare, Inc. I can't
recommend the package yet, as I have only used it for a few hours. Not
too impressed so far, but I really need to understand what it's doing and
how to interpret the results. It develops a set of rules to define
correlations and cause-effect relationships based on one or more goals
which you set.

3. On the Mac side, a good visualization tool I like and recommend is
Data Desk (you can get it from Egghead and MacWarehouse). It's a
statistics package with excellent graphics for x-y-z rotating plots,
histograms, and much more.
It really has helped me get value out of neural nets and understanding
the data.

Good luck. If you come across anything else, please let me know.

Joe Gorberg
Chevron Research and Technology Co.
tgorb@chevron.com
(510) 242-2378

***********************************************************************
REPLY 3
***********************************************************************

From: saswss@hotellng.unx.sas.com (Warren Sarle)
Organization: SAS Institute Inc.

In article <16023@lhdsy1.lahabra.chevron.com>, tgorb@rrc.chevron.com
(Joe Gorberg) writes:
|> ...
|> 2. I just purchased and received IDIS from IntelligenceWare, Inc. I
|> can't recommend the package yet as I have only used it for a few
|> hours. Not too impressed so far, but I really need to understand what
|> it's doing and how to interpret the results. It develops a set of
|> rules to define correlations and cause-effect based on one or more
|> goals which you set.

Cause and effect cannot be established without running an experiment (a
real experiment, not some simulation) in which the potential causes are
experimentally manipulated. Any AI software or stat software that claims
otherwise is lying.

|> 3. On the mac side a good visualization tool I like and recommend is
|> Data Desk (you can get it from Egghead and MacWarehouse). Its a stat.
|> package with excellent graphics for x-y-z rotating plots, histograms
|> and much more. It really has helped me get value out of neural nets
|> and understanding the data.

JMP is also good.

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513       those of SAS Institute.

***********************************************************************
REPLY 4
***********************************************************************

From A.N.Pryke@computer-science.birmingham.ac.uk Fri Mar 11 03:37:13 1994

Hi.
I've not got much info on tools; what I have got is a posting by Sandra
Oudshoff summarizing replies to her request for info on tools. I guess
you've got this already, but I'll send it anyway. I've also got a short
article on dblearn, which I'll include.

Andy.

-----------------------------------------------------------------------
Start Enclosure 1
-----------------------------------------------------------------------

Published by The Centre For Systems Science
Simon Fraser University, Burnaby, BC, Canada V5A 1S6
604-291-3455
Editor: Barry Shell  shell@sfu.ca

************************************************************************
Data Mining
************************************************************************

New computer programs can probe vast databases searching for patterns.
They promise to extract useful knowledge from the rapidly mounting store
of boring data created by the information age.
========================================================================

We are drowning in a flood of computer information: raw data from banks,
hospitals, and credit card transactions; digitized images from space,
geographical information systems, and computer scanners; mailing lists,
gene maps, on-line news, market reports, and demographic surveys. "We've
got lots and lots of data in computer databases. It's everywhere!" says
Jiawei Han, CSS member and associate professor of computing science at
SFU. "But people are getting bored of searching the raw data." Sure, it's
easy to find John Doe's bank balance, but what general knowledge can be
drawn from all the bank's data?
Han, together with Nick Cercone and associates, has created a computer
program called dblearn that, if unleashed on a bank's database, might
ferret out the fact that 38% of the largest savings accounts are
maintained by little old ladies living within two kilometres of the bank.
Now the data starts to get interesting.

Easy to program and designed for the most common type of database in use
(so-called relational databases, because they store information in
related tables), dblearn is starting to attract attention. The group has
invitations to speak at the 1st International Conference on Knowledge and
Information Management in Baltimore, MD and at Computer World '92 in
Kobe, Japan. "People are excited," says Han, "because the new technology
can influence policy making. You can now get precise general information
that was originally buried in tons of little details. Now, in minutes,
you can get what you need to put forth good arguments about policy."

How it works

Data tables in a relational database are organised in columns and rows,
each column holding one attribute, like a person's age in years, while
each row might correspond to a different person. dblearn works its magic
in three phases. First the database must be preconditioned by a computer
science professional called a knowledge engineer, who creates a framework
for mining the information. Han explains that many data attributes can be
clarified through the formation of conceptual hierarchies. For example,
one conceptual hierarchy for age might look like this: 0-1 -> infant,
2-5 -> preschool, 6-12 -> primary school, 13-19 -> teenager, 20-30 ->
young adult, 31-65 -> middle age, 66+ -> senior citizen. Depending on the
nature of the data, the hierarchy could be further refined. For instance,
if most of the people in the database were middle aged and older, the
knowledge engineer might add the following information:
{infant, preschool, primary school, teenager} -> child.
This tells dblearn that it can lump anyone under 19 into the category
"child", adding a further level to the conceptual hierarchy.

Once the database is prepared, a minimally trained clerk can enter a
learning request to dblearn. This small English-like program defines how
to extract the knowledge from the raw data (see text box, next page).
Finally, dblearn sets about the task by applying what Han calls internal
learning strategies to the data.

dblearn "learns" from a database in three steps while it creates a
simplified temporary version of the data in memory. First it checks all
the attributes to see if any can be decomposed into smaller units. For
example, a birthdate might actually contain three pieces of information:
day, month, and year. "We must be careful not to throw away anything
potentially useful during the learning process," says Han. This is
especially important because the next step involves removing attribute
columns that are of no use. For instance, people's last names are mostly
unique, so they won't yield any general rules. After decomposition and
removal, dblearn tries to reduce the complexity of the data by
substituting actual values with the more general terms defined in its
conceptual hierarchy. It might replace "Manitoba" with "Prairies". Then
it counts up rows with matching values and copies them over to become one
row in the temporary table. The program also creates a new column where
it stores this count. If it can move up the conceptual hierarchy to
generalize the data further, it does more substitution, counting and
copying. Eventually a table of information emerges, revealing previously
hidden facts.

In the future, Han and Cercone feel concept hierarchies could be
generated by automatic statistical analysis of a column's contents. With
easy-to-use interfaces that understand plain English, such data-mining
programs will become commonplace in the information society.
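The substitute/count/merge step described above can be sketched in a few
lines. The concept hierarchy and sample rows below are invented for
illustration; dblearn's actual data structures and language are not shown
in the article.

```python
# A minimal sketch of the generalize-then-merge step of
# attribute-oriented induction. The hierarchy and rows are
# invented examples, not dblearn's real data.
from collections import Counter

# concept hierarchy: specific value -> more general concept
hierarchy = {
    "Alberta": "Prairies", "Saskatchewan": "Prairies", "Manitoba": "Prairies",
    "Nova Scotia": "Maritimes", "New Brunswick": "Maritimes",
}

def generalize(rows, hierarchy):
    """Replace each value with its parent concept, then merge identical
    rows, appending a count column (the 'vote' of each generalized row)."""
    counts = Counter(
        tuple(hierarchy.get(v, v) for v in row) for row in rows
    )
    return [(*row, n) for row, n in counts.items()]

rows = [
    ("Alberta", "20_40K"),
    ("Manitoba", "20_40K"),
    ("Nova Scotia", "1_20K"),
]
for generalized in generalize(rows, hierarchy):
    print(generalized)
# the two Prairie rows merge into ("Prairies", "20_40K", 2)
```

Moving further up the hierarchy (e.g. Prairies -> Western Canada) would
simply mean applying the same step again with a coarser mapping.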
TEXT BOX
========================================================================
FINDING GOLD IN A MOUNTAIN OF INFORMATION

NSERC (the Natural Sciences and Engineering Research Council of Canada)
gave the group access to their Grants Information database, containing
information about all the research grants awarded in 1990-91. The central
relation table, award, contains 10,087 tuples (rows) with eleven
attributes (columns). The dblearn software extracts knowledge in three
steps, as follows:

Phase I: The Conceptual Hierarchy

For attribute province:
  {Alberta, Saskatchewan, Manitoba} -> Prairies
  {New Brunswick, Nova Scotia, Newfoundland, Prince Edward Island} -> Maritimes
For attribute amount:
  1-19,999 -> 1_20K, 20,000-39,999 -> 20_40K, etc.
For attribute disc_code (discipline code):
  26000-26499 -> AI

Phase II: The Learning Request

  learn characteristic rule for disc_code = "AI"
  from award
  where grant_code = "Operating_Grants"
  in relevance to amount, province, prop(vote), prop(amount)

("prop" is an internal function that gives the percentage of total)

Phase III: Internal Learning and Results

Attribute-oriented induction based on the above conceptual hierarchy and
learning request yields interesting results. We discover, among other
things, that for operating grants in AI between $20,000 and $40,000, BC
beats lion's-share grant-winner Ontario; or that Quebec AI research
funding clusters at the low and high ends.
========================================================================
-----------------------------------------------------------------------
End Enclosure 1
-----------------------------------------------------------------------

-----------------------------------------------------------------------
Start Enclosure 2
-----------------------------------------------------------------------

Article: 5687 in comp.ai
Newsgroups: comp.ai
From: oudshoff@sun019.research.ptt.nl (Sandra Oudshoff)
Subject: Summary: tools for information harvesting
Message-ID: <1993Oct8.142256.11343@spider.research.ptt.nl>
Keywords: information harvesting, data mining, tools
Organization: PTT Research, Groningen, The Netherlands
Date: Fri, 8 Oct 1993 14:22:56 GMT

Hi all,

This posting summarizes the information sent to me by several netters in
reply to my post asking about commercial software tools for information
harvesting (or data mining). I hope you will find some useful information
in here.

------------------------------------------------------------------------
>From kdd%eureka@gte.com Thu Sep 23 15:38:28 1993

Some Commercially Available Products for Intelligent Discovery in Databases

Gregory Piatetsky-Shapiro (gps0@gte.com)
GTE Laboratories, 40 Sylvan Road, Waltham MA 02154
Last updated: July 1993

Here I will discuss only the products with AI-related approaches. Other
tools, such as statistical and forecasting methods or scientific
visualization packages, are not discussed. This is an informal list,
representing only MY PERSONAL opinions, and not the opinions of GTE or
GTE Laboratories. It is not intended to be a complete survey or an
endorsement of any kind. I also do not have any financial interest in any
of the companies below. Ads for other intelligent tools can be found in
AI Magazine, AI Expert, IEEE Expert, PC AI, Expert Systems, and similar
magazines.
Index to tools (listed alphabetically by tool name):

  AIM from AbTech
  AUTOCLASS from NASA
  Database Mining software from HNC
  Datalogic/R from Reduct Systems
  Information Harvesting from Ryan Associates
  IXL/IDIS from IntelligenceWare
  KnowledgeSeeker from FirstMark Technologies
  NEXTRA from Neuron Data
  PC-MARS from Data Patterns
  RECON for Data Mining from Lockheed

Detailed descriptions:

------------------------
AIM
from: AbTech, 700 Harris Street, Charlottesville, VA 22901.
(804) 977-0686.

It automatically synthesizes network solutions from databases of
examples. It uses 1-, 2-, and 3-dimensional polynomials.

-------------------------
AUTOCLASS from NASA

"AutoClass: A Bayesian Classification System", Peter Cheeseman, James
Kelly, Matthew Self, John Stutz, Will Taylor, Don Freeman. Presented at
the Fifth International Conference on Machine Learning.

WHAT IS AUTOCLASS: AutoClass is an unsupervised Bayesian classification
system for independent data. It seeks a maximum posterior probability
classification. Inputs consist of a database of attribute vectors and a
class model defined by a parametric class probability function and
corresponding parameter priors. Models are constructed from a specified
set of terms appropriate to both discrete and real-valued attributes.
AutoClass attempts to find the set of classes that is maximally probable
with respect to the data and model. The output is a set of classes given
as instances of the model with specific parameters. There are facilities
for reporting on the classes, the influence of the attributes on the
classes, and the probability weighting of the data over the classes.

Running AutoClass requires a Common Lisp environment. It has been
successfully run on Symbolics and Explorer Lisp machines, on the Franz
and Sun/Lucid Lisp implementations on the Sun and similar Un*x platforms,
and on the Macintosh personal computer.
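The core idea behind this kind of unsupervised probabilistic
classification, soft class memberships estimated from the data alone, can
be illustrated with a toy two-class Gaussian mixture fitted by EM.
AutoClass's actual model (parameter priors, a search over the number of
classes, mixed discrete/real attributes) is far richer; everything below
is an invented minimal example, not AutoClass code.

```python
# Toy sketch: fit a two-class 1-D Gaussian mixture by EM and recover
# soft class-membership probabilities. Invented illustration only;
# AutoClass's real model and search are much more elaborate.
import math
import random

def em_mixture(data, iters=50):
    data = sorted(data)
    # crude initialization: one mean per quartile
    mu = [data[len(data) // 4], data[3 * len(data) // 4]]
    sigma = [1.0, 1.0]
    w = [0.5, 0.5]  # mixture weights
    for _ in range(iters):
        # E-step: probability of each point's membership in each class
        resp = []
        for x in data:
            p = [w[k] / (sigma[k] * math.sqrt(2 * math.pi))
                 * math.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
                 for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate parameters from the soft assignments
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
            w[k] = nk / len(data)
    return mu, sigma, w

random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + \
       [random.gauss(5, 1) for _ in range(200)]
mu, sigma, w = em_mixture(data)
print(sorted(mu))  # two class means, near 0 and 5
```

The per-point membership probabilities computed in the E-step are the
analogue of AutoClass's "probability weighting of the data over the
classes" mentioned above.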
The most recent release I could find is AutoClass III (Version 3.0.3);
you should be able to locate your nearest server using

  archie -s autoclass

If you don't have an archie client installed, telnet to archie.ans.net
and log in as archie.

-------------------------
RECON for Data Mining
from Lockheed. Advertised at AAAI-93.

From the RECON brochure: Capabilities: Pattern Discovery, Pattern
Validation, Summarization, Decision Support. Top-down and bottom-up data
mining.

Contact: Dr. Evangelos Simoudis, Lockheed AI Center, 3251 Hanover Street,
Palo Alto CA 94304. Voice: (415) 354-5271  Fax: (415) 424-3425

------------------------
IXL/IDIS Discovery Machine
from: IntelligenceWare, 5933 West Century Blvd., Los Angeles, CA 90045,
(213) 216-6177.

IXL is a sophisticated product, with a fancy screen layout and many
features. IXL finds the most interesting rules in data, using a symbolic
learning approach. I have used it and it is nice. IntelligenceWare also
has several other related products, including a Data Visualization Tool,
a package for automatically generating 2D and 3D graphs from data --
pattern discovery using human visual abilities.

------------------------
KnowledgeSeeker
from: FirstMark Technologies Ltd, 14 Concourse Gate, Suite 600, Ottawa,
Ontario, Canada K2E 7S8, 613-723-8020.

It automatically builds a decision tree for your concept, and does many
other interesting things. I have used it, and it has a good user
interface.

--------------------
NEXTRA
from: Neuron Data, 156 University Ave., Palo Alto, CA 94301.
1-800-876-4900.

It is an impressive tool able to synthesize rules from user preferences.
Nice graphical abilities.

--------------------
Database Mining Software (Last Updated 7-28-93)
from: HNC (San Diego, CA), 1-800-HNC-EXPR, 1-619-546-8877 (ask for Scott
Crispie).

It uses a more classical neural net approach. After training a net to
recognize a concept, it uses a patented method to extract rules that
correspond to the net.
(A partial description of that method is in the paper by Steve Gallant,
"Connectionist Expert Systems", Communications of the ACM 31(2):153-168,
1988.)

-------------------------
PC-MARS
from: Data Patterns, 528 S. 45th Street, Philadelphia, PA 19104,
(215) 387-1844. 495 (Dec 92).

A software package for developing models of non-linear multivariable
processes from past input/output data, useful for predicting future
outputs. Advertised as an alternative to neural networks; helps the user
to understand the process being modelled. Provides graphical tools. IBM
PC and compatibles.

--------------------
Datalogic/R (formerly DataQuest)
from: Reduct Systems, Regina, Canada. (306) 586-9408, fax (306) 586-9442.

Software for data mining using a rough set approach. (See AI Expert,
March 1993, for an ad.)

------------------------------------------------------------------------
RECON software

Contact: Dr. Evangelos Simoudis, Lockheed AI Center, 3251 Hanover Street,
Palo Alto, CA 94304. simoudis@aic.lockheed.com

Indeed, Recon is able to perform the type of tasks you are interested in
accomplishing with information harvesting. Today I will send you a video
showing how two of Recon's components can be used to extract rule-based
models from a database with data about stocks. I will also send you other
documentation that describes some of the applications we have developed,
as well as pricing information. I hope you find the information useful.

I was glad to hear you received the information I sent you. Lockheed has
offices throughout Europe, including one in Brussels and a representative
in the Netherlands. We are currently negotiating with three European
software firms to provide Recon support, in addition to the support we
provide from the United States. Thus far, our work on data mining has
been performed through large contracts with large companies and the
federal government.
For this reason, we have been able to provide support from our home base
in California as well as by traveling directly to the customer's site, if
the situation warranted it. Given that all of our customers are on the
East Coast of the United States, 4,000 km from California, I hope you can
appreciate that we can deliver support anywhere we need to.

The tape that I sent you mentions that Recon includes neural and
statistical modules. For example, Lockheed has developed the
Probabilistic Neural Network and General Regression Neural Network, which
have been recognized as providing the best results among competing neural
network algorithms. Of course, we are also working with the more
traditional neural network algorithms, such as back propagation and its
variants. The operation of these modules is not shown in the tape.
Furthermore, the tape does not show the data visualization module of
Recon.

Our approach to work on data mining is the following:

1. We know that a single mining technique will not be appropriate for
*every* type of data. For example, neural networks can deal with certain
data sets that statistics cannot. Similarly, symbolic learning techniques
can work better than neural networks on other data sets. For this reason
we have developed a toolbox of techniques to perform top-down and
bottom-up data mining.

2. We evaluate the customer's data and work with the customer to define
the types of data mining operations that the customer will need to
perform. Lockheed then recommends to the customer the components of the
toolbox that will be most appropriate to the customer's data and the type
of operations that must be performed.

3. We tailor the Recon system to include only the techniques that the
customer and Lockheed have agreed upon. In this way, the customer
achieves the best possible results from the data mining operation each
time.

The video tape I sent you demonstrates the operation of an actual system
we delivered to a financial company.
This customer did not want any visualization capabilities in the version
of Recon we delivered. As a result, the visualization component of our
toolbox was not delivered. Of course, being a market-driven group, we
will be willing to discuss other possible configurations of the Recon
system which will be of interest to your company. Please do not hesitate
to contact me for any other information you may want on Recon's
capabilities that will help you in your evaluation.

Thank you and regards,
Evangelos Simoudis

-------------------------------------------------------------------------
AUTOCLASS

>From schmid@bastille.berkeley.edu Thu Sep 9 12:01:25 1993
Organization: University of California, Berkeley

Check out the Knowledge Discovery in Databases proceedings. Check out
AutoClass, an unsupervised Bayesian classification system which learns
classifications from data. Developed by Peter Cheeseman et al. at NASA
Ames (cheesem@ptolemy.arc.nasa.gov). This has done some pretty impressive
things. Lots of papers on it if you want background. Sorry that I'm only
familiar with probabilistic approaches.

scott.

I am sorry to say that our latest version of the program, AutoClass X, is
not available internationally. However, an earlier version, AutoClass
III, is available from COSMIC; see below. Also, below that is a list of
references.

Will Taylor
Recom Technologies  (415) 604-3364
Artificial Intelligence Research Branch - Code FIA
NASA Ames Research Center MS 269-2, Moffett Field, CA 94035-1000
taylor@ptolemy.arc.nasa.gov

AutoClass III is the official released implementation of AutoClass,
available from COSMIC (NASA's software distribution agency):

  COSMIC
  University of Georgia
  382 East Broad Street
  Athens, GA 30602 USA
  voice: (706) 542-3265   fax: (706) 542-4807
  telex: 41- 190 UGA IRC ATHENS
  e-mail: cosmic@uga.bitnet or service@cossack.cosmic.uga.edu

Request "AutoClass III - Automatic Class Discovery from Data (ARC-13180)".
----------------------------------------------------------------------
ARC-13180 - AutoClass: Automatic Class Discovery from Data
----------------------------------------------------------------------

The standard approach to classification in much of artificial
intelligence and statistical pattern recognition research involves
partitioning the data into separate subsets, known as classes. AUTOCLASS
III, from NASA Ames Research Center, uses the Bayesian approach, in which
classes are described by probability distributions over the attributes of
the objects, specified by a model function and its parameters. The
calculation of the probability of each object's membership in each class
provides a more intuitive classification than absolute partitioning
techniques.

AUTOCLASS III is applicable to most data sets consisting of independent
instances, each described by a fixed-length vector of attribute values.
An attribute value may be a number, one of a set of attribute-specific
symbols, or omitted. The user specifies a class probability distribution
function by associating attribute sets with supplied likelihood function
terms. AUTOCLASS then searches the space of class numbers and parameters
for the maximally probable combination. It returns the set of class
probability function parameters, and the class membership probabilities
for each data instance.

AUTOCLASS III, ARC-13180, is written in Common Lisp, and is designed to
be platform independent. This program has been successfully run on
Symbolics and Explorer Lisp machines. It has been successfully used with
the following implementations of Common Lisp on the Sun: Franz Allegro
CL, Lucid Common Lisp, and Austin Kyoto Common Lisp, and on similar UNIX
platforms; under the Lucid Common Lisp implementations on VAX/VMS v5.4,
VAX/Ultrix v4.1, and MIPS/Ultrix v4, rev. 179; and on the Macintosh
personal computer. The minimum Macintosh required is the IIci. This
program will not run under CMU Common Lisp or VAX/VMS DEC Common Lisp.
A minimum of 8Mb of RAM is required for Macintosh platforms and 16Mb for
workstations.

The standard distribution medium for this program is a .25 inch streaming
magnetic tape cartridge in UNIX tar format. It is also available on a 3.5
inch diskette in UNIX tar format and a 3.5 inch diskette in Macintosh
format. An electronic copy of the documentation is included on the
distribution medium.

Domestic pricing is $900 for the program and $21 for the documentation --
there is a 50% educational discount. International pricing is $1800 for
the program and $42 for the documentation -- there is *no* educational
discount.

REFERENCES

P. Cheeseman, et al. "AutoClass: A Bayesian Classification System",
Proceedings of the Fifth International Conference on Machine Learning,
pp. 54-64, Ann Arbor, MI, June 12-14, 1988.

P. Cheeseman, et al. "Bayesian Classification", Proceedings of the
Seventh National Conference on Artificial Intelligence (AAAI-88),
pp. 607-611, St. Paul, MN, August 22-26, 1988.

J. Goebel, et al. "A Bayesian Classification of the IRAS LRS Atlas",
Astron. Astrophys. 222, L5-L8 (1989).

P. Cheeseman, et al. "Automatic Classification of Spectra from the
Infrared Astronomical Satellite (IRAS)", NASA Reference Publication 1217
(1989).

P. Cheeseman, "On Finding the Most Probable Model", in Computational
Models of Discovery and Theory Formation, ed. Jeff Shrager and Pat
Langley, Morgan Kaufmann, Palo Alto, 1990, pp. 73-96.

R. Hanson, J. Stutz, P. Cheeseman, "Bayesian Classification with
Correlation and Inheritance", Proceedings of the 12th International Joint
Conference on Artificial Intelligence, Sydney, Australia, August 24-30,
1991.

-----------------------------------------------------------------------------
DATA MARINER
My sponsoring company, Logica Cambridge Ltd, markets a product called
`Data Mariner' which is based, among other things, on the IDn algorithms.
So far as I'm familiar with it, it carries out similarity-driven
induction to abstract regularities from large data sets. I've no idea
about prices and so on, I'm afraid. The e-mail address for the site is
logcam.co.uk, but I don't know who you'd need to contact. Jim Kennedy is
technical manager of the group dealing with knowledge-based products, and
could doubtless refer you to marketing personnel, if you can reach him.
Try jimk or postmaster. The one user name I do know is Marc Foote (marcf)
who works under Jim; he might also be able to help.

Hope this can help!

Tony Griffiths
tony@minster.york.ac.uk

_________________________________________________________________________
RTWORKS

RTworks has no built-in tools for extracting useful information / rules /
patterns from (large) amounts of data. One could use such a tool to
generate rules for RTworks. There is a book, "C4.5: Programs for Machine
Learning" by J. Ross Quinlan, which comes with a C program to generate
rules/decision trees from large data sets. You should look into that.

| Tom Laffey                    phone: (415) 965-8050      |
| Talarian Corporation          fax:   (415) 965-9077      |
| 444 Castro St., Suite 140     E-mail: tom@talarian.com   |
| Mtn. View, CA 94041           uunet!talarian!tom         |

RTworks: a family of products for large-scale, distributed, time-critical
systems. RTworks is a family of independent software modules developed
for intelligent real-time data acquisition, data analysis, data
archiving, data playback, data distribution, and message/data display.
RTworks offers a number of sophisticated problem-solving strategies,
including knowledge-based systems, a point-and-click graphical user
interface, temporal and statistical reasoning, and the ability to
distribute an application over a network.
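The C4.5 program mentioned above grows decision trees by repeatedly
choosing the attribute that best separates the class labels. The
split-selection criterion at its heart can be sketched as follows; the
weather-style data is an invented illustration, not Quinlan's code, and
real C4.5 additionally handles continuous attributes, gain ratio, and
pruning.

```python
# Toy sketch of the information-gain criterion used by ID3/C4.5-style
# decision tree induction. Invented example data; not Quinlan's code.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected reduction in label entropy from splitting on attr."""
    total = entropy(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return total - remainder

# invented data: 'outlook' perfectly predicts the label, 'windy' does not
rows = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
]
labels = ["play", "play", "stay", "stay"]
print(info_gain(rows, labels, "outlook"))  # 1.0 (a perfect split)
print(info_gain(rows, labels, "windy"))    # 0.0 (no information)
```

A tree builder would split on the highest-gain attribute, then recurse on
each subset until the labels are pure; the resulting branches read off
directly as rules.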
--------------------------------------------------------------------------
IXL

A research project I was associated with tried IXL, by IntelligenceWare.
Two of us (with many years of computing experience) never managed to make
it cope with our data (about 380 records of about 90 fields). But then we
couldn't get it to cope with some of the test data distributed with it
either, in spite of trying several different machines with better than
the recommended configuration and memory management system! The
promotional blurb suggested that our size of data was well within its
capabilities. The kinds of errors we got were memory errors which caused
system crashes. The reason for these failures was never established.
IntelligenceWare assured us that all their test data worked fine for
them. Eventually, after discussions via the university lawyer, they
refunded us.

Jane Hesketh (hesketh@ed.ac.uk)
jane@aisf.edinburgh.ac.uk

------------------------------------------------------------------------
NLToolset

Our NLToolset is capable of performing the functions you describe. I am
passing your note on to someone who hopefully will be able to provide you
with the pricing information you requested.

Regards,
Lisa Rau
Visiting Assistant Professor
Dept of Computer and Information Science, University of Pennsylvania
200 South 33rd Street, Philadelphia, PA 19104-6389
lrau@cis.upenn.edu   FAX: (215) 898-0587   Phone: (215) 573-2815

--------------------------------------------------------------------------
EXPLORA

Dear Sandra Oudshoff!

We have developed a system for discovery in databases (Explora) which is
generally available and can be run on the Macintosh. If you are
interested in using the system, you can get it via anonymous ftp from
ftp.gmd.de: open a connection to "ftp.gmd.de" and transfer the file
"Explora.sit.hqx" from the directory "gmd/explora". The file "README"
describes the installation of Explora. A user manual is included.
If you need some further support, please contact me.

Best wishes,
Willi Kloesgen

Willi Kloesgen, GMD, D-53757 Sankt Augustin
Phone ++49/2241-14-2723, Fax ++49/2241-14-2618
E-mail: kloesgen@gmd.de

-----------------------------------------------------------------------
End Enclosure 2
-----------------------------------------------------------------------

---
Andy Pryke
Email: A.N.Pryke@cs.bham.ac.uk

******************************************************************************

That's all for the summary. Hope you found some of the info useful!

Sherry Long
sherry@iti.gov.sg