Subject: fuzzy information retrieval
To: mkant+fuzzy-faq@cs.cmu.edu
Date: Mon, 18 Jul 1994 19:07:20 +0200 (MET)
From: Jaro Kostelansky

: Contributions and corrections should be sent to the mailing
: list mkant+fuzzy-faq@cs.cmu.edu.

Please let me contribute to the comp.ai.fuzzy FAQ with the following contact information (a short article about our product is included):

----------------------------------------------------------------
Aria Ltd.
Products: DB-fuzzy
          A fuzzy information retrieval library for CA-Clipper

Aria Ltd.
Dubravska 3
842 21 Bratislava
SLOVAKIA
Phone: (+42 7) 3709 286
Fax:   (+42 7) 3709 232
Email: aria@softec.sk
----------------------------------------------------------------

Fuzzy technology for CA-Clipper

Mgr. Juraj Durov, Mgr. Jaroslav Kostelansky
Aria Ltd., Slovakia

The DB-fuzzy library for Clipper is designed for developing information systems in which information retrieval is based on ambiguous and/or incomplete data. In addition, the library can transform a user's request formulated in everyday language into a query statement whose output adequately reflects the original request. DB-fuzzy is based on a modification of classical set theory, the so-called theory of fuzzy sets.

The theory of fuzzy sets

The theory of fuzzy sets (ThFS) originated in 1965, when Prof. Lotfi A. Zadeh of the University of California, Berkeley, introduced the idea of the fuzzy set, the central concept of the theory.
According to one dictionary, a fuzzy set is a generalization of the classical set that allows various degrees of set membership instead of all or none. This means that each element is assigned a value between 0 and 1, its degree of membership in the fuzzy set; the higher this value, the more strongly the element belongs to the set. The ThFS starts from this simple and natural idea, which allows the theory to adapt to the uncertainty and inaccuracy inherent in the real world. The word "fuzzy" reflects this basis of the theory and may be understood as not sharp, dim, vague or uncertain.

Although the theory was originally rejected by "black-and-white" mathematicians, it continued to be investigated theoretically. Today it is a well-developed and widely appreciated mathematical theory. The first practical applications appeared in Japan around 1987.

Applying the theory

In recent years, fuzzy technologies have become increasingly embedded in a wide range of applications. Examples include the recognition of handwritten characters, video cameras that automatically locate the central point of the picture to focus on and compensate for hand shake, and anti-lock braking systems (ABS) in cars. The popularity of the fuzzy approach has grown mainly because of the enormous success of fuzzy logic controllers, which successfully simulate the actions of human operators. Although fuzzy logic is used worldwide, the leading country is Japan, with its many research programs and large financial support, and almost all of its top industrial companies have embraced this new trend.

Information retrieval from databases is one of the many fields in which the theory is applied. In this context, the main advantage of the ThFS is its ability to represent even vaguely specified information mathematically, and thus to process it with computer techniques.
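As a minimal sketch of what representing vague information mathematically can mean, the Python code below models two vague terms from the practical example later in this article, "profit exceeding approximately 8 percent" and "about 100 employees", as membership functions that return a degree in [0, 1]. The piecewise-linear shapes, the breakpoints (6 to 10 percent, 50 to 150 employees) and all function names are hypothetical choices made for illustration; the article does not describe how DB-fuzzy defines its membership functions.

```python
def ramp_up(x: float, low: float, high: float) -> float:
    """Degree for an 'at least roughly' term: 0 at or below low,
    1 at or above high, rising linearly in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def triangle(x: float, left: float, peak: float, right: float) -> float:
    """Degree for an 'about peak' term: 1 at peak, falling linearly
    to 0 at left and at right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical readings of two vague terms (breakpoints are assumptions).
def high_profit(percent: float) -> float:
    # "exceeding approximately 8 percent": assumed ramp from 6 % to 10 %
    return ramp_up(percent, 6.0, 10.0)


def medium_sized(employees: float) -> float:
    # "about 100 employees": assumed triangle from 50 through 100 to 150
    return triangle(employees, 50.0, 100.0, 150.0)


for profit in (5.0, 8.0, 9.05, 12.5):
    print(f"profit {profit:5.2f} % -> high_profit = {high_profit(profit):.2f}")
for staff in (35, 97, 100, 115):
    print(f"{staff:3d} employees -> medium_sized = {medium_sized(staff):.2f}")
```

Under these assumptions a profit of exactly 8 percent is "high" to degree 0.5, and 115 employees count as "about 100" to degree 0.7. Other shapes, such as smooth sigmoid curves, would be equally legitimate.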
The need to work with inaccurate and/or incomplete information arises because almost every description of reality contains some measure of uncertainty.

The DB-fuzzy library

Using a fuzzy library is justified whenever the classical tools of information retrieval are insufficient. Such a situation may occur, for example, when we cannot retrieve the information we need because the data identifying it are inaccurate or insufficient. DB-fuzzy allows such data to be represented in a query whose output consists of records that fit the input data more or less closely. This means that we also get records that do not strictly fit the input data. In this way, the inaccuracy or incompleteness of the data used for retrieval is tolerated, which increases the probability of obtaining the information needed.

Another advantage of the fuzzy library is its ability to transform requests formulated in everyday language into query statements that can accommodate vaguely specified information. The output is an evaluated set of records, in which each record's value reflects how well it fits the given query. In this way, we can select the alternative that best matches the original request formulated in everyday language.

The main difference between the classical and fuzzy approaches is that the output set of records fitting the query is treated as a fuzzy set. The degrees of membership assigned to the records reflect their relevance to the query. Using DB-fuzzy, we simply obtain for each record a real number between 0 and 1 expressing the extent to which that record fits the query; the higher the value, the better the record fits. In other words, a record is neither selected nor rejected by the query, but evaluated according to how well it fits.
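The contrast between a bivalent query and a fuzzy one can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DB-fuzzy's API. The records P, Q, R and S are hypothetical, the ramp-shaped membership functions and their breakpoints are invented for the example, and the fuzzy AND is taken as the minimum of the two degrees (Zadeh's standard conjunction), which may differ from what DB-fuzzy actually computes.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    profit: float    # percent
    turnover: float  # millions (currency not specified)


def ramp_up(x: float, low: float, high: float) -> float:
    """0 at or below low, 1 at or above high, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def crisp_match(c: Company) -> bool:
    # Classical, bivalent query: "profit > 8 AND turnover > 600".
    return c.profit > 8.0 and c.turnover > 600.0


def fuzzy_match(c: Company) -> float:
    # Fuzzy query: "high profit AND high turnover", each graded in [0, 1].
    # Breakpoints are hypothetical; AND is taken as the minimum of the degrees.
    high_profit = ramp_up(c.profit, 6.0, 10.0)
    high_turnover = ramp_up(c.turnover, 400.0, 800.0)
    return min(high_profit, high_turnover)


# Hypothetical records, not taken from the article.
records = [
    Company("P", 7.5, 950.0),   # profit just below the crisp limit
    Company("Q", 12.0, 560.0),  # turnover just below the crisp limit
    Company("R", 8.1, 610.0),   # barely passes both crisp conditions
    Company("S", 4.0, 300.0),   # clearly fails both
]

print("crisp result:", [c.name for c in records if crisp_match(c)])
# Fuzzy result: every record gets a degree; sort by it, best first.
for c in sorted(records, key=fuzzy_match, reverse=True):
    print(f"{c.name}: degree {fuzzy_match(c):.3f}")
```

The crisp query returns only R, the one record that clears both hard limits. The fuzzy query grades every record and ranks them R, Q, P, S, so Q and P, which miss a limit only narrowly, are still offered to the user with lower degrees instead of being silently dropped.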
The evaluated output obtained in this way can be sorted in decreasing order of the degree of membership, so that the records most appropriate to the query are at the top and the degree of membership decreases with relevance. The strength of DB-fuzzy lies precisely in an adequate calculation of the degree of membership, i.e. of a real number from the interval <0, 1>.

In the classical approach, the output consists of the set of records that unambiguously satisfy the conditions given in the query. The logic of query evaluation is thus bivalent: a record from the input set either belongs to the output set or it does not. There is therefore no distinction among the records with respect to their relevance to the query; all the records in the output set are treated as equally right, and all the others as equally wrong. Unfortunately, the user does not notice this shortcoming, because the basic defect is not that irrelevant information is provided but, on the contrary, that relevant information is left out.

A practical example

A concrete use of the fuzzy library is demonstrated by an example from business, based on selecting the most appropriate alternative. Suppose we are looking for suitable candidates for a new joint company according to the following requirements:

-----------------------------------------------------------------
| The main requirement is a high profit and turnover of the     |
| company in the last year, with emphasis on the profit.        |
|                                                               |
| A less important requirement is that the company has sales    |
| representatives in the U.S.A., Canada or Mexico, or if need   |
| be in all three countries, with priority on representation    |
| in the U.S.A.; representation in Mexico is the least          |
| important.                                                    |
|                                                               |
| In addition, it would be useful if the company were           |
| medium-sized as to the number of employees and, at the same   |
| time, had a wide network of distribution outlets, i.e. a      |
| certain number of shops. The wide network of distribution     |
| outlets is the more important of these two.                   |
-----------------------------------------------------------------

-----------------------------------------------------------------
| A profit exceeding approximately 8 percent and a turnover     |
| exceeding 600 million are considered high. A medium-sized     |
| company is one with about 100 employees, and a wide network   |
| of distribution outlets is one whose number of shops exceeds  |
| approximately 15.                                             |
-----------------------------------------------------------------

As can be seen, the requirements formulated in everyday language contain vague terms (e.g. high, medium-sized, wide, approximately), and different importance (e.g. main, priority, more important) is attributed to the various parts of the query.

The result (see the table below) was obtained on test input using DB-fuzzy functions; below we examine how well it reflects the requirements above. The output is ordered by the degree of membership (the score column), which reflects how acceptable each candidate is.

-----------------------------------------------------------------------
| company | profit | turnover | sales represent. | emp. | net | score |
-----------------------------------------------------------------------
| E       |  10.40 |     1050 | US, Can.         |  115 |  14 | 0.716 |
| I       |  10.40 |      890 | US, Can.         |  115 |  14 | 0.715 |
| H       |  12.50 |      880 | US, Can.         |  115 |  11 | 0.671 |
| C       |  12.50 |      880 | US, Mex.         |  115 |  11 | 0.651 |
| A       |   9.05 |     1050 | US, Can.         |  115 |  14 | 0.650 |
| G       |   9.70 |      880 | US, Can., Mex.   |  100 |  11 | 0.647 |
| D       |  12.50 |      880 | Can., Mex.       |  115 |  11 | 0.644 |
| F       |  13.62 |      925 | US, Can.         |   35 |   7 | 0.586 |
| B       |  13.62 |      925 | Slovakia         |   97 |  21 | 0.443 |
-----------------------------------------------------------------------
The evaluation of acceptable candidates for a joint company
(profit in percent; turnover in millions; emp. = number of employees;
net = number of shops)

Company B, which is medium-sized and has a wide network of distribution outlets but no sales representative in any of the required countries, is ranked below company F. Company F is neither medium-sized nor has a wide network of distribution outlets, but it has sales representatives in both the U.S.A. and Canada. Profit and turnover are the same for both companies. This ranking follows from the second requirement of the query being more important than the third.

Companies H, C and D differ only in their sales representatives. Both H and C have a representative in the U.S.A., but H has one in Canada while C has one in Mexico. That is why H has a higher degree of membership than C: within this requirement, the representative in Mexico is the least important. Company D, which has no representative in the U.S.A., has a lower degree than C because of the priority given to the U.S.A.

Among other differences, company G has a much lower profit than company H. This is why G is rated lower than H despite having sales representatives in all the required countries and fully satisfying the condition of being medium-sized. This evaluation follows from the second and third requirements of the query being less important than the first, and from the emphasis on high profit within the first requirement. In addition, Mexico and the company size were the less important parts of the second and third requirements, respectively.

Company E, which has a higher turnover and a wider network of distribution outlets than company H, was assigned a higher degree than H despite its somewhat lower profit. Company I is also rated higher than H.
Company I differs from E only in its lower turnover. Company A, whose profit is lower than E's, was assigned a lower degree than H. It is therefore evident that the calculated degree of membership takes all the characteristics of the candidates into account. As the above shows, the assigned degrees of membership adequately reflect how acceptable or unacceptable each candidate is. It should be pointed out that it would be quite difficult to rank the candidates with the conventional tools of information retrieval systems, even for such a relatively small input.

Concluding remarks

The advantages of DB-fuzzy become more significant with larger input sets, where the end user can examine only the top records, i.e. those with the highest values, or only the records whose degree exceeds a given threshold. In this way it is also possible to partly solve problems such as information overload, where the user has no chance to process all the relevant information. By processing records in order of their evaluated relevance, users can be fairly sure that whatever they managed to process was the most essential information.

The fuzzy library allows the writing of software that simulates the human thinking process. In real life, most decisions are not based on accurate or absolute values. Human decisions are mostly determined by incomplete pieces of information that together describe the problem being solved. Each piece of information is more or less precise or true, and people attribute different importance or priority to each of them. DB-fuzzy's ability to work with inaccurate data makes it possible to create an attractive class of data processing applications, the so-called decision support systems. Until recently, such systems were considered unsuitable for automation because of their inaccurate parameters, which the classical tools of information processing could not cope with.

Contact addresses

Aria Ltd.
Dubravska 3                   Phone: (+42 7) 3709 286
842 21 Bratislava             Fax:   (+42 7) 3709 232
Slovakia                      Email: aria@softec.sk

ClippArt Ltd.
Polianky 15                   Tel. (+42 7) 786 160
841 02 Bratislava             Fax  (+42 7) 786 160
Slovakia