\documentclass[twoside,11pt]{article} %\input epsf %% at top of file \usepackage{jair} \usepackage{theapa} \usepackage{latexsym} \usepackage{eepic} \usepackage{epsf} \jairheading{14}{2001}{105}{12/99}{04/01} \ShortHeadings{What's in an Attribute?} {K{\"u}steres, Borgida} \firstpageno{1} %\input{macros} %\setcounter{page}{0} %\renewcommand{\baselinestretch}{0.92} \begin{document} %\thispagestyle{empty} \title{What's in an Attribute? \\ Consequences for the Least Common Subsumer} \author{\name Ralf K{\"u}sters \email kuesters@ti.informatik.uni-kiel.de \\ \addr Institut f{\"u}r Informatik und Praktische Mathematik\\ Christian-Albrechts-Universit{\"a}t zu Kiel\\ 24098 Kiel\\Germany \AND \name Alex Borgida \email borgida@cs.rutgers.edu \\ \addr Department of Computer Science\\ Rutgers University\\ Piscataway, NJ 08855 \\ USA } \maketitle \begin{abstract} \noindent Functional relationships between objects, called ``attributes'', are of considerable importance in knowledge representation languages, including Description Logics (DLs). A study of the literature indicates that papers have made, often implicitly, different assumptions about the nature of attributes: whether they are always required to have a value, or whether they can be partial functions. The work presented here is the first explicit study of this difference for subclasses of the \textsc{Classic} DL, involving the same-as concept constructor. It is shown that although determining subsumption between concept descriptions has the same complexity (though requiring different algorithms), the story is different in the case of determining the least common subsumer (lcs). For attributes interpreted as partial functions, the lcs exists and can be computed relatively easily; even in this case our results correct and extend three previous papers about the lcs of DLs. In the case where attributes must have a value, the lcs may not exist, and even if it exists it may be of exponential size. Interestingly, it is possible to decide in polynomial time if the lcs exists. \end{abstract} %\newpage{} %%%%%%%%%%%%%%%%%%%%% %% Introduction %%%%%%%%%%%%%%%%%%%%% %\input{introduction} \section{Introduction} Knowledge representation systems based on Description Logics (DLs) have been the subject of continued attention in Artificial Intelligence, both as a subject of theoretical studies \cite{Borgida-CIKM-1994,Baader-JLC-1996,BaaderSattler-TABLEAUX-2000,DeGiacomoLenzerini-KR-1996,CalvaneseDeGiacomo+-IJCAI-1999} and in applications \cite{ArtaleFranconi+-DKE-1996,Brachman+-AI-1999,McGuinnessPatelschneider-AAAI-1998}. More impressively, DLs have found applications in other areas involving information processing, such as databases \cite{Borgida-TKDE-1995,CalvaneseLenzerini+-JAIR-1999}, semi-structured data \cite{CalvaneseDeGiacomo+-AAAI-1998,CalvaneseDeGiacomo+-NISJ-1999}, information integration \cite{CalvaneseDeGiacomo+-KR-1998,BorgidaKuesters-DL-2000}, as well as more general problems such as configuration \cite{McGuinnessWright-IEEE-1998} and software engineering \cite{BorgidaDevanbu-ICSE-1999,DevanbuJones-TOSEM-1997}. In fact, wherever the ubiquitous term ``ontology" is used these days (e.g., for providing the semantics of web/XML documents), DLs are prime contenders because of their clear semantics and well-studied computational properties. In Description Logics, one takes an object-centered view, where the world is modeled as individuals, connected by binary relationships (here called {\em roles}), and grouped into classes (called {\em concepts}). For those more familiar with Predicate Logic, objects correspond to constants, roles to binary predicates, and concepts to unary predicates. In every DL system, the concepts of the application domain are described by \emph{concept descriptions} that are built from atomic concepts and roles using the ``constructors'' provided by the DL language. For example, consider a situation where we want a concept describing individual cars that have had frequent (at least 10) repairs, and also record the fact that for cars, their \rmodel{} is the same as their manufacturer's \rmodel{}. Concepts can be thought of as being built up from (possibly nested) simpler noun-phrases, so the above concept, called $\cAccidentCar$ in the sequel, might be captured as the conjunction of \begin{tabbing} (objects that are \cCar s) \\ (things {\bf all} of whose \rmodel{} values are in concept \cModel{}) \\ (things {\bf all} of whose \rmadeBy{} values are in concept \cManufacturer{}) \\ (things whose \rmodel{} value is {\bf the same as} the \rmodel{} {\bf of the} \rmadeBy{} attribute) \\ (things with {\bf at least} 10 \raccidents{} values) \\ (things {\bf all} of whose \raccidents{} values are \cAccidentReport{}). \end{tabbing} \noindent Using the syntax of the \fullclassic{} language, we can abbreviate the above, while emphasizing the term-like nature of descriptions and the constructors used in each: \begin{tabbing} ({\bf and} \= \cCar{} \\ \> ({\bf all} \rmodel{} \cModel{}) \\ \> ({\bf all} \rmadeBy{} \cManufacturer{}) \\ \>({\bf same-as} (\rmodel{}) ($\compose{\rmadeBy}{\rmodel}$))\\ \>({\bf at-least} 10 \raccidents{}) \\ \>({\bf all} \raccidents{} \cAccidentReport{})) \end{tabbing} \noindent So, for example, the concept term ({\bf at-least} $n$ {\tt p}) has constructor {\bf at-least}, and denotes objects which are related by the relationship {\tt p} to at least $n$ other objects; in turn, ({\bf all} {\tt p C}) has as instances exactly those objects which are related by {\tt p} only to instances of {\tt C}. Finally, we present the same concept in a mathematical notation which is more succinct and preferred in formal work on DLs: \vspace*{2ex} $\cAccidentCar:=$ \begin{tabular}[t]{l@{\hspace*{5ex}}l} $\cCar\:\And$ \\ $\all{\rmodel}{\cModel}\;\And$ \\ $\all{\rmadeBy}{\cManufacturer}\;\And$ \\ $\sameas{\rmadeBy}{(\compose{\rmodel}{\rmadeBy})}\;\And$ \\ $\atleast{10}{\raccidents}\;\And$ \\ $\all{\raccidents}{\cAccidentReport}$ \end{tabular} \vspace*{2ex} %\noindent The distinguishing feature of DLs with respect to other semantic %network-like knowledge representation formalisms is that they support a %variety of inferences, and algorithms that carry out these inferences. \rk{Since we haven't mentioned network-like formalisms so far, I changed the following a bit:} \noindent Unlike preceding formalisms, such as semantic networks and frames \cite{Quillian-SIP-1968,Minsky-PCV-1975}, DLs are equipped with a formal semantics, which can be given by a translation into first-order predicate logic \cite{Borgida-CIKM-1994}, for example. Moreover, DL systems provide their users with various inference capabilities that allow them to deduce implicit knowledge from the explicitly represented knowledge. For instance, the subsumption algorithms allow one to determine subconcept-superconcept relationships: $C$ is {\em subsumed by} $D$ ($C\sqsubseteq D$) if and only if all instances of $C$ are also instances of $D$, i.e., the first description is always interpreted as a subset of the second description. For example, the concept \cCar{} obviously subsumes the concept description \cAccidentCar{}, while ({\bf at-least} 10 \raccidents{}) is subsumed by ({\bf at-least} 8 \raccidents{}). The traditional inference problems for DL systems, such as subsumption, inconsistency detection, membership checking, are by now well-investigated. Algorithms and detailed complexity results for realizing such inferences are available for a variety of DLs of differing expressive power --- see, e.g., \cite{BaaderSattler-TABLEAUX-2000} for an overview. \subsection{Least Common Subsumer} The {\em least common subsumer (lcs)} of concepts is the most specific concept description subsuming the given concepts. Finding the lcs was first introduced as a new inference problem for DLs by \citeA{CohenBorgida+-AAAI-1992}. One motivation for considering the lcs is to use it as an alternative to disjunction. The idea is to replace disjunctions like $C_1 \sqcup \cdots \sqcup C_n$ by the lcs of $C_1,\ldots,C_n$. \citeA{BorgidaEtherington-KR-1989} call this operation \emph{knowledge-base vivification}. Although, in general, the lcs is not equivalent to the corresponding disjunction, it is the best approximation of the disjunctive concept within the available language. Using such an approximation is motivated by the fact that, in many cases, adding disjunction would increase the complexity of reasoning.\footnote{Observe that if the language already allows for disjunction, we have $lcs(C_1,\ldots,C_n)\equiv C_1\sqcup\cdots\sqcup C_n$. In particular, this means that, for such languages, the lcs is not really of interest.} As proposed by Baader et al.~\cite{BaaderKuesters-KI-1998,BaaderKuesters+-IJCAI-1999}, the lcs operation can be used to support the ``bottom-up'' construction of DL knowledge bases, where, roughly speaking, starting from ``typical'' examples an lcs algorithm is used to compute a concept description that (i) contains all these examples, and (ii) is the most specific description satisfying property (i). \citeauthor{BaaderKuesters-KI-1998} have presented such an algorithm for cyclic $\aln$-concept descriptions; $\aln$ is a relatively simple language allowing for concept conjunction, primitive negation, value restrictions, and number restrictions. Also, \citeA{BaaderKuesters+-IJCAI-1999} have proposed an lcs algorithm for a DL allowing existential restrictions instead of number restrictions. Originally, the lcs was introduced as an operation in the context of inductive learning from examples \cite{CohenBorgida+-AAAI-1992}, and several papers followed up this lead. % with regard to the PAC learning model\cite{Valiant-ACM-1984}. The DLs considered were mostly sublanguages of \textsc{Classic} which allowed for same-as equalities, i.e., expressions like {\tt ({\bf same-as} (madeBy) (model $\circ$ madeBy))}. %$\sameas{\rmadeBy}{(\compose{\rmodel}{\rmadeBy})}$. \citeauthor{CohenBorgida+-AAAI-1992} proposed an lcs algorithm for $\aln$ and a language that allows for concept conjunction and same-as, which we will call \sameconj{}. The algorithm for \sameconj{} was extended by \citeA{CohenHirsh-ML-1994} to \textsc{CoreClassic}, which additionally allows for value restrictions (see \cite{CohenHirsh-KR-1994} for experimental results). % % Finally, \citeA{FrazierPitt-ML-1996} presented an lcs algorithm for full \textsc{Classic}. \rk{I made quite some changes in the following two subsections.} \subsection{Total vs. Partial Attributes} In most knowledge representation systems, including DLs, functional relationships, here called {\em attributes} (also called ``features'' in the literature), are distinguished as a subclass of general relationships, at least in part because functional restrictions occur so frequently in practice\footnote{Readers coming from the Machine Learning community should be aware of the difference between our ``attributes'' (functional roles) and their ``attributes'', which are components of an input feature vector that usually describes an exemplar. }. In the above example, clearly \rmadeBy{} and \rmodel{} are meant to be attributes, thus making unnecessary number restrictions like {\tt ({\bf and} ({\bf at-most} 1 madeBy) ({\bf at-least} 1 madeBy))}. %$(\atmost 1{\rmadeBy}\And \atleast 1{\rmadeBy})$. In addition, distinguishing attributes helps identify tractable subsets of DL constructors: in \textsc{Classic}, coreferences between attribute chains (as in the above examples) can be reasoned with efficiently \cite{BorgidaPatel-Schneider-JAIR-1994}, while if we changed to roles, e.g., allowed {\tt ({\bf same-as} (repairs) (ownedBy $\circ$ repairsPaidFor))}, %$\sameas{{\sf repairs}}{(\compose{\sf ownedBy}{\sf repairsPaidFor})}$, the subsumption problem becomes undecidable \cite{SchmidtSchauss-KR-1989}. Whereas the distinction between roles and attributes in DLs is both theoretically and practically well understood, we have discovered that another distinction, namely the one between attributes being interpreted as total functions (\emph{total attributes}) and those interpreted as partial functions (\emph{partial attributes}), has ``slipped through the cracks'' of contemporary research. A total attribute always has a value in ``the world out there'', even if we do not know it in the knowledge base currently. A partial attribute may not have a value. This distinction is useful in practice, since there is a difference between a car possibly, but not necessarily, having a CD player, and the car necessarily having a manufacturer (which just may not be known in the current knowledge base). The latter is modeled by defining the attribute $\rmadeBy$ to be a total attribute. Note that with $\rmadeBy$ being a total attribute, \emph{every} individual in the world of discourse (not only cars) must have a filler for $\rmadeBy$. Since, however, no structural information is provided for fillers of $\rmadeBy$ of non-car individuals, all implications drawn about these fillers are trivial. Thus, making $\rmadeBy$ a total attribute seems reasonable in this case. A car's CD player, on the other hand, should be modeled by a partial attribute to express the fact that cars are not required to have a CD player. To indicate that a particular car does have a CD player, one would have to add the description {\tt ({\bf at-least} 1 CDplayer)}. % $(\atleast{1}{CDplayer})$. \subsection{New Results} As mentioned above, in conjunction with the same-as constructor, roles and attributes behave very differently with respect to subsumption. The main objective of this paper is to show that the distinction between total and partial attributes induces significantly different behaviour in computing the lcs, in the presence of {\bf same-as}. More precisely, the purpose of this paper is twofold. First, we show that with respect to the complexity of deciding subsumption there is no difference between partial and total attributes. \citeA{BorgidaPatel-Schneider-JAIR-1994} have shown that when attributes are total, subsumption of $\fullclassic$ concept descriptions can be decided in polynomial time. As shown in the present work, slight modifications of the algorithm proposed by \citeauthor{BorgidaPatel-Schneider-JAIR-1994} suffice to handle partial attributes. Moreover, these modifications do not change the complexity of the algorithm. Thus, partial and total attributes behave very similarly from the subsumption point of view. Second, and this is the more surprising result of this paper, the distinction between partial and total attributes does have a significant impact on the problem of computing the lcs. Previous results on sublanguages of $\fullclassic$ show that if partial attributes are used, the lcs of two concept descriptions always exists, and can be computed in polynomial time. If, however, only total attributes are involved, the situation is very different. The lcs need no longer even exist, and in case it exists its size may grow exponential in the size of the given concept descriptions. Nevertheless, the existence of the lcs of two concept descriptions can be decided in polynomial time. Specifically, in previous work \cite{CohenBorgida+-AAAI-1992,CohenHirsh-ML-1994,FrazierPitt-ML-1996} concerning the lcs computation in $\fullclassic$, constructions and proofs have been made without realizing the difference between the two types of attributes. Without going into details here, the main problem for lcs is that merely finite graphs have been employed, making the constructions applicable only for the partial attribute case. In addition to fixing these problems, this paper also presents the proper handling of inconsistent concepts in the lcs algorithm for $\fullclassic$ presented by \citeA{FrazierPitt-ML-1996}. Although our results about subsumption are not as intriguing, the proofs to show the results on the lcs make extensive use of the corresponding subsumption algorithms, which is one reason we present them beforehand in this paper. Returning to the general differences between the cases of total and partial attributes, one could say that the fundamental cause for the differences lies in the same-as constructor, whose semantics normally requires that (i) the two chains of attributes each have a value, and (ii) that these values coincide. In the case of total attributes, same-as obeys the principle $$ C\sqsubseteq \sameas{u}{v} \mbox{ implies } C\sqsubseteq \sameas{u\circ w}{v\circ w} $$ where $u$,$v$, and $w$ are sequences of total attributes, e.g., ($\compose{\rmadeBy}{\rmodel}$), because condition (i) is ensured by the total aspect of all the attributes. In the case of partial attributes, the above implication does not hold, because $w$, and hence $u\circ w$, is no longer guaranteed to have a value, implying that the same-as restriction may not hold. Clearly, this implication affects the results of subsumption. As far as lcs is concerned, a certain graph (representing the lcs of the two given concepts) may be infinite in the case of total attributes, thus jeopardizing the existence of the lcs. The more general significance of our result is that knowledge representation language designers {\em and} users need to explicitly check at the beginning whether they deal with total or partial attributes because the choice can have significant effects. Although in some situations total attributes are convenient, to guarantee the existence of attributes without having to resort to number restrictions, our results show that they can have drawbacks. All things considered, requiring all attributes to be total appears to be less desirable. Concerning \fullclassic{}, the technical results in this paper support the use of partial attributes because these ensure the existence of the lcs and its computation in polynomial time as well as the efficient decision of subsumption. Moreover, the current implementation of the \fullclassic{} subsumption algorithm does not require major changes in order to handle partial attributes. The outline of this paper is as follows: In the following section, the basic notions necessary for our investigations are introduced. Then, in the two subsequent sections, subsumption and lcs computation in $\fullclassic$ with partial attributes is investigated. More precisely, in Section~\ref{sec:subsumption} we offer a subsumption algorithm for the sublanguage $\classic$ of $\fullclassic$, which contains all main $\fullclassic$-constructors; in Section~\ref{sec:lcs}, we present an lcs algorithm for $\classic$ concept descriptions, along the lines of that proposed by \citeA{CohenHirsh-ML-1994}, and formally prove its correctness, thereby resolving some shortcomings of previous lcs algorithms, which did not handle inconsistencies properly. Finally, Section~\ref{sec:totalfunctions} covers the central new result of this paper, i.e., the lcs computation in presence of total attributes. For this section, we restrict our investigations to the sublanguage $\sameconj$ of $\classic$ in order to concentrate on the changes caused by going from partial to total attributes. Nevertheless, we strongly conjecture that all the results proved in this section can easily be extended to $\classic$ and $\fullclassic$ using similar techniques as the one employed in the two previous sections. %%% Local Variables: %%% mode: latex %%% TeX-master: "KuestersBorgida-JAIR-2000" %%% End: %%%%%%%%%%%%%%%%%%%%% %% Preliminaries %%%%%%%%%%%%%%%%%%%%% %\input{preliminaries} \section{Formal Preliminaries}\label{preliminaries} In this section, we introduce the syntax and semantics of the description languages considered in this paper and formally define subsumption and equivalence of concept descriptions. Finally, the least common subsumer of concept descriptions is specified. \begin{definition} Let $\calC$, $\calR$, and $\calA$ be disjoint finite sets representing the set of {\em concept names}, the set of {\em role names}, and the set of \emph{attribute names}. The set of all {\em \classic{}-concept descriptions\/} over $\calC$, $\calR$, and $\calA$ is inductively defined as follows: \begin{itemize} \item Every element of $\calC$ is a concept description \emph{(concept name, like $\cCar$).} \item The symbol $\top$ is a concept description \emph{(top concept, denoting the universe of all objects)}. \item If $r\in \calR$ is a role and $n \geq 0$ is a nonnegative integer, then $\atmost nr$ and $\atleast nr$ are concept descriptions \emph{(number restrictions, like $\atleast{10}{\raccidents}$)}. \item If $C$ and $D$ are concept descriptions, then $C \sqcap D$ is a concept description \emph{(concept conjunction)}. \item If $C$ is a concept description and $r$ is a role or an attribute, then $\forall r.C$ is a concept description \emph{(value restriction, like $\all{\rmadeBy}{\cManufacturer}$)}. \item If $k,h \geq 0$ are non-negative integers and $a_1,\ldots,a_k,b_1,\ldots,b_h\in \calA$ are attributes, then $\sameas{a_1\circ\cdots \circ a_k}{b_1\circ\cdots \circ b_h}$ is a concept description \emph{(same-as equality, like $\sameas{\rmadeBy}{\compose{\rmodel}{\rmadeBy}}$)}. Note that the two sequences may be empty, i.e., $k=0$ or $h=0$. The empty sequence is denoted by $\varepsilon$. \end{itemize} \end{definition} % Often we dispense with $\circ$ in the composition of attributes. For example, the sequence $a_1\circ \cdots \circ a_k$ is simply written as $a_1\cdots a_k$. Moreover, we will use $\forall r_1\cdots r_n.C$ as abbreviation of $\forall r_1.\forall r_2\cdots\forall r_n.C$, where we have $\forall \varepsilon.C$ in case $n=0$, and this denotes $C$. As usual, the semantics of $\classic$ is defined in a model-theoretic way by means of interpretations. \begin{definition}\label{desc-extension} An \emph{interpretation} $\pw$ consists of a nonempty domain $\DD$ and an interpretation function $\intr{\cdot}$. The interpretation function assigns extensions to atomic identifiers as follows: \begin{itemize} \item The extension of a concept name $E$ is some subset $\intr{E}$ of the domain. \item The extension of a role name $r$ is some subset $\intr{r}$ of $\DD \times \DD$. \item The extension of an attribute name $a$ is some partial function $\intr{a}$ from $\DD$ to $\DD$, i.e., if $(x,y_1)\in \intr{a}$ and $(x,y_2)\in \intr{a}$ then $y_1=y_2$. \end{itemize} Given roles or attributes $r_i$, we use $\intr{(r_1\cdots r_n)}$ to denote the composition of the binary relations $\intr{r_i}$. If $n=0$ then the result is $\intr{\varepsilon}$, which denotes the identity relation, i.e., $\intr{\varepsilon}:=\{(d,d)\;|\; d\in \DD\}$. For an individual $d\in \DD$, we define $\intr{r}(d):=\{e\;|\;(d,e)\in\intr{r}\}$. If the $r_i$'s are attributes, we say that $\intr{(r_1\cdots r_n)}$ is {\em defined} for $d$ iff $\intr{(r_1\cdots r_n)}(d)\not=\emptyset$; occasionally, we will refer to $\intr{(r_1\cdots r_n)(d)}$ as the image of $d$ under $\intr{(r_1\cdots r_n)}(d)$. The extension $\intr{C}$ of a concept description $C$ is inductively defined as follows: \begin{itemize} \item $\intr{\thing}:=\DD$; \item $\intr{(\atleast{n}{r})}:=\{d\in \DD\;|\; cardinality(\{e\in \DD\;|\; (d,e)\in \intr{r}\})\ge n\}$; \item $\intr{(\atmost{n}{r})}:=\{d\in \DD\;|\; cardinality(\{e\in \DD\;|\; (d,e)\in \intr{r}\})\le n\}$; \item $\intr{(C \And D)} := \intr{C} \cap \intr{D}$; \item $\intr{(\all{r}{C})} := \{ d \in \DD \mid \intr{r}(d)\subseteq\intr{C}\}$ where $r$ is a role or an attribute; \item $\intr{(\sameas{a_1\cdots a_k}{b_1\cdots b_h})} := \{ d \in \DD \mid$ \parbox[t]{0.5\textwidth}{$\intr{(a_1\cdots a_k)}$ and $\intr{(b_1\cdots b_h)}$ are defined for $d$ and $\intr{(a_1\cdots a_k)}(d) \; =\; \intr{(b_1\cdots b_h)}(d) \}.$} \end{itemize} \end{definition} % Note that in the above definition attributes are interpreted as partial functions. Since the main point of this paper is to demonstrate the impact of different semantics for attributes, we occasionally restrict the set of interpretations to those that map attributes to \emph{total} functions. Such interpretations are called \emph{t-interpretations} and the attributes interpreted in this way are called \emph{total attributes} in order to distinguish them from \emph{partial} ones. We stress, as remarked in the introduction, that in the definition of $\intr{(\sameas{a_1\cdots a_k}{b_1\cdots b_h})}$, $a_1\cdots a_k$ and $b_1\cdots b_h$ must be defined on $d$ in order for $d$ to satisfy the same-as restriction. Although this is the standard semantics for same-as equalities, one could also think of relaxing this restriction. For example, the same-as condition might be specified to hold if {\em either} both paths are undefined {\em or} both images are defined and have identical values. A third definition might be satisfied if even just one of the paths is undefined. Each of these definitions of the semantics of same-as might lead to different results. However, in this paper we only pursue the standard semantics. The subsumption relationship between concept descriptions is defined as follows. \begin{definition} A concept description $C$ is {\em subsumed} by the concept description $D$ ($C\sqsubseteq D$ for short) if and only if for all interpretations $\pw$, $\intr{C}\subseteq \intr{D}$. If we consider only total interpretations, we get \emph{t-subsumption}: $C \isat D$ iff $\intr{C}\subseteq \intr{D}$ for all t-interpretations $\pw$. \end{definition} % Having defined subsumption, equivalence of concept descriptions is defined in the usual way: $C\equiv D$ if and only if $C\sqsubseteq D$ and $D\sqsubseteq C$. T-equivalence $C\equivt D$ is specified analogously. As already mentioned in the introduction, the main difference between partial and total attributes with respect to subsumption is that $\sameas uv\sqsubseteq_t \sameas{u\circ w}{v\circ w}$ holds for all attribute chains $u,v,w$, whereas it is not necessarily the case that $\sameas uv \sqsubseteq \sameas{u\circ w}{v\circ w}$. Finally, before introducing the lcs operation formally and concluding this section, we comment on the expressive power of $\classic$, since (syntactically) $\classic$ lacks some common constructors. Although \classic{}, as introduced here, does not contain the \emph{bottom concept} $\bot$ explicitly, it can be expressed by, e.g., $(\atleast 1r)\sqcap (\atmost 0r)$. We will use $\bot$ as an abbreviation for inconsistent concept descriptions. % \ab{I just realized this is only correct with respect to some ops -- in general, denotations are not preserved by this trick. Lets leave it out?} \rk{I don't see a problem. Could you elaborate on this a little.} % Furthermore, \emph{primitive negation}, i.e., negation of concept names, can be simulated by number restrictions. For a concept name $E$ one can replace every occurrence of $E$ by ($\atleast 1{r_E}$) and the negation $\neg E$ of $E$ by ($\atmost 0{r_E}$) where $r_E$ is a new role name. Finally, for an attribute $a$ the following equivalences hold: $(\atleast na)\equiv \bot$ for $n\ge 2$; $(\atleast 1a)\equiv (\sameas aa)$; $(\atleast 0a)\equiv \top$; $(\atmost na)\equiv \thing$ for $n\ge 1$; and $(\atmost 0a)\equiv (\all a{\bot})$. These show that we do not lose any expressive power by not allowing for number restrictions on attributes. Still, full $\fullclassic$ is somewhat more expressive than $\classic$. This is mainly due to the introduction of individuals (also called nominals) in $\fullclassic$. For the sake of completeness we give the syntax of the full \fullclassic{} language.\footnote{Even here we are omitting constructs dealing with integers and other so-called ``host individuals'', which cannot have roles of their own and can only act as role/attribute fillers.} This requires a further set, ${\cal O}$, representing the set of individual names. Then we can define two additional concept constructors \begin{itemize} \item $\{e_1, ..., e_m\}$, for individuals $e_i \in {\cal O}$ (\emph{enumeration} as in $\{Fall,Summer,Spring\}$) \item $p:e$ for a role or attribute $p$, and an individual $e$ (\emph{fills} as in $currentSeason:Summer$). \end{itemize} In a technical report, \citeA{KuestersBorgida-DCS-TR-404-1999} extend some of the results presented in this work to full $\fullclassic$, in the case when individuals have a non-standard semantics. \smallskip The least common subsumer of a set of concept descriptions is the most specific concept subsuming all concept descriptions of the set: \begin{definition}\label{def:lcs} The concept description $D$ is the \emph{least common subsumer (lcs)} of the concept descriptions $C_1,\ldots,C_n$ ($lcs(C_1,\ldots,C_n)$ for short) iff i) $C_i\sqsubseteq D$ for all $i=1,\ldots,n$ and ii) for every $D'$ with that property $D\sqsubseteq D'$. Analogously, we define $\lcst(C_1,\ldots,C_n)$ using $\isat$ instead of $\isa$. \end{definition} Note that the lcs of concept descriptions may not exist, but if it does, by definition it is uniquely determined up to equivalence. In this sense, we may refer to \emph{the} lcs. In the following two sections, attributes are always interpreted as partial functions; only in Section~\ref{sec:totalfunctions} do we consider total attributes. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% Subsumption in Basic Classic %%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\input{subsumption-classic.tex} \section{Characterizing Subsumption in $\classic$}\label{sec:subsumption} In this section we modify the characterization of t-subsumption for \textsc{Classic}, as proposed by \citeA{BorgidaPatel-Schneider-JAIR-1994}, to handle the case of partial attributes. We do so in detail, because the tools used for deciding subsumption are intimately related to the computation of lcs. T-subsumption in \textsc{Classic} is decided by a multi-part process. First, descriptions are turned into description graphs. Next, description graphs are put into canonical form, where certain inferences are explicated and other redundancies are reduced by combining nodes and edges in the graph. Finally, t-subsumption is determined between a description and a canonical description graph. In order to ``inherit'' the proofs, we have tried to minimize the necessary adjustments to the specification in \cite{BorgidaPatel-Schneider-JAIR-1994}. For this reason, roughly speaking, attributes are treated as roles unless they form part of a same-as equality. (Note that attributes participating in a same-as construct must have values!) To some extent, this will allow us to adopt the semantics of the original description graphs, which is crucial for proofs. However, the two different occurrences of attributes, namely, in a same-as equality vs.~a role in a value-restriction, require us to modify and extend the definition of description graphs, the normalization rules, and the subsumption algorithm itself. In the following, we present the steps of the subsumption algorithm in detail. We start with the definition of description graphs. \subsection{Description Graphs}\label{cg-defn} Intuitively, description graphs reflect the syntactic structure of concept descriptions. A description graph is a labeled, directed multigraph, with a distinguished node. Roughly speaking, the edges (\emph{a-edges}) of the graph capture the constraints expressed by same-as equalities. The labels of nodes contain, among others, a set of so-called \emph{r-edges}, which correspond to value restrictions. Unlike the description graphs defined by \citeauthor{BorgidaPatel-Schneider-JAIR-1994}, here the r-edges are not only labeled with role names but also with attribute names. (We shall comment later on the advantage of this modification in order to deal with partial attributes.) The r-edges lead to {\em nested description graphs}, representing the concepts of the corresponding value restrictions. Before defining description graphs formally, in Figure~\ref{fig:descgraph} we present a graph corresponding to the concept description \cAccidentCar{} defined in the introduction. We use $G(\cManufacturer)$, $G(\cModel)$, as well as $G(\cAccidentReport)$ to denote description graphs for the concept names $\cManufacturer$, $\cModel$, and $\cAccidentReport$. These graphs are very simple; they merely consist of one node, labeled with the corresponding concept name. In general, such graphs can be more complex since a value restriction like $\all rC$ leads to a (possibly complex) nested concept description $C$. Although number restrictions on attributes are not allowed, r-edges labeled with attributes, like \rmodel{} and \rmadeBy{}, always have the restriction $[0,1]$ in order to capture the semantics of attributes. % \begin{figure}[tbp] \begin{center} \leavevmode \framebox[\textwidth]{ \parbox{0.99\textwidth}{\hfill\ \ \input{descgraph.eepic} \hfill\ \ } }% exported with 55 percent \caption{A description graph for \cAccidentCar{}, where the large node is the root of the graph} \label{fig:descgraph} \end{center} \end{figure} % Formally, description graphs, nodes, and edges are defined mutually recursively as follows: % \begin{definition}\label{dg-def} A {\em description graph} $G$ is a tuple $(N,E,n_0,l)$, consisting of a finite set $N$ of {\em nodes}; a finite set $E$ of edges ({\em a-edges}); a {\em distinguished node} $n_0\in N$ ({\em root of the graph}); and a function $l$ from $N$ into the set of labels of nodes. We will occasionally use the notation $G.Nodes$, $G.Edges$, and $G.root$ to access the components $N$, $E$ and $n_0$ of the graph $G$. An {\em a-edge} is a tuple of the form $(n_1,a,n_2)$ where $n_1$, $n_2$ are nodes and $a$ is an attribute name. A {\em label of a node} is defined to be $\bot$ or a tuple of the form $(C,H)$, consisting of a finite set $C$ of concept names (the {\em atoms} of the node) and a finite set $H$ of tuples (the r-edges of the node). {\em Concept names} in a description graph stand for atomic concept names and $\thing$. We will occasionally use the notation $n.Atoms$ and $n.REdges$ to access the components $C$ and $H$ of the node $n$. An {\em r-edge} is a tuple, $(r,m,M,G')$, consisting of a role or attribute name, $r$; a min, $m$, which is a non-negative integer; a max, $M$, which is a non-negative integer or $\infty$; and a (recursively nested) description graph $G'$. The graph $G'$ will often be called the {\em restriction graph} of the node for the role $r$. We require the nodes of $G'$ to be distinct from all the nodes of $G$ and other nested description graphs of $G$. If $r$ is an attribute, then we require: $m=0$ and $M\in \{0,1\}$. \end{definition} % Given a description graph $G$ and a node $n\in G.Nodes$, we define $G_{|n}$ to be the graph $(N,E,n,l)$; $G_{|n}$ is said to be rooted at $n$. A sequence $p=n_0a_1a_2\cdots a_kn_k$ with $k\ge 0$ and $(n_{i-1},a_i,n_i)\in G.Edges$, $i=1,\ldots,k$, is called \emph{path in $G$ from the node $n_0$ to $n_k$} ($p\in G$ for short); for $k=0$ the path $p$ is called {\em empty}; $w=a_1\cdots a_k$ is called the {\em label} of $p$ (the empty path has label $\varepsilon$); $p$ is called \emph{rooted} if $n_0$ is the root of $G$. Occasionally, we write $n_0a_1\cdots a_kn_k\in G$ omitting the intermediate nodes. Throughout this work we make the assumption that \emph{description graphs are connected}. A description graph is said to be \emph{connected} if all nodes of the graph can be reached by a rooted path and all nested graphs are connected. The semantics of description graphs (see Definition~\ref{dg-extension}) is not altered if nodes that cannot be reached from the root are deleted. In order to merge description graphs we need the notion of ``recursive set of nodes'' of a description graph $G$: The {\em recursive set of nodes of $G$} is the union of the nodes of $G$ and the recursive set of nodes of all nested description graphs of $G$. Just as for concept descriptions, the semantics of description graphs is defined by means of an interpretation $\pw$. We introduce a function $\inj$ which assigns an individual of the domain of $\pw$ to every node of the graph. This ensures that all same-as equalities are satisfied. % \begin{definition}\label{dg-extension} Let $G = (N,E,n_0,l)$ be a description graph and let $\pw$ be an interpretation. An element, $d$, of $\DD$ is in $\intr{G}$, iff there is some total function, $\inj$, from $N$ into $\DD$ such that \begin{enumerate} \item $d = \inj(n_0)$; \item for all $n \in N$, $\inj (n) \in \intr{n}$; and \item for all $ (n_1,a,n_2) \in E$ we have $ (\inj(n_1),\inj(n_2)) \in \intr{a}$. \end{enumerate} % The extension $\intr{n}$ of a node $n$ with label $\bot$ is the empty set. An element, $d$, of $\DD$ is in $\intr{n}$, where $l(n)=( C, H)$, iff \begin{enumerate} \item for all $B \in C$, we have $d \in \intr{B}$; and \item for all $(r,m,M,G') \in H$, \begin{enumerate} \item there are between $m$ and $M$ elements, $d'$, of the domain such that $(d,d') \in \intr{r}$; and \item $d' \in \intr{G'}$ for all $d'$ such that $(d,d') \in \intr{r}$. \end{enumerate} \end{enumerate} \end{definition} % \citeA{CohenHirsh-ML-1994} defined the semantics of description graphs in a different way, avoiding the introduction of a total function $\inj$. The problem with their definition is, however, that it is only well-defined for acyclic graphs, which, for example, excludes same-as equalities of the form $\sameas{\varepsilon}{{\sf spouse}\circ{\sf spouse}}$, or even $\sameas{p}{p\circ q}$. The semantics of the graphs proposed by \citeA{BorgidaPatel-Schneider-JAIR-1994} is similar to Definition~\ref{dg-extension}. However, in that paper a-edges captured not only same-as equalities but also {\em all} value restrictions on attributes. Still, in the context of partial attributes, we could not define the semantics of description graphs by means of a total function $\inj$ since some attributes might not have fillers. Specifying the semantics of description graphs in terms of \emph{partial} mappings $\inj$ would make the definition even longer. Furthermore, the proofs in \cite{BorgidaPatel-Schneider-JAIR-1994} would not carry over as easily. Therefore, in order to keep $\inj$ a total function, value restrictions of attributes are initially always translated into r-edges. The next section will present the translation of concept descriptions into description graphs in detail. Having defined the semantics of description graphs, subsumption and equivalence between description graphs (e.g., $H\sqsubseteq G$) as well as concept descriptions and description graphs (e.g., $C\sqsubseteq G$) is defined in the same way as subsumption and equivalence between concept descriptions. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Translating Concept Descriptions into Description Graphs}\label{transl} Following \citeA{BorgidaPatel-Schneider-JAIR-1994}, a \classic{} concept description is turned into a description graph by a recursive process. In this process, nodes and description graphs are often merged. % \begin{definition}\label{node-merge} The {\em merge of two nodes}, $n_1\oplus n_2$, is a new node $n$ with the following label: if $n_1$ or $n_2$ have label $\bot$, then the label of $n$ is $\bot$. Otherwise if both labels are not equal to $\bot$, then $n.Atoms = n_1.Atoms \cup n_2.Atoms$ and $n.REdges = n_1.REdges \cup n_2.REdges$. If $G_1=(N_1,E_1,n_1,l_1)$ and $G_2=(N_2,E_2,n_2,l_2)$ are two description graphs with disjoint recursive sets of nodes, then the merge of $G_1$ and $G_2$, $G:=G_1\oplus G_2=(N,E,n_0,l)$, is defined as follows: \begin{enumerate} \item $n_0:=n_1\oplus n_2$; \item $N:=(N_1\cup N_2\cup \{n_0\})\setminus \{n_1,n_2\}$; \item $E:=(E_1\cup E_2)[n_1/n_0,n_2/n_0]$, i.e., $E$ is the union of $E_1$ and $E_2$ where every occurrence of $n_1,n_2$ is substituted by $n_0$; \item $l(n):=l_1(n)$ for all $n\in N_1\setminus \{n_1\}$; $l(n):=l_2(n)$ for all $n\in N_2\setminus \{n_2\}$; and $l(n_0)$ is defined by the label obtained by merging $n_1$ and $n_2$. \end{enumerate} \end{definition} % Now, a $\classic$-concept description $C$ can be turned into its \emph{corresponding description graph $G(C)$} by the following translation rules. \begin{enumerate} \item $\thing$ is turned into a description graph with one node $n_0$ and no a-edges. The only atom of the node is $\thing$ and the set of r-edges is empty. \item A concept name is turned into a description graph with one node and no a-edges. The atoms of the node contain only the concept name and the node has no r-edges. \item A description of the form $(\atleast{n}{r})$ is turned into a description graph with one node and no a-edges. The node has as its atoms $\thing$ and it has a single r-edge $(r,n,\infty,G({\thing}))$ where $G(\thing)$ is specified by the first translation rule. \item A description of the form $(\atmost{n}{r})$ is turned into a description graph with one node and no a-edges. The node has as its atom $\thing$ and it has a single r-edge $(r,0,n,G({\thing}))$. \item A description of the form $\sameas{a_1\cdots a_p}{b_1 \cdots b_q}$ is turned into a graph with pairwise distinct nodes $n_1,\ldots,n_{p-1},m_1,\ldots,m_{q-1}$, the root $m_0:=n_0$, and an additional node $n_p=m_q:=n$; the set of a-edges consists of $(n_0,a_1,n_1)$, $(n_1,a_2,n_2),\ldots,(n_{p-1},a_p,n_p)$ and $(m_0,b_1,m_1)$, $(m_1,b_2,m_2)$, $\ldots$, $(m_{q-1},b_q,m_q)$, i.e., two disjoint paths which coincide on their starting point, $n_0$, and their final point, $n$. (Note that for $p=0$ the first path is the empty path from $n_0$ to $n_0$ and for $q=0$ the second path is the empty path from $n_0$ to $n_0$.) All nodes have $\thing$ as their only atom and no r-edges. \item A description of the form $\all{r}{C}$, where $r$ is a role, is turned into a description graph with one node and no a-edges. The node has the atom $\{\thing\}$ and it has a single r-edge $(r,0,\infty,G(C))$. \item A description of the form $\all{a}{C}$, where $a$ is an attribute, is turned into a description graph with one node and no a-edges. The node has the atom $\{\thing\}$ and it has a single r-edge $(a,0,1,G(C))$. (In the work by \citeauthor{BorgidaPatel-Schneider-JAIR-1994}, the concept description $\all aC$ is turned into an a-edge. As already mentioned, this would cause problems for attributes interpreted as partial functions when defining the semantics by means of $\inj$ as specified in Definition~\ref{dg-extension}.) \item To turn a description of the form $C \And D$ into a description graph, construct $G(C)$ and $G(D)$ and merge them. \end{enumerate} % Figure~\ref{fig:descgraph} shows the description graph built in this way for the concept \cAccidentCar{} of our example. It can easily be verified that the translation preserves extensions: \begin{theorem} \label{translation lemma} A concept description $C$ and its corresponding description graph $G(C)$ are equivalent, i.e.,$\intr{C}=\intr{G(C)}$ for every interpretation $\pw$. \end{theorem} % The main difficulty in the proof of this theorem is in showing that merging two description graphs corresponds to the conjunction of concept descriptions. % \begin{lemma}\label{merge lemma} For all interpretations $\pw$, if $n_1$ and $n_2$ are nodes, then $\intr{(n_1 \oplus n_2)} = \intr{n_1} \cap \intr{n_2}$; if $G_1$ and $G_2$ are description graphs then $\intr{(G_1 \oplus G_2)} = \intr{G_1} \cap \intr{G_2}$. \end{lemma} % The proof of the preceding statement is rather simple and like the one in \cite{BorgidaPatel-Schneider-JAIR-1994}. %%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Translating Description Graphs to Concept Descriptions\label{sec:trans-graph-concept}} %%%%%%%%%%%%%%%%%%%%%%%%%%%% Although the characterization of subsumption does not require translating description graphs back to concept descriptions, this translation is presented here to show that concept descriptions and description graphs are equivalent representations of $\classic$ concept descriptions. In subsequent sections, we will in fact need to turn graphs into concept descriptions. The translation of a description graph $G$ can be specified in a rather straightforward recursive definition. The main idea of the translation stems from \citeA{CohenHirsh-ML-1994}, who employed spanning trees to translate same-as equalities. A \emph{spanning tree} of a (connected) graph is a tree rooted at the same node as the graph and containing all nodes of the graph. In particular, it coincides with the graph except that some a-edges are deleted. For example, one possible spanning tree $T$ for $G$ in Figure~\ref{fig:descgraph} is obtained by deleting the a-edge labeled \rmadeBy{}, whose origin is the root of $G$. Now, let $G$ be a connected description graph and $T$ be a spanning tree for it. Then, the corresponding concept description $C_G$ is obtained as a conjunction of the following descriptions: \begin{enumerate} \item $C_G$ contains (i) a same-as equality $\sameas vv$ for every leaf $n$ of $T$, where $v$ is the label of the rooted path in $T$ to $n$; and (ii) a same-as equality $\sameas{v_1\circ a}{v_2}$ for each a-edge $(n_1,a,n_2)\in G.Edges$ not contained in $T$, where $v_i$ is the label of the rooted path to $n_i$ in $T$, $i=1,2$. \item for every node $n$ in $T$, $C_G$ contains a value restriction $\forall v.C_n$, where $v$ is the label of the rooted path in $T$ to $n$, and $C_n$ denotes the translation of the label of $n$, i.e., $C_n$ is a conjunction obtained as follows: \begin{itemize} \item every concept name in the atoms of $n$ is a conjunct in $C_n$; \item for every r-edge $(r,m,M,G')$ of $n$, $C_n$ contains (a) the number restrictions $(\atleast{m}{r})$ and $(\atmost{M}{r})$ (in case $r$ is a role and $M\not=\infty$) and (b) the value restriction $\forall r.C_{G'}$, where $C_{G'}$ is the recursively defined translation of $G'$. \end{itemize} In case the set of atoms and r-edges of $n$ is empty, define $C_n:=\top$. \end{enumerate} % Referring to the graph $G$ in Figure~\ref{fig:descgraph}, $C_G$ contains the same-as equalities $\sameas{\rmodel\circ\rmadeBy}{\rmodel\circ\rmadeBy}$ and $\sameas{\rmadeBy}{\rmodel\circ \rmadeBy}$. Furthermore, if $n_0$ denotes the root of $G$, $C_G$ has the value restrictions $\forall \varepsilon.C_{n_0}$, $\forall \rmodel.\top$, and $\forall \rmodel\,\rmadeBy.\top$, where $C_{n_0}$ corresponds to \cAccidentCar{} as defined in the introduction, but without the same-as equality. Note that, although in this case the same-as equality $\sameas{\rmodel\circ\rmadeBy}{\rmodel\circ\rmadeBy}$ is not needed, one cannot dispense with 1.(i) in the construction above, as illustrated by the following example: Without 1.(i), the description graph $G(\sameas aa)$ would be turned into the description $\top$, which is not equivalent to $\sameas aa$ since the same-as equality requires that the path $a$ has a value, which may not be the case. It is easy to prove that the translation thus defined is correct in the following sense \cite{KuestersBorgida-DCS-TR-404-1999}. % \begin{lemma}\label{lem:equivgraphconcept} Every connected description graph $G$ is equivalent to its translation $C_G$, i.e., for all interpretations $\pw$: $\intr{G}= \intr{C_G}$. \end{lemma} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Canonical Description Graphs}\label{canon} In the following we occasionally refer to ``{\em marking a node incoherent}''; this means that the label of this node is changed to $\bot$. ``{\em Marking a description graph as incoherent}'' means that the description graph is replaced by the graph $G(\bot)$ corresponding to $\bot$, i.e., the graph consisting only of one node with label $\bot$. \medskip One important property of canonical description graphs is that they are deterministic, i.e., every node has at most one outgoing edge (a-edge or r-edge) labeled with the same attribute or role name. Following \citeA{BorgidaPatel-Schneider-JAIR-1994}, in order to turn a description graph into a canonical graph we need to merge a-edges and r-edges. In addition, different from their work, it might be necessary to ``lift'' r-edges to a-edges. To {\em merge two a-edges} $(n,a,n_1)$ and $(n,a,n_2)$ in a description graph $G$, replace them with a single new edge $(n,a,n')$ where $n'$ is the result of merging $n_1$ and $n_2$. In addition, replace $n_1$ and $n_2$ by $n'$ in all other a-edges of $G$. In order to {\em merge two r-edges} $(r,s_1,k_1,G_1)$, $(r,s_2,k_2,G_2)$ replace them by the new r-edge $(r,max(s_1,s_2),min(k_1,k_2), G_1\oplus G_2)$. To {\em lift up} an r-edge $(a,m,M,G_a )$ of a node $n$ in a concept graph $G$ with an a-edge $(n,a,n_1)$, remove it from $n.REdges$, and augment $G$ by adding $G_a.Nodes$ to $G.Nodes$, $G_a.Edges$ to $G.Edges$, as well as adding $(n,a,G_a.Root)$ to $G.Edges$. A \emph{precondition} for applying this transformation is that $M=1$, or $M=0$ and $G_a$ corresponds to the graph $G(\bot)$. The reason for this precondition is that if an r-edge of the form $(a,0,0,G_a)$ is lifted without $G_a$ being inconsistent, the fact that no $a$-successors are allowed is lost. Normalization rule 5 (see below) will guarantee that this precondition can always be satisfied. A description graph $G$ is transformed into {\em canonical form} by exhaustively applying the following {\em normalization rules}. A graph is called \emph{canonical} if none of these rules can be applied. % \begin{enumerate} \item \label{item:1}If some node in $G$ is marked incoherent, mark the description graph as incoherent. \expl{Even if the node is not a root, attributes corresponding to a-edges must always have a value (since they participate in same-as equalities), and this value cannot belong to the empty set.} \item \label{item:5} If some r-edge in a node has its min greater than its max, mark the node incoherent. \expl{$\atleast 2r\And \atmost 1r\equiv \bot$} \item \label{item:8}Add $\thing$ to the atoms of every node, if absent. \item \label{item:10}If some r-edge in a node has its restriction graph marked incoherent, change its max to 0. \expl{$(\atmost 0r)\equiv \forall r.\bot$.} \item \label{item:11}If some r-edge in a node has a max of 0, mark its restriction graph as incoherent. \expl{See \ref{item:10}.} \item \label{item:21}If some r-edge is of the form $(r,0,\infty,G')$ where $G'$ only contains one node with empty set of atoms or with the atoms set to $\{\top\}$ and no r-edges, then remove this r-edge. \expl{$\forall r.\top\equiv \top$.} \item \label{item:18}If some node has two r-edges labeled with the same role, merge the two edges, as described above. \expl{$\forall r.C\And \forall r.D\equiv \forall r.{(C\And D)}$.} \item \label{item:19}If some description graph has two a-edges from the same node labeled with the same attribute, merge the two edges, as described above. \expl{$\forall a.C\And \forall a.D\equiv \forall a.{(C\And D)}$.} \item \label{item:20}If some node in a graph has both an a-edge and an r-edge for the same attribute, then ``lift up the r-edge'' if the precondition is satisfied (see above). \expl{The value restrictions imposed on attributes that participate in same-as equalities must be made explicit and gathered at one place similar to the previous to cases.} \end{enumerate} % We need to show that the transformations to canonical form do not change the semantics of the graph. The main difficulty is in showing that the merging processes and the lifting preserve the semantics. The only difference from \cite{BorgidaPatel-Schneider-JAIR-1994} is that in addition to merging r-edges and a-edges we also need to lift up r-edges. Therefore, we omit the proofs showing that merging edges preserves extensions. The proofs of the following two lemmas are routine and quite similar to the one of Lemma~\ref{attribute lifting lemma}. % \begin{lemma} \label{attribute merging extension lemma} Let $G = (N,E,n_0,l)$ be a description graph with two mergeable a-edges and let $G' = (N',E',n',l')$ be the result of merging these two a-edges. Then, $G \equiv G'$. %\proof{} %Let the two edges be $(n,n_1,\name{a},F_1)$ and %$(n,n_2,\name{a},F_2)$ and the new node $n'$ be $n_1 \oplus n_2$. %Choose $d \in \intr{G}$, and %let $\inj$ be a function from $N$ into the domain of $\pw$ satisfying the conditions %for extensions (Definition~\ref{dg-extension}) such that $\inj(n_0) = d$. %Then $\inj(n_1) = \inj(n_2)$ because both are equal to %$\intr{\notation{a}}(\inj(n))$. %Let $\inj '$ be the same-as $\inj$ except that $\inj '(n') = %\inj(n_1)= \inj(n_2)$. Then $\inj '$ satisfies %Definition~\ref{dg-extension}, part 3, for $G'$, because we replace %$n_1$ and $n_2$ by $n'$ everywhere, and the conditions for fillers %are satisfied for $\inj$. Moreover, $\inj'(n') = \inj(n_1)\ \in\ %\intr{n_1} \cap \intr{n_2}$, which by Lemma~\ref{merge lemma} equals %$\intr{(n_1 \oplus n_2)}$; so part 2 is satisfied too, since $n'= n_1 %\oplus n_2$. Finally, if the root is modified by the merger, i.e., %$n_1$ or $n_2$ is $r$, say $n_1$, then $d=\inj(n_1)=\inj'(n')$, so part 1 %of the definition is also satisfied. %\medskip %Conversely, given $d \in \intr{G'}$, let $\inj '$ be %the function stipulated by Definition~\ref{dg-extension} such that %$\inj '(r') = d$. Let $\inj$ be the same-as $\inj'$ except that %$\inj(n_1) = \inj(n')$ and $\inj(n_2) = \inj'(n')$. Then the above %argument can be traversed in reverse to verify %that $\inj $ satisfies Definition~\ref{dg-extension}, such that % $d \in \intr{G}$. %\qed \end{lemma} % \begin{lemma} \label{role merging extension lemma} Let $n$ be a node with two mergeable r-edges and let $n'$ be the node with these edges merged. Then, $\intr{n} = \intr{n'}$ for every interpretation $\pw$. %\proof{} %Let the two r-edges be $(\name{r},m_1,M_1,F_1,G_1,)$ and %$(\name{r},m_2,M_2,F_2,G_2)$. %Let $d \in \intr{n}$. %Then there are between $m_1$ ($m_2$) and $M_1$ ($M_2$) elements $d'$ of the %domain such that $(d,d') \in \intr{\syntax{r}}$. %Therefore there are between the maximum of $m_1$ and $m_2$ and % the minimum of $M_1$ and $M_2$ elements $d'$ of the domain %such that $(d,d') \in \intr{\syntax{r}}$. Furthermore, for %all $f\in F_1$ ($f\in F_2$) there is a $d'$ such that %$(d,d') \in \intr{\syntax{r}}$ and $d'\in \intr{f}$. Thus, %there are fillers of $d$ for all $f\in F_1\cup F_2$. %Also, all $d'$ such that $(d,d') \in \intr{\syntax{r}}$ %are in $\intr{G_1}$ ($\intr{G_2}$). %Therefore, all $d'$ such that $(d,d') \in \intr{\syntax{r}}$ %are in $\intr{G_1} \cap \intr{G_2}$, which equals $\intr{(G_1 \oplus %G_2)}$ by Lemma~\ref{merge lemma}. %Thus, $d \in \intr{n'}$. %Let $d \in \intr{n'}$. %Then there are between the maximum of $m_1$ and $m_2$ and % the minimum of $M_1$ and $M_2$ elements $d'$ of the domain %such that $(d,d') \in \intr{\syntax{r}}$. %Therefore there are between $m_1$ ($m_2$) and $M_1$ ($M_2$) elements %$d'$ of the %domain such that $(d,d') \in \intr{\syntax{r}}$. %Furthermore, for all $f\in F_1\cup F_2$ there is a $d'$ such that %$(d,d') \in \intr{\syntax{r}}$ and $d'\in \intr{f}$. Thus, %for all $f\in F_1$ ($f\in F_2$) there is a $d'$ such that %$(d,d') \in \intr{\syntax{r}}$ and $d'\in \intr{f}$. %Also, all $d'$ such that $(d,d') \in \intr{\syntax{r}}$ %are in $\intr{(G_1 \oplus G_2)} = \intr{G_1} \cap \intr{G_2}$. %Therefore, all $d'$ such that $(d,d') \in \intr{\syntax{r}}$ %are in $\intr{G_1}$ ($\intr{G_2}$). %Hence, $d \in \intr{n}$. %\qed \end{lemma} % \begin{lemma} \label{attribute lifting lemma} Let $G = (N,E,n_0,l)$ be a description graph with node $n$ and a-edge $(n,a,n'')$. Suppose $n$ has an associated r-edge $(a,m,M,G_a)$. Provided that the precondition for lifting r-edges is satisfied and that $G' = (N',E',n',l')$ is the result of this transformation, then $G\equiv G'$. \proof{} It is sufficient to show that $\intr{G_{|n}} = \intr{{G'}_{|n}}$, since only the label of $n$ is changed in $G'$ and only $n$ obtains an additional a-edge, which points to the graph $G_a$ not connected to the rest of $G'$. W.l.o.g. we therefore may assume that $n$ is the root of $G$, i.e., $n=n_0$. Let $d\in \intr{G}$. Thus, there is a function $\inj$ from $N$ into $\DD$ as specified in Definition~\ref{dg-extension} and an individual $e$ such that $d=\inj(n)$, $e=\inj(n'')$, and $(d,e)\in \intr{a}$. This implies $e\in \intr{G_a}$. Hence, there exists a function $\inj'$ from $G_a.Nodes$ into $\DD$ for $G_a$ and $e$ satisfying the conditions in Definition~\ref{dg-extension}. Since the sets of nodes of $G$ and $G_a$ are disjoint, we can define $\inj''$ to be the union of $\inj$ and $\inj'$, i.e., $\inj''(m):=\inj(m)$ for all nodes $m$ in $G$ and $\inj''(m):=\inj'(m)$ for all nodes $m$ in $G_a$. Since, by construction, for the additional a-edge $(n,a,G_a.Root)\in E'$ we have $(\inj''(n),\inj''(G_a.Root))\in \intr{a}$, it follows that all conditions in Definition~\ref{dg-extension} are satisfied for $d$ and $G'$, and thus, $d\in\intr{G'}$. Now let $d\in \intr{G'}$. Thus, there is a function $\inj''$ from $N'$ into $\DD$ according to Definition~\ref{dg-extension}. Let $e:=\inj''(G_a.Root)=\inj''(n'')$. Let $G''$ be the description graph we obtain from $G'$ by deleting the nodes corresponding to $G_a$, which is the same graph as $G$ without the r-edge $(a,m,M,G_a)$. If we restrict $\inj''$ to the nodes of $G''$, then it follows $d\in \intr{G''}$. Furthermore, restricting $\inj''$ to the nodes of $G_a$ yields $e\in \intr{G_a}$. In particular, $G_a$ can not be marked incoherent. Then, our precondition ensures $M=1$. Thus, since $e$ is the only $a$-successor of $d$, we can conclude $d\in \intr{G}$. \qed \end{lemma} % Having dealt with the issue of merging and lifting, it is now easy to verify that ``normalization'' does not affect the meaning of description graphs. \begin{theorem} \label{canonical theorem} If $G$ is a description graph and $G'$ is the corresponding canonical description graph, then $G\equiv G'$. \end{theorem} % As an example, the canonical description graph of the graph given in Figure~\ref{fig:descgraph} is depicted in Figure~\ref{fig:cangraph}. % \begin{figure}[tbp] \begin{center} \leavevmode \framebox[\textwidth]{ \parbox{0.99\textwidth}{\hfill\ \ \input{cangraph.eepic} \hfill\ \ } }% exported with 55 percent \caption{The canonical description graph for \cAccidentCar, where the left-most node is the root.} \label{fig:cangraph} \end{center} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Subsumption Algorithm}\label{sec:subalgo} The final part of the subsumption process is checking to see if a canonical description graph is subsumed by a concept description. As in \citeA{BorgidaPatel-Schneider-JAIR-1994}, where attributes are total, it turns out that it is not necessary to turn the potential subsumer into a canonical description graph. The subsumption algorithm presented next can also be considered as a characterization of subsumption. % \begin{algorithm}\label{subsume} {\bf (Subsumption Algorithm)} Given a concept description $D$ and description graph $G=(N,E,n_0,l)$, {\em subsumes?}$( D, G )$ is defined to be true if and only if one of the following conditions hold: \begin{enumerate} \item The description graph $G$ is marked incoherent. \item $D$ is a concept name or $\thing$, and $D$ is an element of the atoms of $n_0$. \item $D$ is $(\atleast{n}{r})$ and i) some r-edge of $n_0$ has $r$ as its role, and min greater than or equal to $n$; or ii) $n=0$. \item $D$ is $(\atmost{n}{r})$ and some r-edge of $n_0$ has $r$ as its role, and max less than or equal to $n$. \item $D$ is $\sameas{a_1 \cdots a_n}{b_1 \cdots b_m}$, and there are rooted paths with label $a_1 \cdots a_n$ and $b_1 \cdots b_m$ in $G$ ending at the same node. \item $D$ is $\all{r}{C}$, for a role $r$, and either (i) some r-edge of $n_0$ has $r$ as its role and $G'$ as its restriction graph with $\mbox{{\em subsumes?}}( C , G')$; or (ii) $\mbox{subsumes?}(C, G(\thing))$. \expl{$\all{r}{\thing}\equiv \top$.} \item $D$ is $\all{a}{C}$, for an attribute $a$, and (i) some a-edge of $G$ is of the form $(n_0,a,n')$, and $\mbox{{\em subsumes?}}(C,(N,E,n',l))$; or (ii) some r-edge of $n_0$ has $a$ as its attribute, and $G'$ as its restriction graph with $\mbox{{\em subsumes?}}( C , G')$; or (iii) $\mbox{subsumes?}(C,G(\thing))$. \item $D$ is $E \And F$ and both $\mbox{subsumes?}(E,G)$ and $\mbox{subsumes?}(F,G)$ are true. \end{enumerate} \end{algorithm} % There are only two differences between this algorithm and the one for total attributes presented by \citeauthor{BorgidaPatel-Schneider-JAIR-1994} (see also Algorithm~\ref{alg:11subsumption}). First, in the partial attribute case, given $D=\all aC$, one needs to look up the value restriction either in some a-edge or some r-edge of $G$, since attributes can label both a-edges and r-edges. (In the total attribute case, attributes can only label a-edges so that examining r-edges was not necessary.) The second and most important distinction is the treatment of same-as equalities. As shown in the above algorithm, with $D=\sameas{a_1 \cdots a_n}{b_1 \cdots b_m}$ one only needs to check whether there exist two paths labeled $v:=a_1\cdots a_n$ and $w:=b_1\cdots b_m$ leading the same node in $G$. In the total attribute case, however, it suffices if there exist prefixes $v'$ and $w'$ of $v$ and $w$ with this property, as long as the remaining suffixes are identical. Soundness and completeness of this algorithm is stated in the following theorem. % \begin{theorem} \label{subsumption theorem} Let $C$, $D$ be \classic{} descriptions. Then, $C\sqsubseteq D$ iff $subsumes?(D,G_C)$, where $G_C$ is the canonical form of $G(C)$. \end{theorem} % The soundness of the subsumption algorithm, i.e., the if direction in the theorem stated above, is pretty obvious. As in \cite{BorgidaPatel-Schneider-JAIR-1994}, the main point of the only-if direction (proof of completeness) is that the canonical graph $G_C$ is deterministic, i.e., from any node, given a role or attribute name $r$, there is at most one outgoing r-edge or a-edge with $r$ as label. We point the reader to \cite{BorgidaPatel-Schneider-JAIR-1994} %or \cite{KuestersBorgida-DCS-TR-404-1999} for the proof, since it is almost identical to the one for total attributes already published there. These proofs reveal that, for the if direction of Theorem~\ref{subsumption theorem}, description graphs need not be normalized. Thus, one can also show: \begin{remark}\label{rem:subsumption-classic} Let $G$ be some (not necessarily normalized description graph) and let $D$ be a $\classic$ concept description. Then, $subsumes?(D,G)$ implies $G\sqsubseteq D$. \end{remark} % \citeauthor{BorgidaPatel-Schneider-JAIR-1994} argue that the canonical description graph $G$ of a concept description $C$ can be constructed in time polynomial in the size of $C$. Furthermore, Algorithm~\ref{subsume} runs in time polynomial in the size of $G$ and $D$. It is not hard to see that the changes presented here do not increase the complexity. Thus, soundness and completeness of the subsumption algorithm provides us with the following corollary. % \begin{corollary}\label{cor:complexsub} Subsumption for \classic{} concept descriptions $C$ and $D$, where attributes are interpreted as partial functions, can be decided in time polynomial in the size of $C$ and $D$. \end{corollary} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% computing the lcs in CLASSIC %%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\input{classiclcs.tex} \section{Computing the LCS in $\classic$}\label{sec:lcs} In this section, we will show that the lcs of two $\classic$ concept descriptions can be stated in terms of a product of canonical description graphs. A similar result has been proven by \citeA{CohenHirsh-ML-1994} for a sublanguage of \classic{}, which only allows for concept names, concept conjunction, value restrictions, and same-as equalities. In particular, this sublanguage does not allow for inconsistent concept descriptions (which, for example, can be expressed by conflicting number-restrictions). Furthermore, the semantics of the description graphs provided by \citeauthor{CohenHirsh-ML-1994} restricts the results to the case when description graphs are acyclic. This excludes, for example, same-as equalities of the form $\sameas{\epsilon}{\compose{{\sf spouse}}{{\sf spouse}}}$. In the following, we first define the product of description graphs. Then, we show that for given concept descriptions $C$ and $D$, the lcs is equivalent to a description graph obtained as the product of $G_C$ and $G_D$. Our constructions and proofs will be quite close to those in \cite{CohenHirsh-ML-1994}. %%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{The Product of Description Graphs}\label{sec:classicproduct} %%%%%%%%%%%%%%%%%%%%%%%%%% A description graph represents the constraints that must be satisfied by all individuals in the extension of the graph. Intuitively, the product of two description graphs is the intersection of these constraints---as the product of finite automata corresponds to the intersection of the words accepted by the automata. However, in the definition of the product of description graphs special care has to be taken of incoherent nodes, i.e., nodes labeled with $\bot$. Also, since attributes may occur both in r-edges and a-edges, one needs to take the product between restriction graphs of r-edges, on the one hand, and the original graphs $G_1$ or $G_2$ (rooted at certain nodes), on the other hand. % \begin{definition}\label{def:productgraph} Let $G_1=(N_1,E_1,n_1,l_{1})$ and $G_2=(N_2,E_2,n_2,l_{2})$ be two description graphs. Then, the {\em product $G:=G_1\times G_2:=(N,E,n_0,l)$ of the two graphs} is recursively defined as follows: \begin{enumerate} \item $N:=N_1\times N_2$; \item $n_0:=(n_1,n_2)$; \item $E:=$\parbox[t]{0.8\textwidth}{$\{((n,n'),a,(m,m'))\;|\;$ $(n,a,m)\in E_1$ and $(n',a,m')\in E_2$\};} \item Let $n\in N_1$ and $n'\in N_2$. If $l_{1}(n)=\bot$, then let $l((n,n')):=l_{2}(n')$ and, analogously, if $l_2(n')=\bot$, then $l((n,n')):=l_1(n)$. Otherwise, for $l_{1}(n)=(S_1,H_1)$ and $l_{2}(n')=(S_2,H_2)$, define $l((n,n')):=(S,H)$ where \begin{enumerate} \item $S:=S_1\cap S_2$; \item {\small $H:=$ \\ $\{(r,min(p_1,p_2),max(q_1,q_2), G'_1\times G'_2 )\;|\;$ $(r,p_1,q_1,G'_1)\in H_1$, $(r,p_2,q_2,G'_2)\in H_2\}\: \cup$\\ $\{(a,0,1,{G_1}_{|m}\times G'_2)\;|\;(n,a,m)\in E_1$, $(a,p_2,q_2,G'_2)\in H_2\}\: \cup$\\ $\{(a,0,1,G'_1\times {G_2}_{|m} )\;|\; (a,p_1,q_1,G'_1)\in H_1$, $(n',a,m)\in E_2 \}$. } \end{enumerate} \end{enumerate} \end{definition} % According to this definition, if in the tuple $(n,n')$ some node, say $n$, is incoherent, then the label of $(n,n')$ coincides with the one for $n'$. The reason for defining the label in this way is that $lcs(\bot,C)\equiv C$ for every concept description $C$. This has been overlooked by \citeA{FrazierPitt-ML-1996}, thus making their constructions and proofs only hold for concept descriptions that do not contain inconsistent subexpressions. Note that $G$, as defined here, might not be connected, i.e., it might contain nodes that cannot be reached from the root $n_0$. Even if $G_1$ and $G_2$ are connected this can happen because all tuples $(n_1,n_2)$ belong to the set of nodes of $G$ regardless of whether they are reachable from the root or not. However, as already mentioned in Section~\ref{cg-defn} we may assume $G$ to be connected. Also note that the product graph can be translated back into a $\classic$ concept description since the product of two description graphs is once again a description graph. %%%%%%%%%%%%%%%%%%%%%% \subsection{Computing the LCS} %%%%%%%%%%%%%%%%%%%%%% We now prove the main theorem of this subsection, which states that the product of two description graphs is equivalent to the lcs of the corresponding concept descriptions. % \begin{theorem}\label{the:lcs} Let $C_1$ and $C_2$ be two concept descriptions, and let $G_1$ and $G_2$ be corresponding {\em canonical} description graphs. Then, $C_{G_1\times G_2}\equiv lcs(C_1,C_2)$. \end{theorem} \noindent{\bf Proof.} Let $G:=G_1\times G_2$. We will only sketch the proof showing that $C_G$ subsumes $C_1$ and, by symmetry, also $C_2$ (see \cite{KuestersBorgida-DCS-TR-404-1999} for details). By construction, if there are two rooted paths to a common node in $G$, then $G_1$ has corresponding paths leading to the same node as well. Thus, by Theorem~\ref{subsumption theorem}, the same-as equalities in $C_G$ subsume the ones in $C_1$. Now, let $T$ be a spanning tree of $G$, $(m_1,m_2)$ be a node in $G$, and $v$ be the label of the rooted path in $T$ to $(m_1,m_2)$. Then, by construction it follows that there exists a rooted path in $G_1$ to $m_1$ labeled $v$. Furthermore, a rather straightforward inductive proof shows that the concept description $E$ corresponding to the label of $(m_1,m_2)$ subsumes ${G_1}_{|m_1}$. This implies $\forall v.E\sqsupseteq {G_1}$. As a result, we can conclude $G\sqsupseteq G_1$. The more interesting part of the proof is to show that $C_G$ is not only a common subsumer of $C_1$ and $C_2$, but the \emph{least} common subsumer. We now show by induction over the size of $D$, $C_1$, and $C_2$ that if $D$ subsumes $C_1$ and $C_2$, then $D$ subsumes $C_G$: We distinguish different cases according to the definition of ``$subsumes?$''. Let $G_1=(N_1,E_1,n_1,l_{1})$ be the canonical description graph of $C_1$, $G_2=(N_2,E_2,n_2,l_{2})$ be the canonical description graph of $C_2$, and $G=(N,E,n_0,l)=G_1\times G_2$. In the following, we assume that $C_1\sqsubseteq D$ and $C_2\sqsubseteq D$; thus, $subsumes?(D,G_1)$ and $subsumes?(D,G_2)$. We show that $subsumes?(D,G)$. Then, Remark~\ref{rem:subsumption-classic} implies $G\sqsubseteq D$, and thus, $C_G\sqsubseteq D$. Note that one cannot use Theorem~\ref{subsumption theorem} since $G$ might not be a canonical description graph. % \begin{enumerate} \item If $G$ is incoherent, then there is nothing to show. \item If $D$ is a concept name, $\thing$, or a number-restriction, then by definition of the label of $n_0$ it is easy to see that $subsumes?(D,G)$. \item If $D$ is $\sameas vw$, then there exist nodes $m_1$ in $G_1$ and $m_2$ in $G_2$ such that there are two paths from $n_1$ to $m_1$ with label $v$ and $w$, respectively, as well as two paths from $n_2$ to $m_2$ with label $v$ and $w$. Then, by definition of $G$ it is easy to see that there are two paths from $n_0=(n_1,n_2)$ to $(m_1,m_2)$ with label $v$ and $w$, respectively. This shows $subsumes?(D,G)$. \item If $D$ is $\all rC$, $r$ a role or attribute, then one of several cases applies: (i) $n_1$ and $n_2$ have r-edges with role or attribute $r$, and restriction graphs $G'_1$ and $G'_2$, respectively, such that $subsumes?(C,G'_1)$ and $subsumes?(C,G'_2)$; (ii) without loss of generality, $n_1$ has an a-edge pointing to $m_1$ with attribute $r$, such that $subsumes?(C,G'_1)$, where $G'_1:={G_1}_{|m_1}$; and $n_2$ has an r-edge with restriction graph $G'_2$ such that $subsumes?(C,G'_2)$. In both cases (i) and (ii), $subsumes?(C,G'_1\times G'_2)$ follows by induction. Furthermore, by definition of $G$ there is an r-edge with role $r$ and restriction graph $G'_1\times G'_2$ for $n_0$. This implies $subsumes?(D,G)$. (iii) $n_1$ and $n_2$ have a-edges with attribute $r$ leading to nodes $m_1$ and $m_2$, respectively. Then, $subsumes?(C,{G_1}_{|m_1})$ and $subsumes?(C,{G_2}_{|m_2})$. By induction, we know $subsumes?(C,{G_1}_{|m_1}\times {G_2}_{|m_2})$. It is easy to see that $G_{|(m_1,m_2)}={G_1}_{|m_1}\times {G_2}_{|m_2}$. Furthermore, by definition there is an a-edge with attribute $r$ from $(n_1,n_2)$ to $(m_1,m_2)$ in $G$. This shows $subsumes?(D,G)$. (iv) (without loss of generality) $n_1$ has no r-edge and no a-edge with role or attribute $r$. This implies $subsumes?(C,G(\thing))$, which also ensures $subsumes?(D,G)$. \item If $D$ is $E\And F$, then by definition of the subsumption algorithm, $subsumes?(E,G_1)$ and $subsumes?(E,G_2)$ hold. By induction, we have $subsumes?(E,G)$, and analogously, $subsumes?(F,G)$. Thus, $subsumes?(D,G)$. \qed \end{enumerate} % As stated in Section~\ref{sec:subalgo}, a canonical description graph for a \classic{} concept description can be computed in time polynomial in the size of the concept description. It is not hard to verify that the product of two description graphs can be computed in time polynomial in the size of the graphs. In addition, the concept description corresponding to a description graph can be computed in time polynomial in the size of the graph. Thus, as a consequence of Theorem~\ref{the:lcs} we obtain: % \begin{corollary}\label{cor:complexlcs} The lcs of two \classic{} concept descriptions always exists and can be computed in time polynomial in the size of the concept descriptions. \end{corollary} % As intimated in \cite{CohenBorgida+-AAAI-1992}, this statement does not hold for sequences of concept descriptions. Intuitively, generalizing the lcs algorithm to sequences of, say, $n$ concept descriptions, means computing the product of $n$ description graphs. The following proposition shows that the size of such a product graph may grow exponentially in $n$. Thus, the lcs computed in this way grows exponentially in the size of the given sequence. However, this does not imply that this exponential blow-up is unavoidable. There might exist a smaller, still equivalent representation of the lcs. Nevertheless, we can show that the exponential growth is inevitable. % \begin{proposition}\label{pro:complexlcs} For all integers $n\ge 2$ there exists a sequence $D_1,\ldots,D_n$ of \classic{} concept descriptions such that the size of every \classic{} concept description equivalent to $lcs(D_1,\ldots,D_n)$ is at least exponential in $n$ where the size of the ${D_i}'s$ is linear in $n$. \end{proposition} \proof{} As in \citeA{CohenBorgida+-AAAI-1992}, for a given $n$, define the concept descriptions $D_i$ as follows: \[ D_i\;:=\;\bigsqcap_{j\not= i}(\sameas {\varepsilon}{a_j})\sqcap \bigsqcap_{j\not= i}(\sameas {a_i}{a_i a_j})\sqcap (\sameas {\varepsilon}{a_i a_i}) \] % where $a_1,\ldots,a_n$ denote attributes. The canonical description graph for $D_i$ is depicted in Figure~\ref{fig:sequencelcs}. % \begin{figure}[tbp] \begin{center} \leavevmode \framebox[\textwidth]{ \parbox{0.99\textwidth}{\hfill\ \ \input{sequencelcs.eepic} \hfill\ \ } }% exported with 60 percent \caption{The canonical description graph for $D_i$, without node labels.} \label{fig:sequencelcs} \end{center} \end{figure} % Using Algorithm~\ref{subsume} it is easy to see that $D_i\sqsubseteq \sameas vw$ iff the number of ${a_i}'s$ in $v$ and the number of ${a_i}'s$ in $w$ are equal modulo 2 where $v, w$ are words over $\{a_1,\ldots,a_n\}$. This implies that % \begin{eqnarray}\label{eqn:complexlcs} D_1,\dots,D_n\sqsubseteq \sameas vw&\mbox{ iff }&\mbox{\parbox[t]{0.5\textwidth}{for all $1\le i\le n$ the number of ${a_i}'s$ in $v$ and the number of ${a_i}'s$ in $w$ are equal modulo 2.}} \end{eqnarray} % Let $s\subseteq \{1,\ldots,n\}$ be a non-empty set. We define $v_s:=a_{i_1}\cdots a_{i_k}$ where $i_1<\cdots