\documentstyle[12pt]{article}
%
\topmargin -0.1 in
\oddsidemargin 0.4 in
\evensidemargin 0.8 in
\textheight 8.9 in
\textwidth 5.3 in
\parskip=10 pt
\parindent=0pt
\renewcommand{\baselinestretch}{1.1}
%
\begin{document}
%

\begin{center}
\Large \bf
DESCRIPTION of META-DATA
\end{center}

The meta-data file is a flat file that contains a set of records,
each describing one test of one classification
algorithm on one dataset:
\begin{center}
\begin{tabular}{|l|l||l|l|} \hline\hline
\multicolumn {2} {|c||} {\em Algorithms} & 
\multicolumn {2} {|c|} {\em DataSets} \\  \hline

C4.5 &  NewId & Credit\_Austr & Belgian \\
AC2 &  CART & Chromosome & Credit\_Man \\
IndCART & Cal5 & CUT &  DNA \\
CN2 &  ITRule & Diabetes & Digits44 \\
Discrim &  QuaDisc & Credit\_German & Faults \\
LogDisc &  ALLOC80 & Head & Heart \\
kNN & SMART & KLDigits &  Letters \\
BayesTree &  CASTLE & New\_Belgian & Sat\_Image \\
DIPLO92 & RBF & Segment & Shuttle \\
LVQ & Backprop & Technical &  TseTse \\
Kohonen & & Vehicle  & \\
\hline
\end{tabular}
\end{center}

Each record contains the following attributes:\\

\begin{tabular}{|l|l|l|}	\hline\hline
{\em Attribute} & {\em Type} & {\em Description}\\ \hline

T & continuous	& Number of examples in test set \\
N & continuous	& Number of examples \\
p & continuous	& Number of attributes \\
k & continuous	& Number of classes \\
Bin & continuous	& Number of binary attributes \\
Cost & continuous	& Cost information present (1=yes, 0=no) \\
SDratio & continuous	& Standard deviation ratio \\
correl & continuous	& Mean correlation between attributes \\ 
cancor1 & continuous	& First canonical correlation \\
cancor2 & continuous	& Second canonical correlation \\
fract1 & continuous	& First eigenvalue \\
fract2 & continuous	& Second eigenvalue \\
skewness &  continuous	& Mean of $|E(X-Mean)|^3/STD^3$ \\
kurtosis & continuous	& Mean of $|E(X-Mean)|^4/STD^4$ \\
Hc & continuous		& Entropy of classes \\
Hx & continuous		& Mean entropy of attributes \\
MCx & continuous	& Mean mutual information of class and attributes \\
EnAtr & continuous	& Equivalent number of attributes \\
NSRatio & continuous	& Noise-signal ratio \\
DS\_Name & categorical   	& Name of DataSet \\
Alg\_Name & categorical	& Name of Algorithm \\
Norm\_error & continuous	& Normalized Error (continuous class) \\
\hline
\end{tabular}


See the references below for a detailed description of these attributes.
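As an illustration, a record with this layout can be read with a short Python sketch. The field order follows the attribute table above; whitespace separation is an assumption (consistent with the awk and sort commands used later in this document), and the sample values in the test are invented, not taken from the file.

```python
# Sketch of a reader for one meta-data record. Field names mirror the
# attribute table; all fields except DS_Name and Alg_Name are continuous.
FIELDS = [
    "T", "N", "p", "k", "Bin", "Cost", "SDratio", "correl",
    "cancor1", "cancor2", "fract1", "fract2", "skewness", "kurtosis",
    "Hc", "Hx", "MCx", "EnAtr", "NSRatio",
    "DS_Name", "Alg_Name", "Norm_error",
]

def parse_record(line):
    """Split one whitespace-separated record into a dict keyed by name."""
    record = dict(zip(FIELDS, line.split()))
    for name in FIELDS:
        if name not in ("DS_Name", "Alg_Name"):
            record[name] = float(record[name])  # continuous attribute
    return record
```

This keeps the two categorical fields (dataset and algorithm names) as strings and converts everything else to floats.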

The last value in each record is the normalized error of the
algorithm Ai on the dataset.
This is a numerical value and represents, in effect, a continuous class.

This value is calculated as the difference between the error rate of Ai
and the best error rate achieved on the same dataset.
The difference is expressed in terms of standard deviations
(i.e.\ the difference of error rates is divided by the estimate of
the standard deviation relative to the best success rate).
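The calculation just described can be sketched as follows; the argument names (`err_ai`, `best_err`, `std_best`) are illustrative and do not appear in the file.

```python
def normalized_error(err_ai, best_err, std_best):
    """Difference between Ai's error rate and the best error rate on the
    same dataset, expressed in standard deviations of the best rate."""
    return (err_ai - best_err) / std_best

# e.g. an algorithm with 30% error, where the best achieved 20% error
# with an estimated standard deviation of 0.05, scores 2.0.
```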

The numerical values can be turned into categorical ones (i.e.\ classes)
by invoking the AWK program ``classify'' (enclosed) as follows:
\begin{center}
\verb|awk -f classify k=8 meta.data|
\end{center}
where \verb|k=8| sets a threshold: all values less than the threshold
are labelled ``Applicable'', the others ``Non Applicable''.
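The AWK source of ``classify'' is not reproduced here, but the thresholding rule it is described as applying can be mirrored in a few lines of Python:

```python
def classify(norm_error, k=8.0):
    """Turn a normalized error into a class label: values below the
    threshold k are "Applicable", the rest "Non Applicable"."""
    return "Applicable" if norm_error < k else "Non Applicable"
```

This is only a sketch of the stated rule; the enclosed AWK program remains the authoritative implementation.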
The file ``meta.data'' is ordered by dataset name.
The UNIX command:
\begin{center}
\verb|sort -f -b +20 -21 meta.data|
\end{center}
allows the user to obtain the data ordered by algorithm name.

The UNIX command:
\begin{center}
\verb|grep <Algorithm> meta.data|
\end{center}
allows the user to obtain the data corresponding to a particular algorithm.

Further information can be obtained from:

\begin{center}
\begin{tabular}{l l}
       P.Brazdil or J.Gama & Tel.:  +351 600 1672 \\
       LIACC, University of Porto & Fax.:  +351 600 3654 \\
       Rua Campo Alegre 823 & Email:  statlog-adm@ncc.up.pt \\
       4150 Porto, Portugal 
\end{tabular}
\end{center}

\begin{center}
\Large \bf

References
\end{center}

\begin{enumerate}
\item P. Brazdil, J. Gama and B. Henery: Characterizing the Applicability of
Classification Algorithms Using Meta-Level Learning, in {\em Proc. of the
European Conference on Machine Learning (ECML-94)}, ed. F. Bergadano and
L. de Raedt, Springer-Verlag, 1994.
\item D. Michie, D. Spiegelhalter and C. C. Taylor (eds.): {\em Machine Learning,
Neural and Statistical Classification}, Prentice Hall, 1994.
\end{enumerate}

\end{document}
