
Source Code for Package mekano

"""\
Mekano
======
Provides low-level building blocks for information retrieval and machine learning,
with a special focus on text processing.

Features
========
  - Representing text documents as sparse vectors
  - Representing a collection of documents as a dataset, which can be subsetted for cross-validation etc.
  - Evaluation using various metrics
  - Reading various common input formats like SMART and TREC
  - Parsing and tokenizing text
  - Maintaining corpus statistics (term frequencies), creating inverted indexes
  - Creating weighted document vectors (TF-IDF) based on corpus statistics

Most of the code is in Python, with some crucial functions implemented in C++.

Getting started
===============
The L{atoms} sub-package provides all functionality related to representing text documents as numbers.
This is a good place to start using the Mekano package.

The L{ml} sub-package provides access to classifiers and related utilities.

The L{dataset} module provides a handy class for representing and working with datasets (collections of documents).

The L{io} module contains several functions for reading common file formats and working with Python pickles.

The L{evaluator} module contains evaluation tools.

The L{textual} module contains functions for parsing and tokenizing text.

The L{indri} module provides a simple interface to the U{Indri<http://www.lemurproject.org/indri/>} binaries.

"""

__version__ = "2.0"

# Misc
import logging

# Atoms
from atoms import *

# IO
import io
from dataset import Dataset
import textual

import indri

from ml import *
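
The feature list in the docstring mentions sparse document vectors weighted by TF-IDF from corpus statistics. The following is a minimal sketch of that idea in plain Python. It does not use or reproduce Mekano's own API: the names `tokenize`, `document_frequencies`, and `tfidf_vector` are hypothetical, and the weighting (raw term frequency times log(N / df)) is one common variant, assumed here for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document.
        df.update(set(tokens))
    return df


def tfidf_vector(tokens, df, num_docs):
    """Return a sparse {term: weight} dict with weight = tf * log(N / df)."""
    tf = Counter(tokens)
    return {
        term: count * math.log(float(num_docs) / df[term])
        for term, count in tf.items()
        if df[term] > 0  # skip terms unseen in the corpus statistics
    }


# Hypothetical three-document corpus, used only for illustration.
corpus = ["the cat sat", "the dog barked", "the cat purred"]
docs = [tokenize(text) for text in corpus]
df = document_frequencies(docs)
vectors = [tfidf_vector(tokens, df, len(docs)) for tokens in docs]
```

Each vector stores only the terms that occur in its document, which is what makes the representation sparse. In this toy corpus the term "the" appears in every document, so its weight is zero, while rarer terms such as "sat" receive larger weights.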