"""\
Mekano
======
Provides low-level building blocks for information retrieval and machine learning,
with a special focus on text processing.

Features
========
- Representing text documents as sparse vectors
- Representing a collection of documents as a dataset, which can be split into subsets for cross-validation and similar tasks
- Evaluation using various metrics
- Reading common input formats such as SMART and TREC
- Parsing and tokenizing text
- Maintaining corpus statistics (term frequencies) and creating inverted indexes
- Creating weighted document vectors (TF-IDF) based on corpus statistics

Most of the code is written in Python, with a few crucial functions implemented in C++.

Getting started
===============
The L{atoms} sub-package provides all functionality related to representing text documents numerically, for example as sparse vectors.
It is a good starting point for using the Mekano package.

The L{ml} sub-package provides access to classifiers and related utilities.

The L{dataset} module provides a handy class for representing and working with datasets (collections of documents).

The L{io} module contains functions for reading common file formats and for working with Python pickles.

The L{evaluator} module contains evaluation tools.

The L{textual} module contains functions for parsing and tokenizing text.

The L{indri} module provides a simple interface to the U{Indri<http://www.lemurproject.org/indri/>} binaries.

"""

__version__ = "2.0"


import logging


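# Explicit relative imports, so that e.g. ``io`` resolves to mekano.io
# rather than to the standard-library io module under Python 3.
from .atoms import *


from . import io
from .dataset import Dataset
from . import textual

from . import indri

from .ml import *
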