
Module textual

Functions
 
BasicTokenizer(s, minlen=1)
Split on any run of non-word characters.
 
WordRegexTokenizer(s)
Find words of 3 or more letters.
 
WordNumberRegexTokenizer(s)
Find words of 4 or more characters, or numbers and currency amounts.
 
Vectorize(s, af, tokenizer=WordRegexTokenizer)
Create an AtomVector from a string.
Variables
  wordsplitter_rex = re.compile(r'\W+')
  word_regex = re.compile(r'\b[a-z]{3,}\b')
  word_number_regex = re.compile(r'\b[a-z][a-z0-9]{3,}\b|(\$|\b)...
  __package__ = 'mekano'
Function Details

BasicTokenizer(s, minlen=1)

Split on any run of non-word characters (the pattern in wordsplitter_rex).

Words need not start with [a-z]
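
The splitting behaviour can be sketched with the module's wordsplitter_rex pattern. This is a minimal illustration, not the module's source: the name basic_tokenizer_sketch is hypothetical, and the handling of minlen (dropping tokens shorter than minlen) is an assumption, since the documentation does not describe that parameter.

```python
import re

# Pattern as listed under Variables for this module.
wordsplitter_rex = re.compile(r'\W+')

def basic_tokenizer_sketch(s, minlen=1):
    """Hypothetical stand-in for BasicTokenizer.

    Assumption: tokens shorter than minlen are discarded. Empty strings
    produced by leading or trailing separators are always dropped.
    """
    return [tok for tok in wordsplitter_rex.split(s)
            if tok and len(tok) >= minlen]

print(basic_tokenizer_sketch("Hello, world! It's the 42nd run."))
# ['Hello', 'world', 'It', 's', 'the', '42nd', 'run']
print(basic_tokenizer_sketch("Hello, world! It's the 42nd run.", minlen=3))
# ['Hello', 'world', 'the', '42nd', 'run']
```

Note that '42nd' survives: splitting on \W+ places no constraint on the first character of a token.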

WordRegexTokenizer(s)

Find words of 3 or more letters.

Words must start with [a-z]; word_regex in fact requires every character of the word to be in [a-z].
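
A minimal sketch of this tokenizer using the documented word_regex pattern. The function name is hypothetical, and lowercasing the input first is an assumption: the pattern only accepts [a-z], so without it a capitalized word such as 'The' would not match.

```python
import re

# Pattern as listed under Variables for this module.
word_regex = re.compile(r'\b[a-z]{3,}\b')

def word_regex_tokenizer_sketch(s):
    """Hypothetical stand-in for WordRegexTokenizer.

    Assumption: the input is lowercased before matching.
    """
    return word_regex.findall(s.lower())

print(word_regex_tokenizer_sketch("The cat sat on a mat in 2nd place"))
# ['the', 'cat', 'sat', 'mat', 'place']
```

'on', 'a' and 'in' are too short, and '2nd' is skipped because no word boundary precedes its first letter.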

WordNumberRegexTokenizer(s)

Find words of 4 or more characters, or numbers and currency amounts.

Words must start with [a-z]
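
A minimal sketch using the documented word_number_regex pattern. The function name is hypothetical and lowercasing is an assumption. Because the pattern contains capturing groups, the sketch collects the full match from finditer rather than calling findall.

```python
import re

# Pattern as listed under Variables Details for this module.
word_number_regex = re.compile(
    r'\b[a-z][a-z0-9]{3,}\b|(\$|\b)[0-9]+(,[0-9]{3})*(\.[0-9]+)?\b')

def word_number_tokenizer_sketch(s):
    """Hypothetical stand-in for WordNumberRegexTokenizer.

    Assumption: the input is lowercased before matching.
    """
    return [m.group(0) for m in word_number_regex.finditer(s.lower())]

print(word_number_tokenizer_sketch("Win $1,200.50 with code ABC123 or 42 now"))
# ['$1,200.50', 'with', 'code', 'abc123', '42']
```

'win', 'or' and 'now' are dropped because the word branch needs a letter followed by at least three more letters or digits.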

Vectorize(s, af, tokenizer=WordRegexTokenizer)

Create an AtomVector from a string.

Tokenizes string 's' using tokenizer, creating atoms using AtomFactory 'af'.
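
The flow described above (tokenize, then turn each token into an atom) can be sketched as follows. Everything here is a hypothetical stand-in: ToyAtomFactory and its atom_id method are not mekano's AtomFactory API, and the returned dict of atom id to count only approximates what an AtomVector might hold. The page does not document either class.

```python
import re
from collections import Counter

word_regex = re.compile(r'\b[a-z]{3,}\b')

class ToyAtomFactory:
    """Hypothetical stand-in: assigns each distinct token an integer id."""
    def __init__(self):
        self.ids = {}

    def atom_id(self, token):
        # First sighting of a token gets the next free id, starting at 1.
        if token not in self.ids:
            self.ids[token] = len(self.ids) + 1
        return self.ids[token]

def word_regex_tokenizer_sketch(s):
    # Assumption: input is lowercased before matching.
    return word_regex.findall(s.lower())

def vectorize_sketch(s, af, tokenizer=word_regex_tokenizer_sketch):
    """Count occurrences of each atom id in the tokenized string."""
    counts = Counter(af.atom_id(tok) for tok in tokenizer(s))
    return dict(counts)

af = ToyAtomFactory()
print(vectorize_sketch("The cat and the hat", af))
# {1: 2, 2: 1, 3: 1, 4: 1}
```

Passing the tokenizer as a parameter mirrors the documented signature: any callable that maps a string to a list of tokens can be swapped in.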


Variables Details

word_number_regex

Value:
re.compile(r'\b[a-z][a-z0-9]{3,}\b|(\$|\b)[0-9]+(,[0-9]{3})*(\.[0-9]+)?\b')
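
One practical consequence of the capturing groups in this value: re.findall returns tuples of group contents rather than the matched text. The sketch below contrasts the documented pattern with a non-capturing variant. The variant is an illustration only and is not part of the module.

```python
import re

# Documented value, with capturing groups.
word_number_regex = re.compile(
    r'\b[a-z][a-z0-9]{3,}\b|(\$|\b)[0-9]+(,[0-9]{3})*(\.[0-9]+)?\b')

# Hypothetical variant: same pattern with (?:...) non-capturing groups.
word_number_regex_nc = re.compile(
    r'\b[a-z][a-z0-9]{3,}\b|(?:\$|\b)[0-9]+(?:,[0-9]{3})*(?:\.[0-9]+)?\b')

text = "paid $1,200.50 for item42"

print(word_number_regex.findall(text))
# [('', '', ''), ('$', ',200', '.50'), ('', '', '')]
print(word_number_regex_nc.findall(text))
# ['paid', '$1,200.50', 'item42']
```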