info.ephyra.nlp
Class LingPipe

java.lang.Object
  extended by info.ephyra.nlp.LingPipe

public class LingPipe
extends java.lang.Object

This class provides a common interface to the LingPipe toolkit.

It supports the following natural language processing tools: a tokenizer and a sentence detector.

Version:
2006-11-25
Author:
Nico Schlaefer

Field Summary
private static com.aliasi.sentences.SentenceModel sentenceModel
          Sentence detection model.
private static com.aliasi.tokenizer.TokenizerFactory tokenizerFactory
          Tokenization model.
 
Constructor Summary
LingPipe()
           
 
Method Summary
static void createSentenceDetector()
          Creates models for the tokenizer and the sentence detector, if not already done.
static void createTokenizer()
          Creates a model for the tokenizer, if not already done.
static java.lang.String[] sentDetect(java.lang.String text)
          Splits a text into sentences.
static java.lang.String[] tokenize(java.lang.String text)
          Tokenizes a text.
static java.lang.String tokenizeWithSpaces(java.lang.String text)
          Tokenizes a text and concatenates the tokens with spaces.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tokenizerFactory

private static com.aliasi.tokenizer.TokenizerFactory tokenizerFactory
Tokenization model.


sentenceModel

private static com.aliasi.sentences.SentenceModel sentenceModel
Sentence detection model.

Constructor Detail

LingPipe

public LingPipe()

Method Detail

createTokenizer

public static void createTokenizer()
Creates a model for the tokenizer, if not already done.


createSentenceDetector

public static void createSentenceDetector()
Creates models for the tokenizer and the sentence detector, if not already done.
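The create-once behavior of createTokenizer() and createSentenceDetector() can be sketched as follows. This is a hypothetical stand-in: the class ModelInit and its helper methods are not part of Ephyra, plain java.util.regex patterns take the place of LingPipe's TokenizerFactory and SentenceModel, and the synchronization is an assumption, since the documentation does not state whether the real methods are thread-safe.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for the two factory methods of LingPipe.
 * Regular expressions play the role of LingPipe's TokenizerFactory and
 * SentenceModel; they are NOT the models LingPipe actually uses.
 */
final class ModelInit {
    // null means "not created yet"; volatile so other threads see the assignment.
    private static volatile Pattern tokenizerModel;
    private static volatile Pattern sentenceModel;

    private ModelInit() {}

    /** Creates the tokenizer model once; later calls are no-ops. */
    static synchronized void createTokenizer() {
        if (tokenizerModel == null) {
            tokenizerModel = Pattern.compile("[\\p{L}\\p{N}]+|\\p{Punct}");
        }
    }

    /** Creates the tokenizer and sentence models, skipping any that already exist. */
    static synchronized void createSentenceDetector() {
        createTokenizer(); // reentrant: this thread already holds the class lock
        if (sentenceModel == null) {
            // Naive rule: a sentence boundary is whitespace after '.', '!' or '?'.
            sentenceModel = Pattern.compile("(?<=[.!?])\\s+");
        }
    }

    static boolean tokenizerReady() {
        return tokenizerModel != null;
    }

    static boolean sentenceDetectorReady() {
        return sentenceModel != null;
    }

    /** Exposes the current tokenizer model so callers can check it is reused. */
    static Pattern currentTokenizerModel() {
        return tokenizerModel;
    }
}
```

In this sketch, calling createSentenceDetector() alone is enough to enable both tools, because it also creates the tokenizer model; calling either method again leaves the existing models untouched.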


tokenize

public static java.lang.String[] tokenize(java.lang.String text)
Tokenizes a text.

Parameters:
text - text to tokenize
Returns:
array of tokens, or null if the tokenizer is not initialized
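A minimal sketch of the documented contract of tokenize, assuming a regular expression as the tokenizer: it returns null until the model exists and an array of tokens afterwards. TokenizeSketch and its pattern (a run of letters or digits, or a single ASCII punctuation character) are illustrative assumptions; LingPipe's own tokenizer applies its own rules and may split text differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for tokenize(String); not LingPipe's tokenizer. */
final class TokenizeSketch {
    // Stand-in tokenizer model; null until createTokenizer() runs.
    private static volatile Pattern tokenizerModel;

    private TokenizeSketch() {}

    /** Creates the stand-in model once; later calls are no-ops. */
    static synchronized void createTokenizer() {
        if (tokenizerModel == null) {
            // A token is a run of letters/digits or one ASCII punctuation character.
            tokenizerModel = Pattern.compile("[\\p{L}\\p{N}]+|\\p{Punct}");
        }
    }

    /** Returns the tokens of text, or null if the tokenizer is not initialized. */
    static String[] tokenize(String text) {
        Pattern model = tokenizerModel; // read the volatile field once
        if (model == null) {
            return null;
        }
        List<String> tokens = new ArrayList<>();
        Matcher matcher = model.matcher(text);
        // Whitespace and characters matched by neither alternative are dropped.
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens.toArray(new String[0]);
    }
}
```

For example, after TokenizeSketch.createTokenizer(), tokenize("Hello, world!") yields the four tokens Hello, ",", world and "!".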

tokenizeWithSpaces

public static java.lang.String tokenizeWithSpaces(java.lang.String text)
Tokenizes a text and concatenates the tokens with spaces.

Parameters:
text - text to tokenize
Returns:
string of space-delimited tokens, or null if the tokenizer is not initialized
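The concatenation step of tokenizeWithSpaces amounts to joining the token array with single spaces while passing null through unchanged. The helper below is a hypothetical sketch of that step only; it assumes the tokens come from a tokenize-style method that returns null when the tokenizer is not initialized.

```java
/**
 * Hypothetical helper illustrating the concatenation step of
 * tokenizeWithSpaces; it is not part of Ephyra.
 */
final class SpaceJoin {
    private SpaceJoin() {}

    /** Joins tokens with single spaces; a null array (uninitialized tokenizer) stays null. */
    static String joinTokens(String[] tokens) {
        // String.join would throw a NullPointerException on a null array, so check first.
        return tokens == null ? null : String.join(" ", tokens);
    }
}
```

For the tokens Hello, ",", world and "!", joinTokens returns "Hello , world !".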

sentDetect

public static java.lang.String[] sentDetect(java.lang.String text)
Splits a text into sentences.

Parameters:
text - text to split into sentences
Returns:
array of sentences in the text, or null if the sentence detector is not initialized
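The sentDetect contract can be sketched with the JDK's java.text.BreakIterator in place of LingPipe's sentence model. SentDetectSketch is hypothetical, and its "model" is simply the locale whose sentence rules BreakIterator applies; its output will not necessarily match LingPipe's sentence detector.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for sentDetect(String); uses the JDK, not LingPipe. */
final class SentDetectSketch {
    // Stand-in "model": the locale for BreakIterator's rules; null = not initialized.
    private static volatile Locale sentenceModel;

    private SentDetectSketch() {}

    /** Creates the stand-in model once; later calls are no-ops. */
    static synchronized void createSentenceDetector() {
        if (sentenceModel == null) {
            sentenceModel = Locale.ENGLISH;
        }
    }

    /** Returns the sentences of text, or null if the detector is not initialized. */
    static String[] sentDetect(String text) {
        Locale model = sentenceModel; // read the volatile field once
        if (model == null) {
            return null;
        }
        // BreakIterator is stateful, so a fresh instance is created per call.
        BreakIterator it = BreakIterator.getSentenceInstance(model);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Trim the whitespace BreakIterator leaves at the end of each sentence.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences.toArray(new String[0]);
    }
}
```

Because the sketch creates a new BreakIterator on every call, concurrent callers never share iteration state; only the immutable locale is shared.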