info.ephyra.questionanalysis
Class Term

java.lang.Object
  extended by info.ephyra.questionanalysis.Term
All Implemented Interfaces:
java.io.Serializable

public class Term
extends java.lang.Object
implements java.io.Serializable

A Term comprises one or more tokens of text that form a unit of meaning. It can be an individual word, a compound noun or a named entity.

This class implements the interface Serializable.

Version:
2008-01-23
Author:
Nico Schlaefer
See Also:
Serialized Form
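
Because the class is Serializable, instances can be written to and read back from an object stream. The following is a minimal sketch of such a round trip. It does not use the real Term class: TermStandIn is a hypothetical stand-in that mirrors only the text, pos and neTypes fields, and its serialVersionUID, the "COMPOUND" tag value and the "location" type label are placeholders chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TermSerializationSketch {

    /** Hypothetical stand-in for Term; only a few of its fields are mirrored. */
    static class TermStandIn implements Serializable {
        // Placeholder value, not the version number used by the real class.
        private static final long serialVersionUID = 1L;

        final String text;
        final String pos;
        final String[] neTypes;

        TermStandIn(String text, String pos, String[] neTypes) {
            this.text = text;
            this.pos = pos;
            this.neTypes = neTypes;
        }
    }

    /** Serializes the object into a byte array and deserializes a copy from it. */
    static TermStandIn roundTrip(TermStandIn term) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(term);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TermStandIn) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // "COMPOUND" and "location" are assumed values used only for this demo.
        TermStandIn original =
                new TermStandIn("New York", "COMPOUND", new String[] {"location"});
        TermStandIn copy = roundTrip(original);
        System.out.println(copy.text + " / " + copy.pos + " / " + copy.neTypes[0]);
    }
}
```

The copy is a distinct object whose field values equal those of the original. With the real class, the same two stream calls would apply to a Term instance.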

Field Summary
static java.lang.String COMPOUND
          Part of speech tag for terms that comprise multiple tokens.
private  java.util.Map<java.lang.String,java.lang.Double> expansionLemmas
          Maps lemmas of the expansions to their weights.
private  java.util.Map<java.lang.String,java.lang.Double> expansions
          Maps expansions of the term to their weights.
private  java.lang.String lemma
          The lemma of the term.
private  java.lang.String[] neTypes
          The named entity types of the term (optional).
private  java.lang.String pos
          The part of speech of the term, or COMPOUND if the term comprises multiple tokens.
private  double relFrequency
          Relative frequency of the term.
private static long serialVersionUID
          Version number used during deserialization.
private  java.lang.String text
          The textual representation of the term.
 
Constructor Summary
Term(java.lang.String text, java.lang.String pos)
          Constructs a term from the provided information.
Term(java.lang.String text, java.lang.String pos, java.lang.String[] neTypes)
          Constructs a term from the provided information.
 
Method Summary
private  void generateLemma()
          Generates the lemma of the term.
 java.util.Map<java.lang.String,java.lang.Double> getExpansions()
           
 java.lang.String getLemma()
           
 java.lang.String[] getNeTypes()
           
 java.lang.String getPos()
           
 double getRelFrequency()
           
 java.lang.String getText()
           
 double getWeight(java.lang.String lemma)
          Gets the weight of the term or expansion with the given lemma.
 void setExpansionLemmas(java.util.Map<java.lang.String,java.lang.Double> expansionLemmas)
          Normalizes and sets the lemmas of the expansions.
 void setExpansions(java.util.Map<java.lang.String,java.lang.Double> expansions)
           
 void setLemma(java.lang.String lemma)
          Normalizes and sets the lemma of the term.
 void setNeTypes(java.lang.String[] neTypes)
           
 void setRelFrequency(double relFrequency)
           
 double simScore(java.lang.String lemma)
          Calculates similarity scores for the given lemma and the lemmas of the term and its expansions based on their weights and the number of common tokens.
 java.lang.String toString()
          Creates a string representation of the term.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
Version number used during deserialization.

See Also:
Constant Field Values

COMPOUND

public static final java.lang.String COMPOUND
Part of speech tag for terms that comprise multiple tokens.

See Also:
Constant Field Values

text

private java.lang.String text
The textual representation of the term.


lemma

private java.lang.String lemma
The lemma of the term.


pos

private java.lang.String pos
The part of speech of the term, or COMPOUND if the term comprises multiple tokens.


neTypes

private java.lang.String[] neTypes
The named entity types of the term (optional).


relFrequency

private double relFrequency
Relative frequency of the term.


expansions

private java.util.Map<java.lang.String,java.lang.Double> expansions
Maps expansions of the term to their weights.


expansionLemmas

private java.util.Map<java.lang.String,java.lang.Double> expansionLemmas
Maps lemmas of the expansions to their weights.

Constructor Detail

Term

public Term(java.lang.String text,
            java.lang.String pos)
Constructs a term from the provided information.

Parameters:
text - textual representation
pos - part of speech

Term

public Term(java.lang.String text,
            java.lang.String pos,
            java.lang.String[] neTypes)
Constructs a term from the provided information.

Parameters:
text - textual representation
pos - part of speech
neTypes - named entity types
Method Detail

getText

public java.lang.String getText()

getLemma

public java.lang.String getLemma()

getPos

public java.lang.String getPos()

getNeTypes

public java.lang.String[] getNeTypes()

setNeTypes

public void setNeTypes(java.lang.String[] neTypes)

getRelFrequency

public double getRelFrequency()

setRelFrequency

public void setRelFrequency(double relFrequency)

getExpansions

public java.util.Map<java.lang.String,java.lang.Double> getExpansions()

setExpansions

public void setExpansions(java.util.Map<java.lang.String,java.lang.Double> expansions)

generateLemma

private void generateLemma()
Generates the lemma of the term.


setLemma

public void setLemma(java.lang.String lemma)
Normalizes and sets the lemma of the term.

Parameters:
lemma - the lemma of the term

setExpansionLemmas

public void setExpansionLemmas(java.util.Map<java.lang.String,java.lang.Double> expansionLemmas)
Normalizes and sets the lemmas of the expansions.

Parameters:
expansionLemmas - the lemmas of the expansions

getWeight

public double getWeight(java.lang.String lemma)
Gets the weight of the term or expansion with the given lemma.

Parameters:
lemma - the lemma
Returns:
the weight or 0 if there is no match
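
The lookup described for getWeight can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: WeightLookupSketch is a hypothetical class, and giving the term's own lemma a weight of 1.0 is an assumption, since the documentation only specifies the result of 0 when nothing matches.

```java
import java.util.HashMap;
import java.util.Map;

public class WeightLookupSketch {

    private final String lemma;
    private final Map<String, Double> expansionLemmas;

    WeightLookupSketch(String lemma, Map<String, Double> expansionLemmas) {
        this.lemma = lemma;
        this.expansionLemmas = expansionLemmas;
    }

    /** Returns the weight for the given lemma, or 0 if there is no match. */
    double getWeight(String candidate) {
        if (candidate.equals(lemma)) {
            return 1.0; // assumption: the term itself carries full weight
        }
        Double weight = expansionLemmas.get(candidate);
        return (weight != null) ? weight : 0.0;
    }

    public static void main(String[] args) {
        Map<String, Double> expansions = new HashMap<>();
        expansions.put("automobile", 0.8); // made-up weight for the demo

        WeightLookupSketch term = new WeightLookupSketch("car", expansions);
        System.out.println(term.getWeight("car"));
        System.out.println(term.getWeight("automobile"));
        System.out.println(term.getWeight("bicycle"));
    }
}
```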

simScore

public double simScore(java.lang.String lemma)
Calculates a similarity score between the given lemma and each lemma of the term and its expansions, based on their weights and the number of common tokens. Returns the maximum of these scores.

Parameters:
lemma - lemma to compare with
Returns:
similarity score
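
One way to combine weights and common tokens into a maximum score is sketched below. The documentation does not state the exact formula, so this sketch assumes a Jaccard coefficient over whitespace-separated tokens, multiplied by the weight of the matching lemma, with a weight of 1.0 for the term's own lemma. SimScoreSketch and tokenOverlap are hypothetical names, and the weights in main are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SimScoreSketch {

    /** Jaccard coefficient of the whitespace-separated tokens of two lemmas. */
    static double tokenOverlap(String a, String b) {
        Set<String> tokensA = new HashSet<>(Arrays.asList(a.trim().split("\\s+")));
        Set<String> tokensB = new HashSet<>(Arrays.asList(b.trim().split("\\s+")));

        Set<String> common = new HashSet<>(tokensA);
        common.retainAll(tokensB);

        Set<String> union = new HashSet<>(tokensA);
        union.addAll(tokensB);

        return (double) common.size() / union.size();
    }

    /**
     * Scores the lemma against the term lemma (assumed weight 1.0) and against
     * every expansion lemma (scaled by its weight), and returns the maximum.
     */
    static double simScore(String lemma, String termLemma,
                           Map<String, Double> expansionLemmas) {
        double best = tokenOverlap(lemma, termLemma);
        for (Map.Entry<String, Double> entry : expansionLemmas.entrySet()) {
            double score = entry.getValue() * tokenOverlap(lemma, entry.getKey());
            best = Math.max(best, score);
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> expansions = new HashMap<>();
        expansions.put("big apple", 0.9); // made-up weights for the demo
        expansions.put("york city", 0.5);

        System.out.println(simScore("york city", "new york city", expansions));
        System.out.println(simScore("big apple", "new york city", expansions));
    }
}
```

Under these assumptions, a partial match against the term's own lemma can outrank an exact match against a low-weight expansion, which is why the maximum is taken over all candidates rather than stopping at the first exact match.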

toString

public java.lang.String toString()
Creates a string representation of the term.

Overrides:
toString in class java.lang.Object
Returns:
string representation