java.lang.Object
info.ephyra.questionanalysis.atype.extractor.FeatureExtractor
public abstract class FeatureExtractor
A feature extractor for question classification. The most important
functionality is provided by the createInstance
method (a couple of convenience versions are also provided), which creates
an edu.cmu.minorthird.classify.Instance object from basic question data
(the original question and its syntactic parse tree).
createInstance can be used to extract features for run-time classification.
It is also used
when loading edu.cmu.minorthird.classify.Example objects from a dataset
file at training time (see loadFile and
createExample). Thus, feature extraction
is accomplished by the same code for both training and run-time classification.
An important thing for subclasses to note is that the Instance returned
by a createInstance(...) method must have the original question, as
a String, as its source.
| Field Summary | |
|---|---|
| protected int | classLevels |
| protected java.util.regex.Pattern | datasetExamplePattern - Regular expression describing the format of a line in a question classification dataset. |
| protected boolean | isInitialized |
| protected int | labelPosition - The captured group index of the answer type label in the dataset line Pattern. |
| private static org.apache.log4j.Logger | log |
| protected int | numLoaded |
| protected int | parsePosition - The captured group index of the syntactic parse tree in the dataset line Pattern. |
| protected int | questionPosition - The captured group index of the question in the dataset line Pattern. |
| protected static java.lang.String | SPACE_PTRN |
| protected boolean | useClassLevels |
| Constructor Summary | |
|---|---|
| FeatureExtractor() | |
| Method Summary | |
|---|---|
| edu.cmu.minorthird.classify.Example[] | createExample(java.lang.String datasetLine) - Creates an edu.cmu.minorthird.classify.Example object from one line of a dataset file using createInstance(String, String). |
| abstract edu.cmu.minorthird.classify.Instance | createInstance(java.util.List<edu.cmu.lti.javelin.qa.Term> terms, java.lang.String parseTree) - Given a question as a list of Terms and its syntactic parse tree, creates an Instance for question classification by extracting the appropriate features. |
| abstract edu.cmu.minorthird.classify.Instance | createInstance(java.lang.String question) - Creates an Instance for question classification when nothing but the original question is available for feature extraction. |
| edu.cmu.minorthird.classify.Instance | createInstance(java.lang.String question, java.lang.String parseTree) - Convenience method that tokenizes the given question by whitespace, creates Terms, and calls createInstance(List, String). |
| int | getClassLevels() |
| java.util.regex.Pattern | getDatasetExamplePattern() |
| int | getLabelPosition() |
| int | getNumLoaded() |
| int | getParsePosition() |
| int | getQuestionPosition() |
| void | initialize() - Reads in properties from this class's properties file and sets class data members. |
| boolean | isInitialized() |
| boolean | isUsingClassLevels() |
| edu.cmu.minorthird.classify.Example[] | loadFile(java.lang.String fileName) - Loads an array of edu.cmu.minorthird.classify.Example objects from the file at the given location, using datasetExamplePattern and createExample. |
| void | printFeatures(java.lang.String dataSetFileName, java.util.List<java.lang.String> features) - Prints the features generated for each example in an input file. |
| void | printFeaturesFromQuestions(java.lang.String questionSetFileName, java.util.List<java.lang.String> features) - Prints the features generated for each example in an input file. |
| void | setClassLevels(int classLevels) |
| void | setDatasetExamplePattern(java.util.regex.Pattern datasetExamplePattern) |
| void | setInitialized(boolean isInitialized) |
| void | setLabelPosition(int labelPosition) |
| void | setParsePosition(int parsePosition) |
| void | setQuestionPosition(int questionPosition) |
| void | setUseClassLevels(boolean useClassLevels) |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static final org.apache.log4j.Logger log
protected static java.lang.String SPACE_PTRN
protected java.util.regex.Pattern datasetExamplePattern
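Regular expression describing the format of a line in a question classification dataset.
The captured groups for the answer type label, the question, and the syntactic parse tree
are indexed by labelPosition, questionPosition, and parsePosition, respectively.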
protected int labelPosition
protected int questionPosition
protected int parsePosition
protected int classLevels
protected boolean useClassLevels
protected int numLoaded
protected boolean isInitialized
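
The relationship between datasetExamplePattern and the three position fields can be illustrated with a minimal sketch. The tab-separated line format, the regular expression, and the class name DatasetLineSketch below are assumptions made for illustration; the actual pattern is read from the class's properties file and is not documented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatasetLineSketch {
    // Hypothetical line format: label, question and parse tree separated by tabs.
    static final Pattern DATASET_EXAMPLE_PATTERN =
            Pattern.compile("^([^\\t]+)\\t([^\\t]+)\\t(.+)$");

    // Capture group indices, playing the role of labelPosition,
    // questionPosition and parsePosition (values are assumptions).
    static final int LABEL_POSITION = 1;
    static final int QUESTION_POSITION = 2;
    static final int PARSE_POSITION = 3;

    /** Returns {label, question, parseTree}, or null if the line does not match. */
    static String[] parseLine(String line) {
        Matcher m = DATASET_EXAMPLE_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(LABEL_POSITION),
            m.group(QUESTION_POSITION),
            m.group(PARSE_POSITION)
        };
    }

    public static void main(String[] args) {
        String line = "LOCATION\tWhere is Pittsburgh ?\t"
                + "(SBARQ (WHADVP Where) (SQ is (NP Pittsburgh)) ?)";
        String[] fields = parseLine(line);
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

Keeping the group indices separate from the pattern means a dataset with a different column order only needs a different pattern and different position values, not different parsing code.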
| Constructor Detail |
|---|
public FeatureExtractor()
| Method Detail |
|---|
public void initialize()
throws java.lang.Exception
Reads in properties from this class's properties file and sets class data members.
Throws:
java.lang.Exception
public abstract edu.cmu.minorthird.classify.Instance createInstance(java.util.List<edu.cmu.lti.javelin.qa.Term> terms,
java.lang.String parseTree)
Parameters:
terms - the Terms of the question
parseTree - the syntactic parse tree of the question
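public edu.cmu.minorthird.classify.Instance createInstance(java.lang.String question,
java.lang.String parseTree)
Convenience method that tokenizes the given question by whitespace, creates Terms, and calls createInstance(List, String).
Parameters:
question - the question to create an Instance from
parseTree - the syntactic parse tree of the question
public abstract edu.cmu.minorthird.classify.Instance createInstance(java.lang.String question)
Creates an Instance for question classification when nothing but the original question is available for feature extraction.
Parameters:
question - the input question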
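
The way the convenience overload createInstance(String, String) delegates to the abstract createInstance(List, String) might look roughly like the sketch below. It uses plain strings in place of Term and a list of feature names in place of Instance, so it runs without the minorthird and javelin libraries; the class names, the value of SPACE_PTRN, and the unigram features are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

abstract class ExtractorSketch {
    // Assumed whitespace pattern; the real value of SPACE_PTRN is not documented.
    static final String SPACE_PTRN = "\\s+";

    /** Stand-in for the abstract createInstance(List<Term>, String). */
    abstract List<String> extractFeatures(List<String> tokens, String parseTree);

    /** Stand-in for the convenience overload: tokenize by whitespace, then delegate. */
    List<String> extractFeatures(String question, String parseTree) {
        List<String> tokens = Arrays.asList(question.trim().split(SPACE_PTRN));
        return extractFeatures(tokens, parseTree);
    }
}

/** Hypothetical subclass that emits one lower-cased unigram feature per token. */
class UnigramExtractorSketch extends ExtractorSketch {
    @Override
    List<String> extractFeatures(List<String> tokens, String parseTree) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            features.add("word=" + token.toLowerCase(Locale.ROOT));
        }
        return features;
    }

    public static void main(String[] args) {
        ExtractorSketch extractor = new UnigramExtractorSketch();
        // The parse tree is unused by this sketch, so null is passed.
        System.out.println(extractor.extractFeatures("Who wrote Hamlet ?", null));
        // prints [word=who, word=wrote, word=hamlet, word=?]
    }
}
```

Because the overload lives in the base class, a subclass only implements the token-based method, and every caller that starts from a raw question string goes through the same tokenization.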
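public edu.cmu.minorthird.classify.Example[] createExample(java.lang.String datasetLine)
throws java.lang.Exception
Creates an edu.cmu.minorthird.classify.Example object from one line of a dataset file using createInstance(String, String).
Parameters:
datasetLine - the line from the dataset file from which to create the Example
Throws:
java.lang.Exception
public edu.cmu.minorthird.classify.Example[] loadFile(java.lang.String fileName)
Loads an array of edu.cmu.minorthird.classify.Example objects from the file at the given location, using datasetExamplePattern and createExample.
Parameters:
fileName - the name of the dataset file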
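
A loading loop in the spirit of loadFile could be sketched as follows. This is not the actual implementation: the line format, the decision to skip lines that do not match the pattern, and the way numLoaded is counted are assumptions, and the sketch takes lines directly instead of a file name so that it stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LoadFileSketch {
    // Hypothetical line format: label, question and parse tree separated by tabs.
    private static final Pattern DATASET_EXAMPLE_PATTERN =
            Pattern.compile("^[^\\t]+\\t[^\\t]+\\t.+$");

    // Mirrors the documented numLoaded field; counting accepted lines is an assumption.
    int numLoaded = 0;

    /** Stand-in for createExample: here an "example" is just the raw line. */
    String createExample(String datasetLine) {
        return datasetLine;
    }

    /** Stand-in for loadFile, taking the lines of the dataset instead of a file name. */
    List<String> load(Iterable<String> lines) {
        List<String> examples = new ArrayList<>();
        for (String line : lines) {
            if (!DATASET_EXAMPLE_PATTERN.matcher(line).matches()) {
                continue; // assumption: malformed lines are skipped
            }
            examples.add(createExample(line));
            numLoaded++;
        }
        return examples;
    }

    public static void main(String[] args) {
        LoadFileSketch loader = new LoadFileSketch();
        List<String> examples = loader.load(Arrays.asList(
                "LOCATION\tWhere is Pittsburgh ?\t(SBARQ (WHADVP Where) (SQ is (NP Pittsburgh)) ?)",
                "this line has no tabs",
                "PERSON\tWho wrote Hamlet ?\t(SBARQ (WHNP Who) (SQ wrote (NP Hamlet)) ?)"));
        System.out.println(examples.size() + " examples, numLoaded = " + loader.numLoaded);
    }
}
```

With a real file, the lines could be obtained with java.nio.file.Files.readAllLines(Paths.get(fileName)) before being passed to a method like load.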
public void printFeatures(java.lang.String dataSetFileName,
java.util.List<java.lang.String> features)
Prints the features generated for each example in an input file.
Parameters:
dataSetFileName - the name of the file containing the dataset to load
features - a List of the features to print
public void printFeaturesFromQuestions(java.lang.String questionSetFileName,
java.util.List<java.lang.String> features)
Prints the features generated for each example in an input file.
Parameters:
questionSetFileName - the name of the file containing the dataset to load
features - a List of the features to print
public boolean isInitialized()
public void setInitialized(boolean isInitialized)
Parameters:
isInitialized - the isInitialized to set
public int getNumLoaded()
public void setClassLevels(int classLevels)
public int getClassLevels()
public void setUseClassLevels(boolean useClassLevels)
public boolean isUsingClassLevels()
public java.util.regex.Pattern getDatasetExamplePattern()
public void setDatasetExamplePattern(java.util.regex.Pattern datasetExamplePattern)
Parameters:
datasetExamplePattern - the datasetExamplePattern to set
public int getLabelPosition()
public void setLabelPosition(int labelPosition)
Parameters:
labelPosition - the labelPosition to set
public int getParsePosition()
public void setParsePosition(int parsePosition)
Parameters:
parsePosition - the parsePosition to set
public int getQuestionPosition()
public void setQuestionPosition(int questionPosition)
Parameters:
questionPosition - the questionPosition to set