info.ephyra.answerselection.filters
Class ScoreNormalizationFilter

java.lang.Object
  extended by info.ephyra.answerselection.filters.Filter
      extended by info.ephyra.answerselection.filters.ScoreNormalizationFilter

public class ScoreNormalizationFilter
extends Filter

A filter that normalizes the scores of the answer candidates by applying a trained classifier. The weight of the positive class ("answer correct") is used as the normalized score.

The main method can be used to evaluate different combinations of features and models and to train a classifier with the best combination.

The filter is applied to factoid answers only.

This class extends the class Filter.

Version:
2008-01-26
Author:
Nico Schlaefer

Field Summary
private static java.lang.String ADA_BOOST_10_M
          Identifier for the Ada Boost model (boosts a decision tree learner 10 times).
private static java.lang.String ADA_BOOST_100_M
          Identifier for the Ada Boost model (boosts a decision tree learner 100 times).
private static java.lang.String ADA_BOOST_L_M
          Identifier for the Ada Boost model (Logistic Regression version).
private static java.lang.String ADA_BOOST_N_M
          Identifier for the Ada Boost model (boosts a decision tree learner NUM_BOOSTS times).
private static java.lang.String[] ALL_FEATURES
          All feature identifiers.
private static java.lang.String[] ALL_MODELS
          All model identifiers.
private static java.lang.String ANSWER_TYPES_F
          Identifier for the answer type features.
private static java.lang.String BALANCED_WINNOW_M
          Identifier for the Balanced Winnow model.
private static edu.cmu.minorthird.classify.Classifier classifier
          Classifier for score normalization.
private static java.lang.String DECISION_TREE_M
          Identifier for the Decision Tree model.
private static java.lang.String EXTRACTORS_F
          Identifier for the extractor features.
private static java.lang.String KNN_M
          Identifier for the K-Nearest-Neighbor model.
private static java.lang.String KWAY_MIXTURE_M
          Identifier for the K-Way Mixture model.
private static java.lang.String MARGIN_PERCEPTRON_M
          Identifier for the Margin Perceptron model.
private static java.lang.String MAX_ENT_M
          Identifier for the Maximum Entropy model.
private static java.lang.String MAX_SCORE_F
          Identifier for the maximum score feature.
private static java.lang.String MEAN_SCORE_F
          Identifier for the mean score feature.
private static java.lang.String MIN_SCORE_F
          Identifier for the minimum score feature.
private static java.lang.String NAIVE_BAYES_M
          Identifier for the Naive Bayes model.
private static java.lang.String NEGATIVE_BINOMIAL_M
          Identifier for the Negative Binomial model.
private static java.lang.String NUM_ANSWERS_F
          Identifier for the number of answers feature.
private static int NUM_BOOSTS
          Number of boosting rounds (the N in ADA_BOOST_N_M).
private static int NUM_FOLDS
          Number of folds for cross validation.
private static java.lang.String SCORE_F
          Identifier for the score feature.
private static java.lang.String[] SELECTED_FEATURES
          Subset of the features used to train the classifier.
private static java.lang.String SELECTED_MODEL
          Model used for the classifier.
private static java.lang.String SVM_M
          Identifier for the SVM model.
private static java.lang.String VOTED_PERCEPTRON_M
          Identifier for the Voted Perceptron model.
 
Constructor Summary
ScoreNormalizationFilter(java.lang.String classifierFilename)
          Creates the filter and loads a serialized classifier from a file.
 
Method Summary
private static void addAnswerTypeFeatures(edu.cmu.minorthird.classify.MutableInstance instance, Result result)
          Adds the answer types of the question as features to the instance.
private static void addExtractorFeature(edu.cmu.minorthird.classify.MutableInstance instance, Result result)
          Adds the extractor used to obtain the answer candidate as a feature to the instance.
private static void addMaxScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance, Result result, Result[] results)
          Adds the maximum score of all factoid answers from the same extractor as a feature to the instance.
private static void addMeanScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance, Result result, Result[] results)
          Adds the mean score of all factoid answers from the same extractor as a feature to the instance.
private static void addMinScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance, Result result, Result[] results)
          Adds the minimum score of all factoid answers from the same extractor as a feature to the instance.
private static void addNumAnswersFeature(edu.cmu.minorthird.classify.MutableInstance instance, Result result, Result[] results)
          Adds the number of factoid answers from the same extractor as a feature to the instance.
private static void addScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance, Result result)
          Adds the score of the answer candidate as a feature to the instance.
private static void addSelectedFeatures(edu.cmu.minorthird.classify.MutableInstance instance, java.lang.String[] features, Result result, Result[] results)
          Adds the selected features to the instance.
 Result[] apply(Result[] results)
          Normalizes the scores of the factoid answers, using the features specified in SELECTED_FEATURES and the classifier held in the static classifier field.
private static edu.cmu.minorthird.classify.Dataset createDataset(java.lang.String[] features, java.lang.String serializedDir)
          Creates a training/evaluation set from serialized judged Result objects.
private static edu.cmu.minorthird.classify.Example createExample(java.lang.String[] features, Result result, Result[] results, java.lang.String qid)
          Creates a training/evaluation example from a judged answer candidate.
private static edu.cmu.minorthird.classify.Instance createInstance(java.lang.String[] features, Result result, Result[] results)
          Creates an instance for training/evaluation or classification from an answer candidate.
private static edu.cmu.minorthird.classify.Instance createInstance(java.lang.String[] features, Result result, Result[] results, java.lang.String qid)
          Creates an instance for training/evaluation or classification from an answer candidate, using the question ID as a subpopulation ID.
private static edu.cmu.minorthird.classify.ClassifierLearner createLearner(java.lang.String model)
          Creates a classifier learner for the given model.
private static java.lang.String createReport(java.lang.String[] dataSets, java.lang.String[] features, java.lang.String model, edu.cmu.minorthird.classify.experiments.Evaluation eval, long runTime)
          Builds a report comprising the selected parameters (data sets, features and model) and evaluation statistics.
static edu.cmu.minorthird.classify.experiments.Evaluation evaluate(java.lang.String serializedDir, java.lang.String[] features, java.lang.String model)
          Performs a cross-validation on the given data set for the given features and model.
static java.lang.String[][] evaluateAll(java.lang.String serializedDir, java.lang.String reportDir)
          Performs a cross-validation on the given data set for all combinations of features and models and writes a report for each evaluation.
static void loadClassifier(java.lang.String classifierFilename)
          Loads a serialized classifier for score normalization from a file.
static void main(java.lang.String[] args)
          Evaluates all combinations of features and models and trains a classifier using the best combination.
 Result[] preserveOrderAveraging(Result[] results)
          Calculates the average normalization factor for each extraction technique and normalizes the scores with this factor to ensure that the order suggested by the original scores is preserved.
 Result[] preserveOrderResorting(Result[] results)
          Reassigns the normalized scores for each extraction technique to ensure that the order suggested by the original scores is preserved.
 Result[] preserveOrderTop(Result[] results)
          Calculates the normalization factor of the top answer for each extraction technique and normalizes the scores with this factor to ensure that the order suggested by the original scores is preserved.
private static Result[] readSerializedResults(java.io.File input)
          Reads serialized results from a file.
static edu.cmu.minorthird.classify.Classifier train(java.lang.String serializedDir)
          Trains a classifier using the given training data, the features specified in SELECTED_FEATURES and the model specified in SELECTED_MODEL.
static edu.cmu.minorthird.classify.Classifier train(java.lang.String serializedDir, java.lang.String[] features, java.lang.String model)
          Trains a classifier using the given training data, features and model.
 
Methods inherited from class info.ephyra.answerselection.filters.Filter
apply
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SCORE_F

private static final java.lang.String SCORE_F
Identifier for the score feature.

See Also:
Constant Field Values

EXTRACTORS_F

private static final java.lang.String EXTRACTORS_F
Identifier for the extractor features.

See Also:
Constant Field Values

ANSWER_TYPES_F

private static final java.lang.String ANSWER_TYPES_F
Identifier for the answer type features.

See Also:
Constant Field Values

NUM_ANSWERS_F

private static final java.lang.String NUM_ANSWERS_F
Identifier for the number of answers feature.

See Also:
Constant Field Values

MEAN_SCORE_F

private static final java.lang.String MEAN_SCORE_F
Identifier for the mean score feature.

See Also:
Constant Field Values

MAX_SCORE_F

private static final java.lang.String MAX_SCORE_F
Identifier for the maximum score feature.

See Also:
Constant Field Values

MIN_SCORE_F

private static final java.lang.String MIN_SCORE_F
Identifier for the minimum score feature.

See Also:
Constant Field Values

ALL_FEATURES

private static final java.lang.String[] ALL_FEATURES
All feature identifiers.


SELECTED_FEATURES

private static final java.lang.String[] SELECTED_FEATURES
Subset of the features used to train the classifier.


ADA_BOOST_10_M

private static final java.lang.String ADA_BOOST_10_M
Identifier for the Ada Boost model (boosts a decision tree learner 10 times).

See Also:
Constant Field Values

ADA_BOOST_100_M

private static final java.lang.String ADA_BOOST_100_M
Identifier for the Ada Boost model (boosts a decision tree learner 100 times).

See Also:
Constant Field Values

NUM_BOOSTS

private static int NUM_BOOSTS
Number of boosting rounds (the N in ADA_BOOST_N_M).


ADA_BOOST_N_M

private static java.lang.String ADA_BOOST_N_M
Identifier for the Ada Boost model (boosts a decision tree learner NUM_BOOSTS times).


ADA_BOOST_L_M

private static final java.lang.String ADA_BOOST_L_M
Identifier for the Ada Boost model (Logistic Regression version).

See Also:
Constant Field Values

BALANCED_WINNOW_M

private static final java.lang.String BALANCED_WINNOW_M
Identifier for the Balanced Winnow model.

See Also:
Constant Field Values

DECISION_TREE_M

private static final java.lang.String DECISION_TREE_M
Identifier for the Decision Tree model.

See Also:
Constant Field Values

KNN_M

private static final java.lang.String KNN_M
Identifier for the K-Nearest-Neighbor model.

See Also:
Constant Field Values

KWAY_MIXTURE_M

private static final java.lang.String KWAY_MIXTURE_M
Identifier for the K-Way Mixture model.

See Also:
Constant Field Values

MARGIN_PERCEPTRON_M

private static final java.lang.String MARGIN_PERCEPTRON_M
Identifier for the Margin Perceptron model.

See Also:
Constant Field Values

MAX_ENT_M

private static final java.lang.String MAX_ENT_M
Identifier for the Maximum Entropy model.

See Also:
Constant Field Values

NAIVE_BAYES_M

private static final java.lang.String NAIVE_BAYES_M
Identifier for the Naive Bayes model.

See Also:
Constant Field Values

NEGATIVE_BINOMIAL_M

private static final java.lang.String NEGATIVE_BINOMIAL_M
Identifier for the Negative Binomial model.

See Also:
Constant Field Values

SVM_M

private static final java.lang.String SVM_M
Identifier for the SVM model.

See Also:
Constant Field Values

VOTED_PERCEPTRON_M

private static final java.lang.String VOTED_PERCEPTRON_M
Identifier for the Voted Perceptron model.

See Also:
Constant Field Values

ALL_MODELS

private static final java.lang.String[] ALL_MODELS
All model identifiers.


SELECTED_MODEL

private static final java.lang.String SELECTED_MODEL
Model used for the classifier.


NUM_FOLDS

private static final int NUM_FOLDS
Number of folds for cross validation.

See Also:
Constant Field Values

classifier

private static edu.cmu.minorthird.classify.Classifier classifier
Classifier for score normalization.

Constructor Detail

ScoreNormalizationFilter

public ScoreNormalizationFilter(java.lang.String classifierFilename)
Creates the filter and loads a serialized classifier from a file.

Parameters:
classifierFilename - filename of a serialized classifier

Method Detail

readSerializedResults

private static Result[] readSerializedResults(java.io.File input)
Reads serialized results from a file.

Parameters:
input - input file
Returns:
result objects

addScoreFeature

private static void addScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance,
                                    Result result)
Adds the score of the answer candidate as a feature to the instance.


addExtractorFeature

private static void addExtractorFeature(edu.cmu.minorthird.classify.MutableInstance instance,
                                        Result result)
Adds the extractor used to obtain the answer candidate as a feature to the instance.


addAnswerTypeFeatures

private static void addAnswerTypeFeatures(edu.cmu.minorthird.classify.MutableInstance instance,
                                          Result result)
Adds the answer types of the question as features to the instance.


addNumAnswersFeature

private static void addNumAnswersFeature(edu.cmu.minorthird.classify.MutableInstance instance,
                                         Result result,
                                         Result[] results)
Adds the number of factoid answers from the same extractor as a feature to the instance.


addMeanScoreFeature

private static void addMeanScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance,
                                        Result result,
                                        Result[] results)
Adds the mean score of all factoid answers from the same extractor as a feature to the instance.


addMaxScoreFeature

private static void addMaxScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance,
                                       Result result,
                                       Result[] results)
Adds the maximum score of all factoid answers from the same extractor as a feature to the instance.


addMinScoreFeature

private static void addMinScoreFeature(edu.cmu.minorthird.classify.MutableInstance instance,
                                       Result result,
                                       Result[] results)
Adds the minimum score of all factoid answers from the same extractor as a feature to the instance.
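
The four per-extractor features (number of answers, mean, maximum and minimum score) all summarize the candidates that share an extractor with the current candidate. The following is a minimal sketch of that grouping, not the class's actual code. Candidate is a hypothetical stand-in for Result, and ExtractorStats and statsFor are illustrative names.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

/** Hypothetical stand-in for an answer candidate; not the Ephyra Result class. */
final class Candidate {
    final String extractor;
    final double score;

    Candidate(String extractor, double score) {
        this.extractor = extractor;
        this.score = score;
    }
}

final class ExtractorStats {
    /** Summarizes the scores of all candidates that share the given candidate's extractor. */
    static DoubleSummaryStatistics statsFor(Candidate candidate, List<Candidate> all) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (Candidate c : all) {
            if (c.extractor.equals(candidate.extractor)) {
                stats.accept(c.score);
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Candidate> all = Arrays.asList(
                new Candidate("A", 2.0), new Candidate("A", 4.0), new Candidate("B", 10.0));
        DoubleSummaryStatistics stats = statsFor(all.get(0), all);
        // Count, mean, max and min correspond to the four per-extractor features.
        System.out.println(stats.getCount() + " " + stats.getAverage()
                + " " + stats.getMax() + " " + stats.getMin());
    }
}
```

A single pass yields all four values, so one summary per extractor could be reused for every candidate from that extractor.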


addSelectedFeatures

private static void addSelectedFeatures(edu.cmu.minorthird.classify.MutableInstance instance,
                                        java.lang.String[] features,
                                        Result result,
                                        Result[] results)
Adds the selected features to the instance.


createInstance

private static edu.cmu.minorthird.classify.Instance createInstance(java.lang.String[] features,
                                                                   Result result,
                                                                   Result[] results)
Creates an instance for training/evaluation or classification from an answer candidate.

Parameters:
features - selected features
result - answer candidate
results - all answers to the question
Returns:
instance for training/evaluation or classification

createInstance

private static edu.cmu.minorthird.classify.Instance createInstance(java.lang.String[] features,
                                                                   Result result,
                                                                   Result[] results,
                                                                   java.lang.String qid)
Creates an instance for training/evaluation or classification from an answer candidate, using the question ID as a subpopulation ID.

Parameters:
features - selected features
result - answer candidate
results - all answers to the question
qid - question ID
Returns:
instance for training/evaluation or classification

createExample

private static edu.cmu.minorthird.classify.Example createExample(java.lang.String[] features,
                                                                 Result result,
                                                                 Result[] results,
                                                                 java.lang.String qid)
Creates a training/evaluation example from a judged answer candidate.

Parameters:
features - selected features
result - judged answer candidate
results - all answers to the question
qid - question ID
Returns:
training/evaluation example

createDataset

private static edu.cmu.minorthird.classify.Dataset createDataset(java.lang.String[] features,
                                                                 java.lang.String serializedDir)
Creates a training/evaluation set from serialized judged Result objects.

Parameters:
features - selected features
serializedDir - directory containing serialized results
Returns:
training/evaluation set

createLearner

private static edu.cmu.minorthird.classify.ClassifierLearner createLearner(java.lang.String model)
Creates a classifier learner for the given model.

Parameters:
model - selected model
Returns:
classifier learner

createReport

private static java.lang.String createReport(java.lang.String[] dataSets,
                                             java.lang.String[] features,
                                             java.lang.String model,
                                             edu.cmu.minorthird.classify.experiments.Evaluation eval,
                                             long runTime)
Builds a report comprising the selected parameters (data sets, features and model) and evaluation statistics.

Parameters:
dataSets - used data sets
features - selected features
model - selected model
eval - evaluation statistics
runTime - run time of the evaluation
Returns:
report

train

public static edu.cmu.minorthird.classify.Classifier train(java.lang.String serializedDir)
Trains a classifier using the given training data, the features specified in SELECTED_FEATURES and the model specified in SELECTED_MODEL.

Parameters:
serializedDir - directory containing serialized results
Returns:
trained classifier

train

public static edu.cmu.minorthird.classify.Classifier train(java.lang.String serializedDir,
                                                           java.lang.String[] features,
                                                           java.lang.String model)
Trains a classifier using the given training data, features and model.

Parameters:
serializedDir - directory containing serialized results
features - selected features
model - selected model
Returns:
trained classifier

evaluate

public static edu.cmu.minorthird.classify.experiments.Evaluation evaluate(java.lang.String serializedDir,
                                                                          java.lang.String[] features,
                                                                          java.lang.String model)
Performs a cross-validation on the given data set for the given features and model.

Parameters:
serializedDir - directory containing serialized results
features - selected features
model - selected model
Returns:
evaluation statistics
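
Because instances carry the question ID as a subpopulation ID, a cross-validation can keep all candidates of one question in the same fold. The sketch below illustrates that idea with a simple round-robin assignment. It is an assumption about how such a split might look, not the splitter used by this class; GroupedFolds and assignFolds are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupedFolds {
    /**
     * Maps each distinct question ID to a fold index in [0, numFolds),
     * round-robin in order of first appearance, so that all candidates
     * of one question end up in the same fold.
     */
    static Map<String, Integer> assignFolds(List<String> questionIds, int numFolds) {
        Map<String, Integer> foldOf = new LinkedHashMap<>();
        for (String qid : questionIds) {
            if (!foldOf.containsKey(qid)) {
                foldOf.put(qid, foldOf.size() % numFolds);
            }
        }
        return foldOf;
    }

    public static void main(String[] args) {
        // One entry per candidate; repeated IDs belong to the same question.
        List<String> qids = Arrays.asList("q1", "q1", "q2", "q3", "q2");
        System.out.println(assignFolds(qids, 2));
    }
}
```

Splitting by question rather than by candidate avoids training on some answers to a question while testing on others.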

evaluateAll

public static java.lang.String[][] evaluateAll(java.lang.String serializedDir,
                                               java.lang.String reportDir)
Performs a cross-validation on the given data set for all combinations of features and models and writes a report for each evaluation. Determines the best combination according to the F1 measure.

Parameters:
serializedDir - directory containing serialized results
reportDir - output directory for evaluation reports
Returns:
best combination of features and model
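
The search described above can be sketched as an exhaustive loop over feature subsets and models that keeps the combination with the highest F1. This is an illustration under assumptions: it treats "all combinations of features" as all non-empty subsets, the scorer argument stands in for a real cross-validation run, and the shape of the returned array (features first, then a one-element array holding the model) is a guess. GridSearchSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class GridSearchSketch {
    /** F1 measure from true positive, false positive and false negative counts. */
    static double f1(int tp, int fp, int fn) {
        if (tp == 0) {
            return 0.0;
        }
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    /** Enumerates all non-empty subsets of the given features via bit masks. */
    static List<String[]> subsets(String[] features) {
        List<String[]> out = new ArrayList<>();
        for (int mask = 1; mask < (1 << features.length); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < features.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(features[i]);
                }
            }
            out.add(subset.toArray(new String[0]));
        }
        return out;
    }

    /** Returns {features, {model}} for the combination with the highest score. */
    static String[][] best(String[] features, String[] models,
                           BiFunction<String[], String, Double> scorer) {
        String[][] best = null;
        double bestScore = -1.0;
        for (String[] subset : subsets(features)) {
            for (String model : models) {
                double score = scorer.apply(subset, model);
                if (score > bestScore) {
                    bestScore = score;
                    best = new String[][] {subset, {model}};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy scorer standing in for a cross-validated F1 value.
        String[][] winner = best(new String[] {"x", "y"}, new String[] {"m1", "m2"},
                (f, m) -> f1(f.length, 1, m.equals("m2") ? 0 : 1));
        System.out.println(String.join(",", winner[0]) + " with " + winner[1][0]);
    }
}
```

The number of subsets doubles with each added feature, so an exhaustive search of this kind is only practical for a small feature set.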

main

public static void main(java.lang.String[] args)
Evaluates all combinations of features and models and trains a classifier using the best combination.

Parameters:
args - two arguments: the directory containing serialized results, and the output directory for the evaluation reports and the trained classifier

loadClassifier

public static void loadClassifier(java.lang.String classifierFilename)
Loads a serialized classifier for score normalization from a file.

Parameters:
classifierFilename - filename of a serialized classifier
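
A minimal round trip for a serialized object, assuming the classifier file was written with standard Java object serialization. That is an assumption: the class may rely on a library utility instead. SerializedModelIo, save and load are hypothetical names, and a String stands in for the classifier.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializedModelIo {
    /** Writes a serializable object (here a stand-in for a trained classifier) to a file. */
    static void save(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Reads the object back; the caller casts it to the expected type. */
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("classifier", ".ser");
        file.deleteOnExit();
        save("stand-in for a classifier", file);
        System.out.println(load(file));
    }
}
```

Deserialization should only be applied to files from a trusted source, since reading an object can execute code from the classes it contains.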

preserveOrderResorting

public Result[] preserveOrderResorting(Result[] results)
Reassigns the normalized scores for each extraction technique to ensure that the order suggested by the original scores is preserved.

Parameters:
results - array of Result objects
Returns:
array of Result objects with new normalized scores
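
One way to read the resorting strategy: within one extraction technique, the normalized scores are handed out again in descending order to the answers ranked by their original scores. The sketch below shows this for the answers of a single technique, using parallel arrays instead of Result objects. ResortSketch and resort are hypothetical names, and the interpretation itself is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;

final class ResortSketch {
    /**
     * Returns normalized scores reassigned so that the answer with the highest
     * original score receives the highest normalized score, and so on.
     */
    static double[] resort(double[] original, double[] normalized) {
        int n = original.length;
        Integer[] byOriginal = new Integer[n];
        for (int i = 0; i < n; i++) {
            byOriginal[i] = i;
        }
        // Answer indices, best original score first.
        Arrays.sort(byOriginal,
                Comparator.comparingDouble((Integer i) -> original[i]).reversed());
        double[] sorted = normalized.clone();
        Arrays.sort(sorted); // ascending
        double[] result = new double[n];
        for (int rank = 0; rank < n; rank++) {
            result[byOriginal[rank]] = sorted[n - 1 - rank];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] original = {5.0, 9.0, 1.0};
        double[] normalized = {0.7, 0.2, 0.4};
        System.out.println(Arrays.toString(resort(original, normalized)));
    }
}
```

This keeps the set of normalized scores unchanged and only permutes them, so the original ranking within the technique is restored.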

preserveOrderAveraging

public Result[] preserveOrderAveraging(Result[] results)
Calculates the average normalization factor for each extraction technique and normalizes the scores with this factor to ensure that the order suggested by the original scores is preserved. The factor is adjusted to avoid normalized scores larger than 1.

Parameters:
results - array of Result objects
Returns:
array of Result objects with new normalized scores
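
A sketch of the averaging strategy for the answers of a single extraction technique. It assumes that the normalization factor of an answer is its normalized score divided by its original score, that original scores are non-negative, and that the adjustment caps the factor at 1 divided by the largest original score. AveragingSketch and rescale are hypothetical names.

```java
import java.util.Arrays;

final class AveragingSketch {
    /** Scales all original scores by the average factor, capped so no result exceeds 1. */
    static double[] rescale(double[] original, double[] normalized) {
        double factorSum = 0.0;
        double maxOriginal = 0.0;
        int counted = 0;
        for (int i = 0; i < original.length; i++) {
            if (original[i] > 0) {
                factorSum += normalized[i] / original[i];
                counted++;
            }
            maxOriginal = Math.max(maxOriginal, original[i]);
        }
        double[] result = new double[original.length];
        if (counted == 0) {
            return result; // no positive scores, nothing to scale
        }
        double factor = factorSum / counted;
        if (factor * maxOriginal > 1.0) {
            factor = 1.0 / maxOriginal; // cap: the top answer gets exactly 1
        }
        for (int i = 0; i < original.length; i++) {
            result[i] = factor * original[i];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                rescale(new double[] {2.0, 4.0}, new double[] {0.5, 0.5})));
    }
}
```

Multiplying every score of a technique by the same positive factor cannot change their relative order, which is why this approach preserves the original ranking.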

preserveOrderTop

public Result[] preserveOrderTop(Result[] results)
Calculates the normalization factor of the top answer for each extraction technique and normalizes the scores with this factor to ensure that the order suggested by the original scores is preserved.

Parameters:
results - array of Result objects
Returns:
array of Result objects with new normalized scores
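
A sketch of the top-answer strategy for the answers of a single extraction technique, assuming the normalization factor is the normalized score of the answer with the highest original score divided by that original score. TopFactorSketch and rescale are hypothetical names.

```java
import java.util.Arrays;

final class TopFactorSketch {
    /** Scales all original scores by the factor derived from the top-scoring answer. */
    static double[] rescale(double[] original, double[] normalized) {
        double[] result = new double[original.length];
        if (original.length == 0) {
            return result;
        }
        int top = 0;
        for (int i = 1; i < original.length; i++) {
            if (original[i] > original[top]) {
                top = i;
            }
        }
        if (original[top] <= 0) {
            return result; // no positive scores, nothing to scale
        }
        double factor = normalized[top] / original[top];
        for (int i = 0; i < original.length; i++) {
            result[i] = factor * original[i];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                rescale(new double[] {2.0, 8.0, 4.0}, new double[] {0.9, 0.5, 0.1})));
    }
}
```

Under this reading the top answer keeps its own normalized score, and every other answer of the technique is scaled in proportion to it.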

apply

public Result[] apply(Result[] results)
Normalizes the scores of the factoid answers, using the features specified in SELECTED_FEATURES and the classifier held in the static classifier field.

Overrides:
apply in class Filter
Parameters:
results - array of Result objects
Returns:
array of Result objects with normalized scores