EDU.cmu.cs.coral.learning
Class i_PriQLearner_id

java.lang.Object
  |
  +--EDU.gatech.cc.is.learning.i_ReinforcementLearner_id
        |
        +--EDU.cmu.cs.coral.learning.i_PriQLearner_id

public class i_PriQLearner_id
extends i_ReinforcementLearner_id
implements java.lang.Cloneable, java.io.Serializable

An object that learns to select from several actions based on a reward. Uses the Prioritized Sweeping technique of Moore.

The module learns to select a discrete output based on a discrete state and a continuous reinforcement input. The "i"s in front of and behind the name indicate that this class takes integers as input (the state) and output (the selected action). The "d" indicates a double for the reinforcement input (i.e. a continuous value).
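
To make the idea of prioritized sweeping concrete, here is a minimal sketch. It is not this class's implementation: it assumes a known, deterministic model (`next`, `reward`), backs up state values rather than Q-values, and every name in it is hypothetical. It only shows the core loop, which is to back up the state with the largest pending change first and then queue that state's predecessors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class PrioritizedSweepingSketch {
    /** One-step backup for state s; next[s] == -1 marks a terminal state. */
    public static double backup(int s, int[] next, double[] reward,
                                double gamma, double[] value) {
        double future = (next[s] >= 0) ? value[next[s]] : 0.0;
        return reward[s] + gamma * future;
    }

    /** Sweeps a deterministic model, largest pending value change first. */
    public static double[] sweep(int[] next, double[] reward, double gamma,
                                 double threshold, int maxBackups) {
        int n = next.length;
        double[] value = new double[n];

        // Predecessor lists: which states lead into each state.
        List<List<Integer>> preds = new ArrayList<>();
        for (int s = 0; s < n; s++) preds.add(new ArrayList<>());
        for (int s = 0; s < n; s++) {
            if (next[s] >= 0) preds.get(next[s]).add(s);
        }

        // Queue entries are {priority, state}; highest priority is polled first.
        PriorityQueue<double[]> queue =
            new PriorityQueue<>((a, b) -> Double.compare(b[0], a[0]));
        for (int s = 0; s < n; s++) {
            double err = Math.abs(backup(s, next, reward, gamma, value) - value[s]);
            if (err > threshold) queue.add(new double[] {err, s});
        }

        int backups = 0;
        while (!queue.isEmpty() && backups < maxBackups) {
            int s = (int) queue.poll()[1];
            value[s] = backup(s, next, reward, gamma, value);
            backups++;
            // A change at s may change its predecessors; queue those that would move.
            for (int p : preds.get(s)) {
                double err = Math.abs(backup(p, next, reward, gamma, value) - value[p]);
                if (err > threshold) queue.add(new double[] {err, p});
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Three-state chain 0 -> 1 -> 2 (terminal), reward only on leaving state 2.
        double[] v = sweep(new int[] {1, 2, -1}, new double[] {0.0, 0.0, 1.0},
                           0.8, 1e-6, 100);
        for (double x : v) System.out.println(x);
    }
}
```

On the three-state chain in `main`, only the state next to the reward is queued at first, and the change then propagates backwards one predecessor at a time.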

Copyright (c)2000 Tucker Balch

See Also:
Serialized Form

Inner Class Summary
protected  class i_PriQLearner_id.state
           
 
Field Summary
protected  PriorityQueue changeQueue
           
protected  int criteria
           
static int DISCOUNTED
          Used to indicate the learner uses discounted rewards.
protected  int numactions
           
protected  i_PriQLearner_id.state[] states
           
 
Fields inherited from class EDU.gatech.cc.is.learning.i_ReinforcementLearner_id
logging, numactions, numstates, policyfilename
 
Constructor Summary
i_PriQLearner_id(int numstatesin, int numactionsin)
          Instantiate a Prioritized Sweeping learner using default parameters and discounted rewards.
i_PriQLearner_id(int numstatesin, int numactionsin, int criteriain)
          Instantiate a Prioritized Sweeping learner using default parameters and a seed of 0.
i_PriQLearner_id(int numstatesin, int numactionsin, int criteriain, long seedin)
          Instantiate a Prioritized Sweeping learner using default parameters.
 
Method Summary
 void endTrial(double Vn, double rn)
          Called when the current trial ends.
 double getAvgReward()
          Report the average reward per step in the trial.
 int getPolicyChanges()
          Report the number of policy changes in the trial.
 int getQueries()
          Report the number of queries in the trial.
 int initTrial(int s)
          Called to initialize for a new trial.
 int query(int yn, double rn)
          Select an output based on the state and reward.
 void readPolicy()
          Read the policy from a file.
 void savePolicy()
          Write the policy to a file.
 void saveProfile(java.lang.String profile_filename)
          Write the policy profile to a file.
 void setGamma(double g)
          Set gamma for the Q-learner.
 void setRandomRate(double r)
          Set the random rate for the Q-learner.
 void setRandomRateDecay(double r)
          Set the random decay for the Q-learner.
 java.lang.String toString()
          Generate a String that describes the current state of the learner.
protected  void updateState(i_PriQLearner_id.state st)
           
 
Methods inherited from class EDU.gatech.cc.is.learning.i_ReinforcementLearner_id
log, loggingOff, loggingOn, loggingOn, setPolicyFileName
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DISCOUNTED

public static final int DISCOUNTED
Used to indicate the learner uses discounted rewards.

criteria

protected int criteria

states

protected i_PriQLearner_id.state[] states

changeQueue

protected PriorityQueue changeQueue

numactions

protected int numactions

Constructor Detail

i_PriQLearner_id

public i_PriQLearner_id(int numstatesin,
                        int numactionsin,
                        int criteriain,
                        long seedin)
Instantiate a Prioritized Sweeping learner using default parameters. Parameters may be adjusted afterwards with the setter methods (setGamma, setRandomRate, setRandomRateDecay).
Parameters:
numstatesin - int, the number of states the system could be in.
numactionsin - int, the number of actions or outputs to select from.
criteriain - int, should be DISCOUNTED or AVERAGE.
seedin - long, the random seed.

i_PriQLearner_id

public i_PriQLearner_id(int numstatesin,
                        int numactionsin,
                        int criteriain)
Instantiate a Prioritized Sweeping learner using default parameters. This version uses a seed of 0. Parameters may be adjusted afterwards with the setter methods (setGamma, setRandomRate, setRandomRateDecay).
Parameters:
numstatesin - int, the number of states the system could be in.
numactionsin - int, the number of actions or outputs to select from.
criteriain - int, should be DISCOUNTED or AVERAGE.

i_PriQLearner_id

public i_PriQLearner_id(int numstatesin,
                        int numactionsin)
Instantiate a Prioritized Sweeping learner using default parameters. This version uses discounted rewards. Parameters may be adjusted afterwards with the setter methods (setGamma, setRandomRate, setRandomRateDecay).
Parameters:
numstatesin - int, the number of states the system could be in.
numactionsin - int, the number of actions or outputs to select from.

Method Detail

setGamma

public void setGamma(double g)
Set gamma, the discount rate, for the Q-learner. 0.8 is a typical value. It should be between 0 and 1.
Parameters:
g - double, the new value for gamma (0 < g < 1).
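
As an illustration of what a discount rate does, the sketch below computes a discounted sum of rewards, where a reward received t steps in the future is weighted by gamma to the power t. The class and method names are hypothetical and are not part of this API; how the learner applies gamma internally is not shown in this documentation.

```java
final class DiscountedReturnSketch {
    /** Returns the sum over t of rewards[t] * gamma^t. */
    public static double discountedReturn(double[] rewards, double gamma) {
        double total = 0.0;
        double weight = 1.0; // gamma^t, starting at t = 0
        for (double r : rewards) {
            total += weight * r;
            weight *= gamma;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] rewards = {1.0, 1.0, 1.0};
        // With gamma = 0.8 the three rewards are weighted 1, 0.8 and 0.64.
        System.out.println(discountedReturn(rewards, 0.8));
    }
}
```

A gamma close to 1 makes distant rewards count almost as much as immediate ones; a gamma close to 0 makes the learner short-sighted.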

setRandomRate

public void setRandomRate(double r)
Set the random rate for the Q-learner. This reflects how frequently it picks a random action. Should be between 0 and 1.
Parameters:
r - double, the new value for random rate (0 < r < 1).
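
One common way to use such a random rate is epsilon-greedy selection: with probability equal to the rate pick any action uniformly, otherwise pick the action with the highest estimated value. The sketch below assumes that rule; whether this class selects actions exactly this way is not stated here, and all names are hypothetical.

```java
import java.util.Random;

final class RandomRateSketch {
    /** Picks a random action with probability randomRate, else the best-valued one. */
    public static int selectAction(double[] qRow, double randomRate, Random rng) {
        if (rng.nextDouble() < randomRate) {
            return rng.nextInt(qRow.length); // explore
        }
        int best = 0;
        for (int a = 1; a < qRow.length; a++) {
            if (qRow[a] > qRow[best]) best = a; // exploit
        }
        return best;
    }

    public static void main(String[] args) {
        Random rng = new Random(0L);
        double[] qRow = {0.1, 0.5, 0.9};
        // With a rate of 0.0 the greedy action (index 2) is always chosen.
        System.out.println(selectAction(qRow, 0.0, rng));
    }
}
```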

setRandomRateDecay

public void setRandomRateDecay(double r)
Set the random decay for the Q-learner. This reflects how quickly the rate of choosing random actions decays. 1 would never decay, 0 would cause it to immediately quit choosing random actions. Should be between 0 and 1.
Parameters:
r - double, the new value for random decay (0 < r < 1).
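
The description (1 never decays, 0 stops random choices immediately) is consistent with a multiplicative decay, where the random rate is multiplied by the decay factor at each step. The sketch below assumes that schedule; it is an illustration with hypothetical names, not the code of this class.

```java
final class RandomRateDecaySketch {
    /** Random rate after the given number of steps, assuming rate *= decay per step. */
    public static double rateAfter(double initialRate, double decay, int steps) {
        double rate = initialRate;
        for (int i = 0; i < steps; i++) {
            rate *= decay;
        }
        return rate;
    }

    public static void main(String[] args) {
        // Compare a slow decay with a fast one over the same number of steps.
        System.out.println(rateAfter(0.1, 0.99, 100));
        System.out.println(rateAfter(0.1, 0.5, 100));
    }
}
```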

toString

public java.lang.String toString()
Generate a String that describes the current state of the learner.
Overrides:
toString in class i_ReinforcementLearner_id
Returns:
a String describing the learner.

updateState

protected void updateState(i_PriQLearner_id.state st)

query

public int query(int yn,
                 double rn)
Select an output based on the state and reward.
Overrides:
query in class i_ReinforcementLearner_id
Parameters:
yn - int, the current state.
rn - double, reward for the last output, positive numbers are "good."

endTrial

public void endTrial(double Vn,
                     double rn)
Called when the current trial ends.
Overrides:
endTrial in class i_ReinforcementLearner_id
Parameters:
Vn - double, the value of the absorbing state.
rn - double, the reward for the last output.

initTrial

public int initTrial(int s)
Called to initialize for a new trial.
Overrides:
initTrial in class i_ReinforcementLearner_id
Tags copied from class: i_ReinforcementLearner_id
Parameters:
s - int, the current state.
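
The three methods query, endTrial and initTrial are meant to be used together over a trial. The sketch below shows one plausible calling order on an invented toy corridor. To stay self-contained it does not use the real class: `Learner` is a hypothetical stand-in interface that only copies the three documented signatures, and `RandomStub` picks random actions and does not learn. Treating the int returned by initTrial and query as the selected action is an assumption.

```java
import java.util.Random;

final class TrialLoopSketch {
    /** Hypothetical stand-in exposing the three documented signatures. */
    interface Learner {
        int initTrial(int s);
        int query(int yn, double rn);
        void endTrial(double Vn, double rn);
    }

    /** Stub that returns uniformly random actions; it does not learn anything. */
    static final class RandomStub implements Learner {
        private final Random rng;
        private final int numActions;
        public int queries = 0;
        public boolean ended = false;

        public RandomStub(int numActions, long seed) {
            this.numActions = numActions;
            this.rng = new Random(seed);
        }
        public int initTrial(int s) { queries = 0; ended = false; return rng.nextInt(numActions); }
        public int query(int yn, double rn) { queries++; return rng.nextInt(numActions); }
        public void endTrial(double Vn, double rn) { ended = true; }
    }

    /** Runs one trial on a corridor 0..goal: action 1 moves right, anything else moves left. */
    public static int runTrial(Learner learner, int goal, int maxSteps) {
        int state = 0;
        int action = learner.initTrial(state);   // start of trial
        int steps = 0;
        while (steps < maxSteps) {
            state = (action == 1) ? state + 1 : Math.max(0, state - 1);
            steps++;
            if (state == goal) {
                learner.endTrial(0.0, 1.0);      // absorbing state value 0, final reward 1
                return steps;
            }
            action = learner.query(state, 0.0);  // new state, reward for the last action
        }
        learner.endTrial(0.0, 0.0);              // step budget exhausted, no reward
        return steps;
    }

    public static void main(String[] args) {
        RandomStub stub = new RandomStub(2, 0L);
        System.out.println("steps: " + runTrial(stub, 3, 50));
    }
}
```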

getAvgReward

public double getAvgReward()
Report the average reward per step in the trial.
Overrides:
getAvgReward in class i_ReinforcementLearner_id
Returns:
the average reward per step.

getQueries

public int getQueries()
Report the number of queries in the trial.
Overrides:
getQueries in class i_ReinforcementLearner_id
Returns:
the total number of queries.

getPolicyChanges

public int getPolicyChanges()
Report the number of policy changes in the trial.
Overrides:
getPolicyChanges in class i_ReinforcementLearner_id
Returns:
the total number of policy changes.

readPolicy

public void readPolicy()
                throws java.io.IOException
Read the policy from a file.
Overrides:
readPolicy in class i_ReinforcementLearner_id
This method takes no arguments; the file name is presumably the one set with the inherited setPolicyFileName.

savePolicy

public void savePolicy()
                throws java.io.IOException
Write the policy to a file.
Overrides:
savePolicy in class i_ReinforcementLearner_id
This method takes no arguments; the file name is presumably the one set with the inherited setPolicyFileName.

saveProfile

public void saveProfile(java.lang.String profile_filename)
                 throws java.io.IOException
Write the policy profile to a file.
Parameters:
profile_filename - String, the name of the file to write to.