Class i_QLearner_id

EDU.gatech.cc.is.learning
Class i_QLearner_id

java.lang.Object
  |
  +--EDU.gatech.cc.is.learning.i_ReinforcementLearner_id
        |
        +--EDU.gatech.cc.is.learning.i_QLearner_id

public class i_QLearner_id
extends i_ReinforcementLearner_id
implements java.lang.Cloneable, java.io.Serializable

An object that learns to select from several actions based on a reward. Uses the Q-learning method as defined by Watkins.

The module learns to select a discrete output based on the current state and a continuous reinforcement input. The "i"s before and after the name indicate that this class takes integers as input and produces integers as output. The "d" indicates that the reinforcement input is a double (i.e. a continuous value).
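The Q-learning rule referred to above can be sketched as a small table update. This is a hypothetical stand-in, not the source of i_QLearner_id: the class name QTableSketch, its fields, and its methods are assumptions made for illustration only.

```java
// Hypothetical stand-in, not the i_QLearner_id source: a minimal tabular
// Q-learning update with integer states/actions and a double reward.
class QTableSketch {
    final double[][] q;   // q[state][action], initialised to 0
    final double alpha;   // learning rate, chosen by the caller
    final double gamma;   // discount rate, chosen by the caller

    QTableSketch(int numStates, int numActions, double alpha, double gamma) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // Highest Q-value available in the given state.
    double maxQ(int state) {
        double best = q[state][0];
        for (double v : q[state]) best = Math.max(best, v);
        return best;
    }

    // Watkins update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    void update(int s, int a, double r, int sNext) {
        q[s][a] += alpha * (r + gamma * maxQ(sNext) - q[s][a]);
    }

    // Greedy action: index of the largest Q-value in the state.
    int greedyAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a;
        }
        return best;
    }
}
```

In this sketch the integer state and action index the table directly, which is why a learner of this kind needs the number of states and actions up front.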

See Also: Serialized Form

Field Summary

static int AVERAGE
          Used to indicate the learner uses average rewards.

static int DISCOUNTED
          Used to indicate the learner uses discounted rewards.

Fields inherited from class EDU.gatech.cc.is.learning.i_ReinforcementLearner_id

logging, numactions, numstates, policyfilename

Constructor Summary

i_QLearner_id(int numstatesin, int numactionsin)
          Instantiate a Q learner using default parameters.

i_QLearner_id(int numstatesin, int numactionsin, int criteriain)
          Instantiate a Q learner using default parameters.

i_QLearner_id(int numstatesin, int numactionsin, int criteriain, long seedin)
          Instantiate a Q learner using default parameters.

Method Summary

void endTrial(double Vn, double rn)
          Called when the current trial ends.

double getAvgReward()
          Report the average reward per step in the trial.

int getPolicyChanges()
          Report the number of policy changes in the trial.

int getQueries()
          Report the number of queries in the trial.

int initTrial(int s)
          Called to initialize for a new trial.

int query(int yn, double rn)
          Select an output based on the state and reward.

void readPolicy()
          Read the policy from a file.

void savePolicy()
          Write the policy to a file.

void saveProfile(java.lang.String profile_filename)
          Write the policy profile to a file.

void setAlpha(double a)
          Set alpha for the Q-learner.

void setGamma(double g)
          Set gamma for the Q-learner.

void setRandomRate(double r)
          Set the random rate for the Q-learner.

void setRandomRateDecay(double r)
          Set the random decay for the Q-learner.

java.lang.String toString()
          Generate a String that describes the current state of the learner.

Methods inherited from class EDU.gatech.cc.is.learning.i_ReinforcementLearner_id

log, loggingOff, loggingOn, loggingOn, setPolicyFileName

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail

AVERAGE

public static final int AVERAGE

Used to indicate the learner uses average rewards.

DISCOUNTED

public static final int DISCOUNTED

Used to indicate the learner uses discounted rewards.

Constructor Detail

i_QLearner_id

public i_QLearner_id(int numstatesin,
                     int numactionsin,
                     int criteriain,
                     long seedin)

Instantiate a Q learner using default parameters. Parameters may be adjusted using accessor methods.

Parameters:
          numstatesin - int, the number of states the system could be in.
          numactionsin - int, the number of actions or outputs to select from.
          criteriain - int, should be DISCOUNTED or AVERAGE.
          seedin - long, the seed.

i_QLearner_id

public i_QLearner_id(int numstatesin,
                     int numactionsin,
                     int criteriain)

Instantiate a Q learner using default parameters. This version assumes you will use a seed of 0. Parameters may be adjusted using accessor methods.

Parameters:
          numstatesin - int, the number of states the system could be in.
          numactionsin - int, the number of actions or outputs to select from.
          criteriain - int, should be DISCOUNTED or AVERAGE.

i_QLearner_id

public i_QLearner_id(int numstatesin,
                     int numactionsin)

Instantiate a Q learner using default parameters. This version assumes you will use discounted rewards. Parameters may be adjusted using accessor methods.

Parameters:
          numstatesin - int, the number of states the system could be in.
          numactionsin - int, the number of actions or outputs to select from.
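The three constructors differ only in which defaults they fill in. One common way to express that in Java is constructor chaining, sketched below. The class LearnerDefaultsSketch, its field names, and the numeric values given to DISCOUNTED and AVERAGE are hypothetical; this page does not document the real constant values, and the seed used by the two-argument form is an assumption.

```java
// Hypothetical sketch of default-filling constructors; not the library source.
class LearnerDefaultsSketch {
    // Assumed values for illustration; the real constants are not documented here.
    static final int DISCOUNTED = 0;
    static final int AVERAGE = 1;

    final int numStates;
    final int numActions;
    final int criteria;
    final long seed;

    // Full form: every parameter supplied by the caller.
    LearnerDefaultsSketch(int numStates, int numActions, int criteria, long seed) {
        if (criteria != DISCOUNTED && criteria != AVERAGE) {
            throw new IllegalArgumentException("criteria must be DISCOUNTED or AVERAGE");
        }
        this.numStates = numStates;
        this.numActions = numActions;
        this.criteria = criteria;
        this.seed = seed;
    }

    // Three-argument form: seed defaults to 0.
    LearnerDefaultsSketch(int numStates, int numActions, int criteria) {
        this(numStates, numActions, criteria, 0L);
    }

    // Two-argument form: discounted rewards (and, by assumption, seed 0).
    LearnerDefaultsSketch(int numStates, int numActions) {
        this(numStates, numActions, DISCOUNTED);
    }
}
```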

Method Detail

setGamma

public void setGamma(double g)

Set gamma for the Q-learner. Gamma is the discount rate; 0.8 is a typical value. It should be between 0 and 1.

Parameters:
          g - double, the new value for gamma (0 < g < 1).
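To illustrate what a discount rate does, the following sketch computes the discounted sum of a reward sequence, where the reward t steps in the future is weighted by gamma raised to the power t. The class and method names are hypothetical and are not part of this package.

```java
// Hypothetical helper, for illustration only.
class DiscountSketch {
    // Returns rewards[0] + gamma * rewards[1] + gamma^2 * rewards[2] + ...
    // computed back to front so each step needs one multiply and one add.
    static double discountedReturn(double[] rewards, double gamma) {
        double total = 0.0;
        for (int t = rewards.length - 1; t >= 0; t--) {
            total = rewards[t] + gamma * total;
        }
        return total;
    }
}
```

With gamma = 0.8, three consecutive rewards of 1.0 are worth 1 + 0.8 + 0.64 = 2.44. Smaller values of gamma make the learner weigh near-term rewards more heavily.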

setAlpha

public void setAlpha(double a)

Set alpha for the Q-learner. This reflects how quickly it should learn. Alpha should be between 0 and 1.

Parameters:
          a - double, the new value for alpha (0 < a < 1).

setRandomRate

public void setRandomRate(double r)

Set the random rate for the Q-learner. This reflects how frequently it picks a random action. It should be between 0 and 1.

Parameters:
          r - double, the new value for the random rate (0 < r < 1).

setRandomRateDecay

public void setRandomRateDecay(double r)

Set the random rate decay for the Q-learner. This reflects how quickly the rate of choosing random actions decays. A value of 1 means the rate never decays; a value of 0 causes the learner to stop choosing random actions immediately. It should be between 0 and 1.

Parameters:
          r - double, the new value for the random rate decay (0 < r < 1).
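The interplay between the random rate and its decay can be sketched as follows. This assumes the rate is multiplied by the decay factor after every action selection; that schedule, along with the class ExplorationSketch and its members, is an assumption for illustration and may differ from what i_QLearner_id actually does.

```java
import java.util.Random;

// Hypothetical sketch of random-action selection with a decaying rate.
class ExplorationSketch {
    double randomRate;    // probability of picking a random action
    final double decay;   // assumed multiplicative decay applied per selection
    final Random rng;

    ExplorationSketch(double randomRate, double decay, long seed) {
        this.randomRate = randomRate;
        this.decay = decay;
        this.rng = new Random(seed);
    }

    // Picks a random action with probability randomRate, otherwise the
    // action with the largest value in qRow, then decays the rate.
    int select(double[] qRow) {
        int action;
        if (rng.nextDouble() < randomRate) {
            action = rng.nextInt(qRow.length);
        } else {
            action = 0;
            for (int a = 1; a < qRow.length; a++) {
                if (qRow[a] > qRow[action]) action = a;
            }
        }
        randomRate *= decay;
        return action;
    }
}
```

Under this assumed schedule, a decay of 1 leaves the rate unchanged and a decay of 0 sets it to zero after the first selection, which matches the two extremes described above.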

toString

public java.lang.String toString()

Generate a String that describes the current state of the learner.

Overrides: toString in class i_ReinforcementLearner_id

Returns: a String describing the learner.

query

public int query(int yn,
                 double rn)

Select an output based on the state and reward.

Overrides: query in class i_ReinforcementLearner_id

Parameters:
          yn - int, the current state.
          rn - double, reward for the last output; positive numbers are "good."

endTrial

public void endTrial(double Vn,
                     double rn)

Called when the current trial ends.

Overrides: endTrial in class i_ReinforcementLearner_id

Parameters:
          Vn - double, the value of the absorbing state.
          rn - double, the reward for the last output.

initTrial

public int initTrial(int s)

Called to initialize for a new trial.

Overrides: initTrial in class i_ReinforcementLearner_id

Tags copied from class: i_ReinforcementLearner_id

Parameters:
          s - int, the current state.
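The three trial methods (initTrial, query, endTrial) suggest a calling sequence: start a trial with the initial state, feed back state and reward on each step, and close the trial when an absorbing state is reached. The sketch below shows one plausible driver loop against a hypothetical interface that mirrors those three signatures. The interface TrialLearnerSketch, the corridor environment, the reward values, and the value 0.0 passed for the absorbing state are all assumptions for illustration.

```java
// Hypothetical interface mirroring the three trial-method signatures.
interface TrialLearnerSketch {
    int initTrial(int s);
    int query(int yn, double rn);
    void endTrial(double Vn, double rn);
}

// Hypothetical driver for a toy corridor with states 0..goal.
class TrialLoopSketch {
    // Action 1 moves one state to the right; any other action stays put.
    // Returns the number of steps taken in the trial.
    static int runTrial(TrialLearnerSketch learner, int goal, int maxSteps) {
        int state = 0;
        int action = learner.initTrial(state);   // first action of the trial
        int steps = 0;
        while (steps < maxSteps) {
            if (action == 1) state++;
            steps++;
            if (state == goal) {
                // Goal reached: report the final reward and close the trial.
                learner.endTrial(0.0, 1.0);
                return steps;
            }
            // Not done yet: report the new state, no reward for this step.
            action = learner.query(state, 0.0);
        }
        // Step budget exhausted without reaching the goal.
        learner.endTrial(0.0, 0.0);
        return steps;
    }
}
```

In this sketch, each trial calls initTrial once and endTrial once, with query called for every intermediate step.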

getAvgReward

public double getAvgReward()

Report the average reward per step in the trial.

Overrides: getAvgReward in class i_ReinforcementLearner_id

Returns: the average.

getQueries

public int getQueries()

Report the number of queries in the trial.

Overrides: getQueries in class i_ReinforcementLearner_id

Returns: the total.

getPolicyChanges

public int getPolicyChanges()

Report the number of policy changes in the trial.

Overrides: getPolicyChanges in class i_ReinforcementLearner_id

Returns: the total.

readPolicy

public void readPolicy()
                throws java.io.IOException

Read the policy from a file.

Overrides: readPolicy in class i_ReinforcementLearner_id

This method takes no arguments; the file name is taken from the inherited policyfilename field (see setPolicyFileName).

Throws: java.io.IOException

savePolicy

public void savePolicy()
                throws java.io.IOException

Write the policy to a file.

Overrides: savePolicy in class i_ReinforcementLearner_id

This method takes no arguments; the file name is taken from the inherited policyfilename field (see setPolicyFileName).

Throws: java.io.IOException

saveProfile

public void saveProfile(java.lang.String profile_filename)
                 throws java.io.IOException

Write the policy profile to a file.

Parameters:
          profile_filename - String, the name of the file to write to.
