GEGELATI
Learn::LearningAgent Class Reference

Class used to control the learning steps of a TPGGraph within a given LearningEnvironment. More...

#include <learningAgent.h>

Inheritance diagram for Learn::LearningAgent:
Learn::ParallelLearningAgent Learn::AdversarialLearningAgent Learn::ClassificationLearningAgent< BaseLearningAgent >

Public Member Functions

 LearningAgent (LearningEnvironment &le, const Instructions::Set &iSet, const LearningParameters &p, const TPG::TPGFactory &factory=TPG::TPGFactory())
 Constructor for LearningAgent. More...
 
virtual ~LearningAgent ()=default
 Default destructor for polymorphism.
 
std::shared_ptr< TPG::TPGGraph > getTPGGraph ()
 Getter for the TPGGraph built by the LearningAgent. More...
 
const Archive & getArchive () const
 Getter for the Archive filled by the LearningAgent. More...
 
Mutator::RNG & getRNG ()
 Getter for the RNG used by the LearningAgent. More...
 
void addLogger (Log::LALogger &logger)
 Adds a LALogger to the loggers vector. More...
 
virtual std::shared_ptr< EvaluationResult > evaluateJob (TPG::TPGExecutionEngine &tee, const Job &job, uint64_t generationNumber, LearningMode mode, LearningEnvironment &le) const
 Evaluates policy starting from the given root. More...
 
bool isRootEvalSkipped (const TPG::TPGVertex &root, std::shared_ptr< Learn::EvaluationResult > &previousResult) const
 Method detecting whether a root should be evaluated again. More...
 
virtual std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > evaluateAllRoots (uint64_t generationNumber, LearningMode mode)
 Evaluate all root TPGVertex of the TPGGraph. More...
 
virtual void trainOneGeneration (uint64_t generationNumber)
 Train the TPGGraph for one generation. More...
 
virtual void decimateWorstRoots (std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results)
 Removes from the TPGGraph the root TPGVertex with the worst results. More...
 
uint64_t train (volatile bool &altTraining, bool printProgressBar)
 Train the TPGGraph for a given number of generations. More...
 
void updateEvaluationRecords (const std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results)
 Update the bestRoot and resultsPerRoot attributes. More...
 
void forgetPreviousResults ()
 This method resets the previous registered scores per root. More...
 
const std::pair< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > & getBestRoot () const
 Get the best root TPG::TPGVertex encountered since the last init. More...
 
void keepBestPolicy ()
 This method keeps only the bestRoot policy in the TPGGraph. More...
 
virtual std::shared_ptr< Learn::Job > makeJob (int num, Learn::LearningMode mode, int idx=0, TPG::TPGGraph *tpgGraph=nullptr)
 Takes a given root index and creates a job containing it. Useful for example in adversarial mode where a job could contain a match of several roots. More...
 
virtual std::queue< std::shared_ptr< Learn::Job > > makeJobs (Learn::LearningMode mode, TPG::TPGGraph *tpgGraph=nullptr)
 Puts all roots into jobs to be able to use them in simulation later. More...
 
void init (uint64_t seed=0)
 Initialize the LearningAgent. More...
 

Protected Attributes

LearningEnvironment & learningEnvironment
 LearningEnvironment with which the LearningAgent will interact.
 
Environment env
 Environment for executing Program of the LearningAgent.
 
Archive archive
 Archive used during the training process.
 
LearningParameters params
 Parameters for the learning process.
 
std::shared_ptr< TPG::TPGGraph > tpg
 TPGGraph built during the learning process.
 
std::pair< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > bestRoot {nullptr, nullptr}
 
std::map< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > resultsPerRoot
 Map associating root TPG::TPGVertex to their EvaluationResult. More...
 
Mutator::RNG rng
 Random Number Generator for this Learning Agent.
 
uint64_t maxNbThreads = 1
 Control the maximum number of threads when running in parallel.
 
std::vector< std::reference_wrapper< Log::LALogger > > loggers
 Set of LALogger called throughout the training process. More...
 

Detailed Description

Class used to control the learning steps of a TPGGraph within a given LearningEnvironment.

Constructor & Destructor Documentation

◆ LearningAgent()

Learn::LearningAgent::LearningAgent ( LearningEnvironment &  le,
const Instructions::Set &  iSet,
const LearningParameters &  p,
const TPG::TPGFactory &  factory = TPG::TPGFactory() 
)
inline

Constructor for LearningAgent.

Parameters
[in] le The LearningEnvironment for the TPG.
[in] iSet Set of Instruction used to compose Programs in the learning process.
[in] p The LearningParameters for the LearningAgent.
[in] factory The TPGFactory used to create the TPGGraph. A default TPGFactory is used if none is provided.

Member Function Documentation

◆ addLogger()

void Learn::LearningAgent::addLogger ( Log::LALogger &  logger)

Adds a LALogger to the loggers vector.

Adds a logger to the loggers vector, so that it will be called, in addition to the others, at predetermined moments of the training process. This makes it possible to have several loggers that simultaneously log different information to different outputs.

Parameters
[in] logger The logger that will be added to the vector.

◆ decimateWorstRoots()

void Learn::LearningAgent::decimateWorstRoots ( std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &  results)
virtual

Removes from the TPGGraph the root TPGVertex with the worst results.

The given multimap is updated by removing entries corresponding to decimated vertices.

The resultsPerRoot attribute is updated to remove results associated to removed vertices.

Parameters
[in,out] results a multimap associating root TPGVertex with their scores from an evaluation.

Reimplemented in Learn::ClassificationLearningAgent< BaseLearningAgent >.

◆ evaluateAllRoots()

std::multimap< std::shared_ptr< Learn::EvaluationResult >, const TPG::TPGVertex * > Learn::LearningAgent::evaluateAllRoots ( uint64_t  generationNumber,
Learn::LearningMode  mode 
)
virtual

Evaluate all root TPGVertex of the TPGGraph.

This method calls the evaluateJob method for every root TPGVertex of the TPGGraph. The method returns a sorted map associating each root vertex to its average score, in ascending order of score.

Parameters
[in] generationNumber the integer number of the current generation.
[in] mode the LearningMode to use during the policy evaluation.

Reimplemented in Learn::AdversarialLearningAgent, and Learn::ParallelLearningAgent.

◆ evaluateJob()

std::shared_ptr< Learn::EvaluationResult > Learn::LearningAgent::evaluateJob ( TPG::TPGExecutionEngine &  tee,
const Job &  job,
uint64_t  generationNumber,
Learn::LearningMode  mode,
LearningEnvironment &  le 
) const
virtual

Evaluates policy starting from the given root.

The policy, that is, the TPGGraph execution starting from the given TPGVertex is evaluated nbIteration times. The generationNumber is combined with the current iteration number to generate a set of seeds for evaluating the policy.

The method is const to enable potential parallel calls to it.

Parameters
[in] tee The TPGExecutionEngine to use.
[in] job The Job containing the root and archiveSeed for the evaluation.
[in] generationNumber the integer number of the current generation.
[in] mode the LearningMode to use during the policy evaluation.
[in] le Reference to the LearningEnvironment to use during the policy evaluation (which may differ from the class attribute in child LearningAgent classes).
Returns
a std::shared_ptr to the EvaluationResult for the root. If this root was already evaluated more times than the limit set in params.maxNbEvaluationPerPolicy, the EvaluationResult from the resultsPerRoot map is returned; otherwise, the EvaluationResult of the current generation is returned, already combined with the resultsPerRoot entry for this root (if any).

Reimplemented in Learn::AdversarialLearningAgent, and Learn::ClassificationLearningAgent< BaseLearningAgent >.

◆ forgetPreviousResults()

void Learn::LearningAgent::forgetPreviousResults ( )

This method resets the previous registered scores per root.

Resets resultsPerRoot so that, in the next training, the current roots are treated as if they had never been tested. Useful, for example, after a change in the scoring policy.

◆ getArchive()

const Archive & Learn::LearningAgent::getArchive ( ) const

Getter for the Archive filled by the LearningAgent.

Returns
a const reference to the Archive.

◆ getBestRoot()

const std::pair< const TPG::TPGVertex *, std::shared_ptr< Learn::EvaluationResult > > & Learn::LearningAgent::getBestRoot ( ) const

Get the best root TPG::TPGVertex encountered since the last init.

The returned pointers may be nullptr if no generation was trained since the last init.

Returns
a reference to the bestRoot attribute.

◆ getRNG()

Mutator::RNG & Learn::LearningAgent::getRNG ( )

Getter for the RNG used by the LearningAgent.

Returns
a reference to the RNG.

◆ getTPGGraph()

std::shared_ptr< TPG::TPGGraph > Learn::LearningAgent::getTPGGraph ( )

Getter for the TPGGraph built by the LearningAgent.

Returns
a std::shared_ptr to the TPGGraph.

Copyright or © or Copr. IETR/INSA - Rennes (2019 - 2021) :

Karol Desnos kdesnos@insa-rennes.fr (2019 - 2021), Nicolas Sourbier nsourbie@insa-rennes.fr (2019 - 2020), Pierre-Yves Le Rolland-Raumer plerolla@insa-rennes.fr (2020)

GEGELATI is an open-source reinforcement learning framework for training artificial intelligence based on Tangled Program Graphs (TPGs).

This software is governed by the CeCILL-C license under French law; see "http://www.cecill.info" for its terms.

◆ init()

void Learn::LearningAgent::init ( uint64_t  seed = 0)

Initialize the LearningAgent.

Calls the TPGMutator::initRandomTPG function. Initialize the Mutator::RNG with the given seed. Clears the Archive.

Parameters
[in] seed the seed given to the TPGMutator.

◆ isRootEvalSkipped()

bool Learn::LearningAgent::isRootEvalSkipped ( const TPG::TPGVertex &  root,
std::shared_ptr< Learn::EvaluationResult > &  previousResult 
) const

Method detecting whether a root should be evaluated again.

Using the resultsPerRoot map and the params.maxNbEvaluationPerPolicy, this method checks whether a root should be evaluated again, or if sufficient evaluations were already performed.

Parameters
[in] root The root TPGVertex whose number of evaluations is checked.
[out] previousResult the std::shared_ptr to the EvaluationResult of the root from the resultsPerRoot map, if any.
Returns
true if the root has been evaluated enough times, false otherwise.

◆ keepBestPolicy()

void Learn::LearningAgent::keepBestPolicy ( )

This method keeps only the bestRoot policy in the TPGGraph.

If the TPGVertex referenced in the bestRoot attribute is no longer a TPGVertex of the TPGGraph, nothing happens.

◆ makeJob()

std::shared_ptr< Learn::Job > Learn::LearningAgent::makeJob ( int  num,
Learn::LearningMode  mode,
int  idx = 0,
TPG::TPGGraph *  tpgGraph = nullptr 
)
virtual

Takes a given root index and creates a job containing it. Useful for example in adversarial mode where a job could contain a match of several roots.

Parameters
[in] num The index of the root to put in a job.
[in] mode the mode of the training, determining, for example, whether to generate values needed only for training.
[in] idx The index of the job; can be used to organize a map, for example.
[in] tpgGraph The TPG graph from which the root will be taken.
Returns
A job representing the root.

◆ makeJobs()

std::queue< std::shared_ptr< Learn::Job > > Learn::LearningAgent::makeJobs ( Learn::LearningMode  mode,
TPG::TPGGraph *  tpgGraph = nullptr 
)
virtual

Puts all roots into jobs to be able to use them in simulation later.

Parameters
[in] mode the mode of the training, determining, for example, whether to generate values needed only for training.
[in] tpgGraph The TPG graph from which the roots will be taken.
Returns
A queue containing pointers of the newly created jobs.

Reimplemented in Learn::AdversarialLearningAgent.

◆ train()

uint64_t Learn::LearningAgent::train ( volatile bool &  altTraining,
bool  printProgressBar 
)

Train the TPGGraph for a given number of generations.

The method trains the TPGGraph for a given number of generations, unless the referenced boolean value becomes false (checked at each generation). Optionally, a simple progress bar can be printed within the terminal. The TPGGraph is NOT (re)initialized before starting the training.

Parameters
[in] altTraining a reference to a boolean value that can be used to halt the training process before its completion.
[in] printProgressBar select whether a progress bar will be printed in the console.
Returns
the number of completed generations.

◆ trainOneGeneration()

void Learn::LearningAgent::trainOneGeneration ( uint64_t  generationNumber)
virtual

Train the TPGGraph for one generation.

Training for one generation includes:

  • Populating the TPGGraph according to given MutationParameters.
  • Evaluating all roots of the TPGGraph. (call to evaluateAllRoots)
  • Removing from the TPGGraph the worst performing root TPGVertex.
Parameters
[in] generationNumber the integer number of the current generation.

◆ updateEvaluationRecords()

void Learn::LearningAgent::updateEvaluationRecords ( const std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &  results)

Update the bestRoot and resultsPerRoot attributes.

This method updates the value of the bestRoot attribute with the best TPG::TPGVertex from the given results in the following cases:

  • The best EvaluationResult in the given results is better than the one currently stored in the bestRoot attribute.
  • The TPG::TPGVertex stored in the bestRoot attribute is no longer a root vertex of the TPGGraph.

It should be noted that the second case alone (i.e. without the first one also holding) indicates a great variability of the evaluation process: a vertex previously known as the best root, with an EvaluationResult never beaten, was removed from the graph in a later generation, displaced by a root vertex with a lower score than the current record.

Parameters
[in] results Map from the evaluateAllRoots method.

Member Data Documentation

◆ bestRoot

std::pair<const TPG::TPGVertex*, std::shared_ptr<EvaluationResult> > Learn::LearningAgent::bestRoot {nullptr, nullptr}
protected

Pointer to the best root encountered during training, together with its EvaluationResult.

◆ loggers

std::vector<std::reference_wrapper<Log::LALogger> > Learn::LearningAgent::loggers
protected

Set of LALogger called throughout the training process.

Each LALogger of this set will be invoked at pre-defined steps of the training process. Dedicated methods in the LALogger are used for each step.

◆ resultsPerRoot

std::map<const TPG::TPGVertex*, std::shared_ptr<EvaluationResult> > Learn::LearningAgent::resultsPerRoot
protected

Map associating root TPG::TPGVertex to their EvaluationResult.

If a given TPGVertex is evaluated several times, its EvaluationResult may be updated with the newer results.

Whenever a TPGVertex is removed from the TPGGraph, its EvaluationResult should also be removed from this map.

This map may be used to avoid reevaluating a root that was already evaluated more than LearningParameters::maxNbEvaluationPerPolicy times.


The documentation for this class was generated from the following files: