GEGELATI
Learn::AdversarialLearningAgent Class Reference

Class used to control the learning steps of a TPGGraph within a given LearningEnvironment, with support for adversarial, multi-agent simulations. To have several agents per evaluation, a job object embedding several TPG roots is used. More...

#include <adversarialLearningAgent.h>

Inheritance diagram for Learn::AdversarialLearningAgent: inherits Learn::ParallelLearningAgent, which inherits Learn::LearningAgent.

Public Member Functions

 AdversarialLearningAgent (LearningEnvironment &le, const Instructions::Set &iSet, const LearningParameters &p, size_t agentsPerEval=2, const TPG::TPGFactory &factory=TPG::TPGFactory())
 Constructor for AdversarialLearningAgent. More...
 
std::multimap< std::shared_ptr< Learn::EvaluationResult >, const TPG::TPGVertex * > evaluateAllRoots (uint64_t generationNumber, Learn::LearningMode mode) override
 Evaluate all root TPGVertex of the TPGGraph. More...
 
virtual std::shared_ptr< EvaluationResult > evaluateJob (TPG::TPGExecutionEngine &tee, const Job &job, uint64_t generationNumber, LearningMode mode, LearningEnvironment &le) const override
 Evaluates a policy starting from the given root, handling the adversarial case. More...
 
std::queue< std::shared_ptr< Learn::Job > > makeJobs (Learn::LearningMode mode, TPG::TPGGraph *tpgGraph=nullptr) override
 Puts all roots into AdversarialJobs so that they can be used in simulations later. Unlike the base LearningAgent makeJobs, the jobs created here contain several roots that play together. More...
 
- Public Member Functions inherited from Learn::ParallelLearningAgent
 ParallelLearningAgent (LearningEnvironment &le, const Instructions::Set &iSet, const LearningParameters &p, const TPG::TPGFactory &factory=TPG::TPGFactory())
 Constructor for ParallelLearningAgent. More...
 
std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > evaluateAllRoots (uint64_t generationNumber, LearningMode mode) override
 Evaluate all root TPGVertex of the TPGGraph. More...
 
- Public Member Functions inherited from Learn::LearningAgent
 LearningAgent (LearningEnvironment &le, const Instructions::Set &iSet, const LearningParameters &p, const TPG::TPGFactory &factory=TPG::TPGFactory())
 Constructor for LearningAgent. More...
 
virtual ~LearningAgent ()=default
 Default destructor for polymorphism.
 
std::shared_ptr< TPG::TPGGraph > getTPGGraph ()
 Getter for the TPGGraph built by the LearningAgent. More...
 
const Archive & getArchive () const
 Getter for the Archive filled by the LearningAgent. More...
 
Mutator::RNG & getRNG ()
 Getter for the RNG used by the LearningAgent. More...
 
void addLogger (Log::LALogger &logger)
 Adds a LALogger to the loggers vector. More...
 
virtual std::shared_ptr< EvaluationResult > evaluateJob (TPG::TPGExecutionEngine &tee, const Job &job, uint64_t generationNumber, LearningMode mode, LearningEnvironment &le) const
 Evaluates a policy starting from the given root. More...
 
bool isRootEvalSkipped (const TPG::TPGVertex &root, std::shared_ptr< Learn::EvaluationResult > &previousResult) const
 Method detecting whether a root should be evaluated again. More...
 
virtual std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > evaluateAllRoots (uint64_t generationNumber, LearningMode mode)
 Evaluate all root TPGVertex of the TPGGraph. More...
 
virtual void trainOneGeneration (uint64_t generationNumber)
 Train the TPGGraph for one generation. More...
 
virtual void decimateWorstRoots (std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results)
 Removes from the TPGGraph the root TPGVertex with the worst results. More...
 
uint64_t train (volatile bool &altTraining, bool printProgressBar)
 Train the TPGGraph for a given number of generations. More...
 
void updateEvaluationRecords (const std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results)
 Update the bestRoot and resultsPerRoot attributes. More...
 
void forgetPreviousResults ()
 This method resets the previously registered scores per root. More...
 
const std::pair< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > & getBestRoot () const
 Get the best root TPG::Vertex encountered since the last init. More...
 
void keepBestPolicy ()
 This method keeps only the bestRoot policy in the TPGGraph. More...
 
virtual std::shared_ptr< Learn::Job > makeJob (int num, Learn::LearningMode mode, int idx=0, TPG::TPGGraph *tpgGraph=nullptr)
 Takes a given root index and creates a job containing it. Useful for example in adversarial mode where a job could contain a match of several roots. More...
 
virtual std::queue< std::shared_ptr< Learn::Job > > makeJobs (Learn::LearningMode mode, TPG::TPGGraph *tpgGraph=nullptr)
 Puts all roots into jobs to be able to use them in simulation later. More...
 
void init (uint64_t seed=0)
 Initialize the LearningAgent. More...
 

Protected Member Functions

void evaluateAllRootsInParallelCompileResults (std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &resultsPerJobMap, std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results, std::map< uint64_t, Archive * > &archiveMap) override
 Subfunction of evaluateAllRootsInParallel which handles the gathering of results and the merging of the archives, adapted to jobs containing several roots for adversarial learning. More...
 
- Protected Member Functions inherited from Learn::ParallelLearningAgent
virtual void evaluateAllRootsInParallel (uint64_t generationNumber, LearningMode mode, std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results)
 Method for evaluating all roots with parallelism. More...
 
virtual void evaluateAllRootsInParallelExecute (uint64_t generationNumber, LearningMode mode, std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &resultsPerJobMap, std::map< uint64_t, Archive * > &archiveMap)
 Subfunction of evaluateAllRootsInParallel which handles the creation of threads, their execution and junction. More...
 
virtual void evaluateAllRootsInParallelCompileResults (std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &resultsPerJobMap, std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results, std::map< uint64_t, Archive * > &archiveMap)
 Subfunction of evaluateAllRootsInParallel which handles the gathering of results and the merge of the archives. More...
 
void slaveEvalJobThread (uint64_t generationNumber, LearningMode mode, std::queue< std::shared_ptr< Learn::Job > > &jobsToProcess, std::mutex &rootsToProcessMutex, std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &resultsPerRootMap, std::mutex &resultsPerRootMapMutex, std::map< uint64_t, Archive * > &archiveMap, std::mutex &archiveMapMutex, bool useMainEnvironment)
 Function implementing the behavior of slave threads during parallel evaluation of roots. More...
 
void mergeArchiveMap (std::map< uint64_t, Archive * > &archiveMap)
 Method to merge several Archive created in parallel threads. More...
 

Protected Attributes

std::vector< const TPG::TPGVertex * > champions
 Champions of the last generation. More...
 
size_t agentsPerEvaluation
 Number of agents per evaluation (e.g. 2 in tic-tac-toe).
 
- Protected Attributes inherited from Learn::LearningAgent
LearningEnvironment & learningEnvironment
 LearningEnvironment with which the LearningAgent will interact.
 
Environment env
 Environment for executing Program of the LearningAgent.
 
Archive archive
 Archive used during the training process.
 
LearningParameters params
 Parameters for the learning process.
 
std::shared_ptr< TPG::TPGGraph > tpg
 TPGGraph built during the learning process.
 
std::pair< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > bestRoot {nullptr, nullptr}
 
std::map< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > resultsPerRoot
 Map associating root TPG::TPGVertex to their EvaluationResult. More...
 
Mutator::RNG rng
 Random Number Generator for this Learning Agent.
 
uint64_t maxNbThreads = 1
 Control the maximum number of threads when running in parallel.
 
std::vector< std::reference_wrapper< Log::LALogger > > loggers
 Set of LALogger called throughout the training process. More...
 

Detailed Description

Class used to control the learning steps of a TPGGraph within a given LearningEnvironment, with support for adversarial, multi-agent simulations. To have several agents per evaluation, a job object embedding several TPG roots is used.

Globally, the normal training process of the adversarial learning agent can be summed up as follows (a minimal usage sketch is given below):

1. Initialize, create/populate the TPG.
2. Create jobs with makeJobs. Each job is a simulation configuration: it contains some IDs and, more importantly, the roots that will be evaluated, in their order of play. There will be agentsPerEvaluation roots in each job. The same root can appear several times in the same job, and each root can be in several jobs.
3. Evaluate each job nbIterationsPerJob times, getting as many result scores as there are roots in the job.
4. Browse the results of every job and accumulate them to compute the results per root.
5. Eliminate bad roots.
6. Validate if params.doValidation is true.
7. Go back to step 1 until we want to stop.

Note that the process only differs from the normal Learning Agent in steps 2, 3, 4 and 6.
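
Putting these steps together, a minimal usage sketch follows. It is a sketch only: the LearningEnvironment, the Instructions::Set and the LearningParameters are assumed to be built elsewhere, and the fixed count of 200 generations is an arbitrary choice; only the AdversarialLearningAgent calls come from this page.

#include <cstdint>
#include <adversarialLearningAgent.h>

void trainTwoPlayerGame(Learn::LearningEnvironment& env,
                        const Instructions::Set& iSet,
                        const Learn::LearningParameters& params)
{
    // agentsPerEval = 2 (the default): each simulation opposes two roots.
    Learn::AdversarialLearningAgent agent(env, iSet, params, 2);

    agent.init(); // Step 1: create/populate the TPG.

    for (uint64_t gen = 0; gen < 200; gen++) {
        // Steps 2 to 6: make jobs, evaluate them, accumulate scores per
        // root, decimate the worst roots, validate if params.doValidation.
        agent.trainOneGeneration(gen);
    }

    // Keep only the policy of the best root encountered since init().
    agent.keepBestPolicy();
}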

Constructor & Destructor Documentation

◆ AdversarialLearningAgent()

Learn::AdversarialLearningAgent::AdversarialLearningAgent ( LearningEnvironment &  le,
const Instructions::Set &  iSet,
const LearningParameters &  p,
size_t  agentsPerEval = 2,
const TPG::TPGFactory &  factory = TPG::TPGFactory() 
)
inline

Constructor for AdversarialLearningAgent.

Based on default constructor of ParallelLearningAgent

Parameters
[in] le The LearningEnvironment for the TPG.
[in] iSet Set of Instruction used to compose Programs in the learning process.
[in] p The LearningParameters for the LearningAgent.
[in] agentsPerEval The number of agents each simulation will need.
[in] factory The TPGFactory used to create the TPGGraph. A default TPGFactory is used if none is provided.
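
For illustration, constructing an agent for a hypothetical three-player environment only changes the agentsPerEval argument; threePlayerEnv, iSet and params are assumed to be set up elsewhere:

// Sketch only: the variables below are assumptions, not part of this page.
Learn::AdversarialLearningAgent agent(
    threePlayerEnv, // LearningEnvironment where each game involves 3 agents
    iSet,           // Instructions::Set used to compose Programs
    params,         // LearningParameters of the learning process
    3);             // agentsPerEval: each job will embed 3 roots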

Member Function Documentation

◆ evaluateAllRoots()

std::multimap< std::shared_ptr< Learn::EvaluationResult >, const TPG::TPGVertex * > Learn::AdversarialLearningAgent::evaluateAllRoots ( uint64_t  generationNumber,
Learn::LearningMode  mode 
)
overridevirtual

Evaluate all root TPGVertex of the TPGGraph.

Replaces the function from the base class ParallelLearningAgent.

This method calls the evaluateJob method for every root TPGVertex of the TPGGraph. It returns a sorted map associating each root vertex to its average score, in ascending order of score. Whether the evaluation runs sequentially or in parallel, both situations should output the same result. A sketch of consuming the returned map follows below.

Parameters
[in] generationNumber the integer number of the current generation.
[in] mode the LearningMode to use during the policy evaluation.

Reimplemented from Learn::LearningAgent.
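
A minimal sketch of consuming the returned map, assuming an agent initialized as in the detailed description and assuming EvaluationResult exposes its score through getResult():

#include <iostream>

// "agent" and the generation number "gen" come from the training loop.
auto results = agent.evaluateAllRoots(gen, Learn::LearningMode::VALIDATION);

// Ascending order: the worst roots come first, the best root last.
for (const auto& [result, root] : results) {
    std::cout << "score: " << result->getResult() << std::endl;
}
const TPG::TPGVertex* best = results.rbegin()->second;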

◆ evaluateAllRootsInParallelCompileResults()

void Learn::AdversarialLearningAgent::evaluateAllRootsInParallelCompileResults ( std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &  resultsPerJobMap,
std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &  results,
std::map< uint64_t, Archive * > &  archiveMap 
)
overrideprotectedvirtual

Subfunction of evaluateAllRootsInParallel which handles the gathering of results and the merging of the archives, adapted to jobs containing several roots for adversarial learning.

This method gathers results in a map linking each root to its result, and then inverts the map to match the "results" argument. The archives are merged just as in ParallelLearningAgent.

Note that if a job has a "posOfStudiedRoot" different from -1, only the EvaluationResult of the root at posOfStudiedRoot is written to the results map; the results of the other roots within the Job are discarded. The reason is that when roots face champions, the champions shouldn't have their scores updated: as they encounter many unskilled roots, they would otherwise always end up with a high score. A conceptual sketch of this filtering rule follows below.

Parameters
[in] resultsPerJobMap map linking the job number with its results and itself.
[out] results map linking single results to their root vertex.
[in,out] archiveMap map linking the job number with its gathered archive. These archives will later be merged with the ones of the other jobs.

Reimplemented from Learn::ParallelLearningAgent.
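
A conceptual sketch of the filtering rule, with simplified stand-in types; the names below are illustrative, not the actual GEGELATI internals:

#include <map>
#include <vector>

using RootId = int; // stand-in for const TPG::TPGVertex*

// Records the scores of one job. posOfStudiedRoot == -1 means every root
// of the job is under study; otherwise only that position's score is kept.
void gatherJobScores(const std::vector<RootId>& rootsOfJob,
                     const std::vector<double>& scores,
                     int posOfStudiedRoot,
                     std::multimap<double, RootId>& results)
{
    for (size_t pos = 0; pos < rootsOfJob.size(); pos++) {
        // Champions filling the other positions keep their old scores:
        // beating many unskilled roots must not inflate their results.
        if (posOfStudiedRoot != -1 && pos != (size_t)posOfStudiedRoot)
            continue;
        results.emplace(scores[pos], rootsOfJob[pos]);
    }
}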

◆ evaluateJob()

std::shared_ptr< Learn::EvaluationResult > Learn::AdversarialLearningAgent::evaluateJob ( TPG::TPGExecutionEngine &  tee,
const Job &  job,
uint64_t  generationNumber,
Learn::LearningMode  mode,
LearningEnvironment &  le 
) const
overridevirtual

Evaluates a policy starting from the given root, handling the adversarial case.

The policy, that is, the TPGGraph execution starting from the given TPGVertex, is evaluated nbIteration times. The generationNumber is combined with the current iteration number to generate a set of seeds for evaluating the policy.

The method is const to enable potential parallel calls to it.

Parameters
[in] tee The TPGExecutionEngine to use.
[in] job the TPGVertex group from which the policy evaluation starts. Each root of the group shall be an agent of the same simulation.
[in] generationNumber the integer number of the current generation.
[in] mode the LearningMode to use during the policy evaluation.
[in] le Reference to the LearningEnvironment to use during the policy evaluation (may be different from the attribute of the class in child LearningAgent classes).
Returns
a std::shared_ptr to the EvaluationResult for the root. This will be an AdversarialEvaluationResult containing the score of each root of the job. The same root can appear in several jobs, so these scores are to be combined by the caller of this method. The AdversarialEvaluationResult also contains the number of iterations done in this job, which can be useful for combining results later (see the sketch below).

Reimplemented from Learn::LearningAgent.
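
Since the same root appears in several jobs, the caller has to merge its per-job scores. A sketch of such an accumulation, weighting each job by its iteration count as suggested above; plain doubles stand in for AdversarialEvaluationResult:

#include <cstdint>

// Accumulates the scores one root obtained across several jobs.
struct RootScoreAccumulator {
    double weightedSum = 0.0;
    uint64_t totalIterations = 0;

    // score: the root's score in one job; nbIterations: iterations done
    // in that job (carried by the AdversarialEvaluationResult).
    void add(double score, uint64_t nbIterations) {
        weightedSum += score * nbIterations;
        totalIterations += nbIterations;
    }

    double average() const {
        return totalIterations == 0 ? 0.0 : weightedSum / totalIterations;
    }
};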

◆ makeJobs()

std::queue< std::shared_ptr< Learn::Job > > Learn::AdversarialLearningAgent::makeJobs ( Learn::LearningMode  mode,
TPG::TPGGraph *  tpgGraph = nullptr 
)
overridevirtual

Puts all roots into AdversarialJobs so that they can be used in simulations later. Unlike the base LearningAgent makeJobs, the jobs created here contain several roots that play together.

To make jobs, this method uses champions. The best roots from the previous generation are kept in the list of champions; if no champion exists yet, the first roots of the roots list are taken. Several champions are put together to create "teams" of predefined roots. They are chosen randomly and are of size agentsPerEvaluation-1. Then, to create a job, each root of the population is inserted into such a team at every possible position (for example, if the team is made of roots A-B and we insert a root R, we get R-A-B, A-R-B and A-B-R as jobs, as illustrated below). The number of teams is calculated so that each root will be evaluated nbIterationsPerPolicyEvaluation times.

Parameters
[in] mode the mode of the training, determining, for example, whether we generate values that are only needed for training.
[in] tpgGraph The TPG graph from which the roots will be taken.
Returns
A queue containing pointers of the created AdversarialJobs.

Reimplemented from Learn::LearningAgent.
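
The insertion scheme can be illustrated with a small standalone sketch: inserting a candidate root R at every position of a champion team {A, B} (agentsPerEvaluation = 3) yields the three jobs mentioned above:

#include <iostream>
#include <string>
#include <vector>

int main()
{
    const std::vector<std::string> team = {"A", "B"}; // champion team
    for (size_t pos = 0; pos <= team.size(); pos++) {
        std::vector<std::string> job = team;
        job.insert(job.begin() + pos, "R"); // candidate root at position pos
        for (size_t i = 0; i < job.size(); i++)
            std::cout << job[i] << (i + 1 < job.size() ? "-" : "\n");
    }
    // Prints: R-A-B, A-R-B, A-B-R
}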

Member Data Documentation

◆ champions

std::vector<const TPG::TPGVertex*> Learn::AdversarialLearningAgent::champions
protected

Champions of the last generation.

All roots of a generation that are kept are put in this list. Then, the roots of the next generation will fight against these champions to be evaluated.


The documentation for this class was generated from the following files: