GEGELATI
Class used to control the learning steps of a TPGGraph within a given LearningEnvironment, with parallel executions for speedup purposes.
#include <parallelLearningAgent.h>
Public Member Functions | |
ParallelLearningAgent (LearningEnvironment &le, const Instructions::Set &iSet, const LearningParameters &p, const TPG::TPGFactory &factory=TPG::TPGFactory()) | |
Constructor for ParallelLearningAgent. | |
std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > | evaluateAllRoots (uint64_t generationNumber, LearningMode mode) override |
Evaluate all root TPGVertex of the TPGGraph. | |
Public Member Functions inherited from Learn::LearningAgent | |
LearningAgent (LearningEnvironment &le, const Instructions::Set &iSet, const LearningParameters &p, const TPG::TPGFactory &factory=TPG::TPGFactory()) | |
Constructor for LearningAgent. | |
virtual | ~LearningAgent ()=default |
Default destructor for polymorphism. | |
std::shared_ptr< TPG::TPGGraph > | getTPGGraph () |
Getter for the TPGGraph built by the LearningAgent. | |
const Archive & | getArchive () const |
Getter for the Archive filled by the LearningAgent. | |
Mutator::RNG & | getRNG () |
Getter for the RNG used by the LearningAgent. | |
void | addLogger (Log::LALogger &logger) |
Adds a LALogger to the loggers vector. | |
virtual std::shared_ptr< EvaluationResult > | evaluateJob (TPG::TPGExecutionEngine &tee, const Job &job, uint64_t generationNumber, LearningMode mode, LearningEnvironment &le) const |
Evaluates policy starting from the given root. | |
bool | isRootEvalSkipped (const TPG::TPGVertex &root, std::shared_ptr< Learn::EvaluationResult > &previousResult) const |
Method detecting whether a root should be evaluated again. | |
virtual std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > | evaluateAllRoots (uint64_t generationNumber, LearningMode mode) |
Evaluate all root TPGVertex of the TPGGraph. | |
virtual void | trainOneGeneration (uint64_t generationNumber) |
Train the TPGGraph for one generation. | |
virtual void | decimateWorstRoots (std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results) |
Removes from the TPGGraph the root TPGVertex with the worst results. | |
uint64_t | train (volatile bool &altTraining, bool printProgressBar) |
Train the TPGGraph for a given number of generations. | |
void | updateEvaluationRecords (const std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results) |
Update the bestRoot and resultsPerRoot attributes. | |
void | forgetPreviousResults () |
This method resets the previously registered scores per root. | |
const std::pair< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > & | getBestRoot () const |
Get the best root TPG::Vertex encountered since the last init. | |
void | keepBestPolicy () |
This method keeps only the bestRoot policy in the TPGGraph. | |
virtual std::shared_ptr< Learn::Job > | makeJob (int num, Learn::LearningMode mode, int idx=0, TPG::TPGGraph *tpgGraph=nullptr) |
Takes a given root index and creates a job containing it. Useful, for example, in adversarial mode, where a job could contain a match between several roots. | |
virtual std::queue< std::shared_ptr< Learn::Job > > | makeJobs (Learn::LearningMode mode, TPG::TPGGraph *tpgGraph=nullptr) |
Puts all roots into jobs to be able to use them in simulation later. | |
void | init (uint64_t seed=0) |
Initialize the LearningAgent. | |
Protected Member Functions | |
virtual void | evaluateAllRootsInParallel (uint64_t generationNumber, LearningMode mode, std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results) |
Method for evaluating all roots with parallelism. | |
virtual void | evaluateAllRootsInParallelExecute (uint64_t generationNumber, LearningMode mode, std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &resultsPerJobMap, std::map< uint64_t, Archive * > &archiveMap) |
Subfunction of evaluateAllRootsInParallel which handles the creation of threads, their execution and junction. | |
virtual void | evaluateAllRootsInParallelCompileResults (std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &resultsPerJobMap, std::multimap< std::shared_ptr< EvaluationResult >, const TPG::TPGVertex * > &results, std::map< uint64_t, Archive * > &archiveMap) |
Subfunction of evaluateAllRootsInParallel which handles the gathering of results and the merge of the archives. | |
void | slaveEvalJobThread (uint64_t generationNumber, LearningMode mode, std::queue< std::shared_ptr< Learn::Job > > &jobsToProcess, std::mutex &rootsToProcessMutex, std::map< uint64_t, std::pair< std::shared_ptr< EvaluationResult >, std::shared_ptr< Job > > > &resultsPerRootMap, std::mutex &resultsPerRootMapMutex, std::map< uint64_t, Archive * > &archiveMap, std::mutex &archiveMapMutex, bool useMainEnvironment) |
Function implementing the behavior of slave threads during parallel evaluation of roots. | |
void | mergeArchiveMap (std::map< uint64_t, Archive * > &archiveMap) |
Method to merge several Archive created in parallel threads. | |
Additional Inherited Members | |
Attributes inherited from Learn::LearningAgent | |
LearningEnvironment & | learningEnvironment |
LearningEnvironment with which the LearningAgent will interact. | |
Environment | env |
Environment for executing Program of the LearningAgent. | |
Archive | archive |
Archive used during the training process. | |
LearningParameters | params |
Parameters for the learning process. | |
std::shared_ptr< TPG::TPGGraph > | tpg |
TPGGraph built during the learning process. | |
std::pair< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > | bestRoot {nullptr, nullptr} |
std::map< const TPG::TPGVertex *, std::shared_ptr< EvaluationResult > > | resultsPerRoot |
Map associating root TPG::TPGVertex to their EvaluationResult. | |
Mutator::RNG | rng |
Random Number Generator for this Learning Agent. | |
uint64_t | maxNbThreads = 1 |
Control the maximum number of threads when running in parallel. | |
std::vector< std::reference_wrapper< Log::LALogger > > | loggers |
Set of LALogger called throughout the training process. | |
Class used to control the learning steps of a TPGGraph within a given LearningEnvironment, with parallel executions for speedup purposes.
This class is intended to replace the default LearningAgent soon.
Because of parallelism, the determinism of the learning process could easily be lost, but this implementation must remain deterministic at all costs.
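One common way to keep a parallel evaluation deterministic is to draw one seed per job sequentially, before any job runs, so that a job's outcome depends on its index and never on the thread or order in which it executes. The sketch below illustrates that idea only; it is not taken from the GEGELATI sources, and `evaluateJobSketch` and `evaluateInOrder` are hypothetical names. The execution order argument stands in for arbitrary thread scheduling.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Hypothetical stand-in for the evaluation of one job: the score depends
// only on the seed handed to the job.
double evaluateJobSketch(uint64_t jobSeed) {
    std::mt19937_64 rng(jobSeed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(rng);
}

// Evaluate every job listed in executionOrder (a permutation of 0..n-1).
// The order models whatever scheduling the threads happen to produce.
std::map<uint64_t, double> evaluateInOrder(
    uint64_t masterSeed, const std::vector<uint64_t>& executionOrder) {
    // 1. Seeds are drawn sequentially and indexed by job number,
    //    before any job is executed.
    std::mt19937_64 master(masterSeed);
    std::vector<uint64_t> seeds(executionOrder.size());
    for (auto& seed : seeds) {
        seed = master();
    }

    // 2. Jobs run in an arbitrary order; results are keyed by job number,
    //    so the resulting map does not depend on that order.
    std::map<uint64_t, double> results;
    for (uint64_t jobIdx : executionOrder) {
        assert(jobIdx < seeds.size());
        results[jobIdx] = evaluateJobSketch(seeds[jobIdx]);
    }
    return results;
}
```

Calling `evaluateInOrder` with the same master seed and two different permutations of the job indices yields identical maps, which is the property the class description asks for.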
ParallelLearningAgent::ParallelLearningAgent () |
inline |
Constructor for ParallelLearningAgent.
Based on the default constructor of LearningAgent.
[in] | le | The LearningEnvironment for the TPG. |
[in] | iSet | Set of Instruction used to compose Programs in the learning process. |
[in] | p | The LearningParameters for the LearningAgent. |
[in] | factory | The TPGFactory used to create the TPGGraph. A default TPGFactory is used if none is provided. |
ParallelLearningAgent::evaluateAllRoots () |
override virtual |
Evaluate all root TPGVertex of the TPGGraph.
Replaces the function from the base class LearningAgent.
This method must always return the same results as evaluateAllRoots in a sequential execution. The Archive should also be updated in exactly the same manner.
This method calls the evaluateJob method for every root TPGVertex of the TPGGraph. The method returns a sorted map associating each root vertex to its average score, in ascending order of score.
[in] | generationNumber | the integer number of the current generation. |
[in] | mode | the LearningMode to use during the policy evaluation. |
Copyright or © or Copr. IETR/INSA - Rennes (2019 - 2020) :
Karol Desnos <kdesnos@insa-rennes.fr> (2019 - 2020)
Nicolas Sourbier <nsourbie@insa-rennes.fr> (2020)
Pierre-Yves Le Rolland-Raumer <plerolla@insa-rennes.fr> (2020)
GEGELATI is an open-source reinforcement learning framework for training artificial intelligence based on Tangled Program Graphs (TPGs).
This software is governed by the CeCILL-C license under French law and abiding by the rules of distribution of free software. You can use, modify and/ or redistribute the software under the terms of the CeCILL-C license as circulated by CEA, CNRS and INRIA at the following URL "http://www.cecill.info".
As a counterpart to the access to the source code and rights to copy, modify and redistribute granted by the license, users are provided only with a limited warranty and the software's author, the holder of the economic rights, and the successive licensors have only limited liability.
In this respect, the user's attention is drawn to the risks associated with loading, using, modifying and/or developing or reproducing the software by the user in light of its specific status of free software, that may mean that it is complicated to manipulate, and that also therefore means that it is reserved for developers and experienced professionals having in-depth computer knowledge. Users are therefore encouraged to load and test the software's suitability as regards their requirements in conditions enabling the security of their systems and/or data to be ensured and, more generally, to use and operate it in the same conditions as regards security.
The fact that you are presently reading this means that you have had knowledge of the CeCILL-C license and that you accept its terms.
Reimplemented from Learn::LearningAgent.
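Since the returned multimap is sorted in ascending order of score, the worst root sits at the front and the best root at the back. The following sketch shows how such a container can be read. `ResultSketch`, `VertexSketch`, and the explicit `ByScore` comparator are assumptions made for this illustration; the documented return type has no explicit comparator, and how the library itself orders EvaluationResult pointers is not shown on this page.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins for EvaluationResult and TPG::TPGVertex.
struct ResultSketch {
    double score;
};
struct VertexSketch {
    int id;
};

// Assumed comparator: order shared pointers by the score they point to.
struct ByScore {
    bool operator()(const std::shared_ptr<ResultSketch>& a,
                    const std::shared_ptr<ResultSketch>& b) const {
        return a->score < b->score;
    }
};

using ResultsSketch =
    std::multimap<std::shared_ptr<ResultSketch>, const VertexSketch*, ByScore>;

// Ascending order: the last element holds the highest score.
const VertexSketch* bestRootOf(const ResultsSketch& results) {
    assert(!results.empty());
    return results.rbegin()->second;
}

// Ascending order: the first element holds the lowest score.
const VertexSketch* worstRootOf(const ResultsSketch& results) {
    assert(!results.empty());
    return results.begin()->second;
}
```

A multimap rather than a map is needed here because several roots may obtain the same score.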
ParallelLearningAgent::evaluateAllRootsInParallel () |
protected virtual |
Method for evaluating all roots with parallelism.
The work is delegated to two distinct methods, evaluateAllRootsInParallelExecute and evaluateAllRootsInParallelCompileResults, so that derived classes can override each step separately.
[in] | generationNumber | the integer number of the current generation. |
[in] | mode | the LearningMode to use during the policy evaluation. |
[in] | results | Map to store the resulting score of evaluated roots. |
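The split into an execution step and a result-compilation step follows the template method pattern: a fixed driver calls two virtual steps, and a derived class replaces only the step it needs to change. Below is a minimal sketch of that structure under simplified types. All class and method names are hypothetical, and the averaging behavior of the derived class is an invented example, not the behavior of any GEGELATI class.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

class EvaluatorSketch {
  public:
    virtual ~EvaluatorSketch() = default;

    // Fixed driver: run the jobs, then compile their results.
    std::vector<double> evaluateInParallel() {
        std::map<uint64_t, double> resultsPerJob;
        execute(resultsPerJob);
        std::vector<double> results;
        compileResults(resultsPerJob, results);
        return results;
    }

  protected:
    // Step 1: produce one score per job (hard-coded here for brevity).
    virtual void execute(std::map<uint64_t, double>& resultsPerJob) {
        resultsPerJob[0] = 1.0;
        resultsPerJob[1] = 3.0;
    }

    // Step 2: default compilation, one result per job.
    virtual void compileResults(
        const std::map<uint64_t, double>& resultsPerJob,
        std::vector<double>& results) {
        for (const auto& entry : resultsPerJob) {
            results.push_back(entry.second);
        }
    }
};

// Derived class overriding only the compilation step.
class AveragingEvaluatorSketch : public EvaluatorSketch {
  protected:
    void compileResults(const std::map<uint64_t, double>& resultsPerJob,
                        std::vector<double>& results) override {
        assert(!resultsPerJob.empty());
        double sum = 0.0;
        for (const auto& entry : resultsPerJob) {
            sum += entry.second;
        }
        results.push_back(sum / static_cast<double>(resultsPerJob.size()));
    }
};
```

The derived class reuses the execution step unchanged, which is the benefit the two-method structure is meant to provide.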
ParallelLearningAgent::evaluateAllRootsInParallelCompileResults () |
protected virtual |
Subfunction of evaluateAllRootsInParallel which handles the gathering of results and the merge of the archives.
This method simply emplaces the results from resultsPerJobMap into results; since each job contains only one root, this is straightforward. The archives are then merged with the mergeArchiveMap method.
[in] | resultsPerJobMap | map linking the job number with its results and itself. |
[out] | results | map linking single results to their root vertex. |
[in,out] | archiveMap | map linking the job number with its gathered archive. These archives will later be merged with the ones of the other jobs. |
Reimplemented in Learn::AdversarialLearningAgent.
ParallelLearningAgent::evaluateAllRootsInParallelExecute () |
protected virtual |
Subfunction of evaluateAllRootsInParallel which handles the creation of threads, their execution and junction.
[in] | generationNumber | the integer number of the current generation. |
[in] | mode | the LearningMode to use during the policy evaluation. |
[out] | resultsPerJobMap | map linking the job number with its results and itself. |
[out] | archiveMap | map linking the job number with its gathered archive. These archives will later be merged with the ones of the other jobs. |
ParallelLearningAgent::mergeArchiveMap () |
protected |
Method to merge several Archive created in parallel threads.
The purpose of this method is to merge several Archive into the archive attribute of this ParallelLearningAgent. This method is the key to obtaining a deterministic Archive even in a parallel context.
[in,out] | archiveMap | Map storing the Archive to be merged. |
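A deterministic merge can be obtained by keying each per-job archive by its job number and concatenating them in key order, because `std::map` always iterates in ascending key order regardless of which thread inserted an entry first. The sketch below shows only this principle. `ArchiveSketch` is a hypothetical stand-in (a plain list of recordings), and the real Archive merge may apply further rules not described on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for an Archive: an ordered list of recordings.
using ArchiveSketch = std::vector<int>;

// Concatenate the per-job archives in ascending job-number order.
ArchiveSketch mergeArchiveMapSketch(
    const std::map<uint64_t, ArchiveSketch>& archiveMap) {
    ArchiveSketch merged;
    // std::map iteration is sorted by key, so the merged content does not
    // depend on the order in which threads filled the map.
    for (const auto& entry : archiveMap) {
        merged.insert(merged.end(), entry.second.begin(), entry.second.end());
    }
    return merged;
}
```

Two maps filled with the same per-job archives in different insertion orders therefore produce the same merged archive.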
ParallelLearningAgent::slaveEvalJobThread () |
protected |
Function implementing the behavior of slave threads during parallel evaluation of roots.
[in] | generationNumber | the integer number of the current generation. |
[in] | mode | the LearningMode to use during the policy evaluation. |
[in,out] | jobsToProcess | Ordered list of jobs of TPGVertex to process, stored as a pair with an id filling the archiveMap. The jobs are groups of roots that shall be agents in the same simulation; there is only one root per job when learning is not adversarial (e.g. if the environment is not multiplayer). |
[in] | rootsToProcessMutex | Mutex protecting the jobsToProcess queue. |
[in] | resultsPerRootMap | Map to store the resulting score of evaluated roots. |
[in] | resultsPerRootMapMutex | Mutex protecting the results. |
[in,out] | archiveMap | Map storing the exhaustiveArchive to be merged. |
[in] | archiveMapMutex | Mutex protecting the archiveMap. |
[in] | useMainEnvironment | Boolean that is true if we use the declared LearningEnvironment, otherwise the method will clone it. |
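The parameter list above suggests a classic worker loop: each thread pops a job from a shared queue under one mutex, evaluates it without holding any lock, then stores the result under a second mutex. The sketch below shows that pattern with simplified types. `JobSketch`, `workerSketch`, and `runWorkersSketch` are hypothetical names, the squared payload is a placeholder for a real evaluation, and having the calling thread act as one of the workers is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical job: (job number, payload to evaluate).
using JobSketch = std::pair<uint64_t, int>;

// Worker loop: pop a job, evaluate it, store its result, until no job is left.
void workerSketch(std::queue<JobSketch>& jobsToProcess, std::mutex& jobsMutex,
                  std::map<uint64_t, int>& resultsPerJob,
                  std::mutex& resultsMutex) {
    while (true) {
        JobSketch job;
        {
            // The queue is only touched while holding its mutex.
            std::lock_guard<std::mutex> lock(jobsMutex);
            if (jobsToProcess.empty()) {
                return;
            }
            job = jobsToProcess.front();
            jobsToProcess.pop();
        }
        // Placeholder evaluation, performed outside of any lock.
        int score = job.second * job.second;
        // Results are keyed by job number, not by completion order.
        std::lock_guard<std::mutex> lock(resultsMutex);
        resultsPerJob.emplace(job.first, score);
    }
}

// Spawn nbThreads - 1 workers, work on the calling thread too, then join.
std::map<uint64_t, int> runWorkersSketch(std::queue<JobSketch> jobs,
                                         unsigned nbThreads) {
    assert(nbThreads >= 1);
    std::mutex jobsMutex;
    std::mutex resultsMutex;
    std::map<uint64_t, int> results;
    std::vector<std::thread> threads;
    for (unsigned t = 1; t < nbThreads; ++t) {
        threads.emplace_back(workerSketch, std::ref(jobs), std::ref(jobsMutex),
                             std::ref(results), std::ref(resultsMutex));
    }
    workerSketch(jobs, jobsMutex, results, resultsMutex);
    for (auto& thread : threads) {
        thread.join();
    }
    return results;
}
```

Because each result is stored under its job number, running the same queue with one thread or with several produces the same map.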