diff --git a/doc/source/rllib-toc.rst b/doc/source/rllib-toc.rst
index bc6b8b630..9596b1831 100644
--- a/doc/source/rllib-toc.rst
+++ b/doc/source/rllib-toc.rst
@@ -96,22 +96,14 @@ Algorithms
 
    - |pytorch| :ref:`Decentralized Distributed Proximal Policy Optimization (DD-PPO) `
 
-   - |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) `
-
 * Gradient-based
 
    - |pytorch| |tensorflow| :ref:`Advantage Actor-Critic (A2C, A3C) `
 
    - |pytorch| |tensorflow| :ref:`Deep Deterministic Policy Gradients (DDPG, TD3) `
 
-   - |pytorch| :ref:`Dreamer `
-
    - |pytorch| |tensorflow| :ref:`Deep Q Networks (DQN, Rainbow, Parametric DQN) `
 
-   - |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) `
-
-   - |pytorch| :ref:`Model-Based Meta-Policy-Optimization (MBMPO) `
-
    - |pytorch| |tensorflow| :ref:`Policy Gradients `
 
    - |pytorch| |tensorflow| :ref:`Proximal Policy Optimization (PPO) `
@@ -124,7 +116,17 @@ Algorithms
 
    - |pytorch| |tensorflow| :ref:`Evolution Strategies `
 
-* Multi-agent specific
+* Model-based / Meta-learning
+
+   - |pytorch| :ref:`Single-Player AlphaZero (contrib/AlphaZero) `
+
+   - |pytorch| |tensorflow| :ref:`Model-Agnostic Meta-Learning (MAML) `
+
+   - |pytorch| :ref:`Model-Based Meta-Policy-Optimization (MBMPO) `
+
+   - |pytorch| :ref:`Dreamer (DREAMER) `
+
+* Multi-agent
 
    - |pytorch| :ref:`QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) `
    - |tensorflow| :ref:`Multi-Agent Deep Deterministic Policy Gradient (contrib/MADDPG) `