diff --git a/doc/source/rllib-algorithms.rst b/doc/source/rllib-algorithms.rst
index dfa1bde00..f849cdf09 100644
--- a/doc/source/rllib-algorithms.rst
+++ b/doc/source/rllib-algorithms.rst
@@ -27,6 +27,7 @@ Algorithm           Frameworks Discrete Actions        Continuous Actions Multi-
 `PG`_               tf + torch **Yes** `+parametric`_  **Yes**            **Yes**     `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
 `PPO`_, `APPO`_     tf + torch **Yes** `+parametric`_  **Yes**            **Yes**     `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
 `SAC`_              tf + torch **Yes**                 **Yes**            **Yes**
+`SlateQ`_           torch      **Yes**                 No                 No
 `LinUCB`_, `LinTS`_ torch      **Yes** `+parametric`_  No                 **Yes**
 `AlphaZero`_        torch      **Yes** `+parametric`_  No                 No
 =================== ========== ======================= ================== =========== =============================================================
@@ -523,6 +524,24 @@ Cheetah-Run 640 ~800
    :start-after: __sphinx_doc_begin__
    :end-before: __sphinx_doc_end__
 
+.. _slateq:
+
+SlateQ
+-------
+|pytorch|
+`[paper] `__ `[implementation] `__
+
+SlateQ is a model-free RL method that builds on top of DQN and generates recommendation slates for recommender system environments. Because these environments have large combinatorial action spaces, SlateQ decomposes the Q-value of a slate into single-item Q-values and solves the decomposed objective by combining integer programming with deep learning optimization. SlateQ can be evaluated on Google's RecSim `environment `__. `An RLlib wrapper for RecSim can be found here < `__.
+
+RecSim environment wrapper: `Google RecSim `__
+
+**SlateQ-specific configs** (see also `common configs `__):
+
+.. literalinclude:: ../../rllib/agents/slateq/slateq.py
+   :language: python
+   :start-after: __sphinx_doc_begin__
+   :end-before: __sphinx_doc_end__
+
 Derivative-free
 ~~~~~~~~~~~~~~~
 
diff --git a/doc/source/rllib-toc.rst b/doc/source/rllib-toc.rst
index f4331eb6b..04ef292f1 100644
--- a/doc/source/rllib-toc.rst
+++ b/doc/source/rllib-toc.rst
@@ -109,6 +109,8 @@ Algorithms
 
    - |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) `
 
+   - |pytorch| :ref:`Slate Q-Learning (SlateQ) `
+
 * Derivative-free
 
    - |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) `
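
Reviewer note on the SlateQ paragraph: the Q-value decomposition it describes can be illustrated with a small sketch. This is not the RLlib implementation. It assumes a simple conditional choice model in which the user picks item ``i`` from slate ``A`` with probability ``v_i / (v_0 + sum of v_j for j in A)``, ignores position within the slate, and replaces the integer-programming step with brute-force enumeration, which is only feasible for tiny candidate sets. The names ``slate_q_value``, ``best_slate``, ``scores``, ``item_q``, and ``no_click_score`` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def slate_q_value(slate, scores, item_q, no_click_score=1.0):
    """Decomposed slate value: sum over items of P(click i | slate) * Q(s, i).

    Assumption: click probabilities follow a conditional choice model with
    an explicit "no click" alternative weighted by no_click_score.
    """
    total = no_click_score + sum(scores[i] for i in slate)
    return sum(scores[i] / total * item_q[i] for i in slate)


def best_slate(scores, item_q, slate_size, no_click_score=1.0):
    """Brute-force stand-in for the slate optimization step.

    Enumerates every unordered slate of the given size and returns the one
    with the highest decomposed Q-value.
    """
    candidates = range(len(scores))
    return max(
        combinations(candidates, slate_size),
        key=lambda s: slate_q_value(s, scores, item_q, no_click_score),
    )


# Toy data (made up): user-affinity scores and per-item Q-values.
scores = [1.0, 2.0, 0.5, 1.5]
item_q = [0.2, 0.1, 0.9, 0.4]

slate = best_slate(scores, item_q, slate_size=2)
value = slate_q_value(slate, scores, item_q)
print(slate, round(value, 4))
```

Because the slate value depends only on per-item quantities, the search needs one Q-value per candidate item rather than one per slate; the enumeration here only makes that structure visible on toy data.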