From 67229bf3500ed64d7dfc28bdf83000e077ddf477 Mon Sep 17 00:00:00 2001
From: Michael Luo
Date: Sat, 9 Jan 2021 02:21:51 -0800
Subject: [PATCH] [RLlib] SlateQ Documentation (#13266)

---
 doc/source/rllib-algorithms.rst | 19 +++++++++++++++++++
 doc/source/rllib-toc.rst        |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/doc/source/rllib-algorithms.rst b/doc/source/rllib-algorithms.rst
index dfa1bde00..f849cdf09 100644
--- a/doc/source/rllib-algorithms.rst
+++ b/doc/source/rllib-algorithms.rst
@@ -27,6 +27,7 @@ Algorithm           Frameworks Discrete Actions        Continuous Actions Multi-
 `PG`_               tf + torch **Yes** `+parametric`_  **Yes**            **Yes**     `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
 `PPO`_, `APPO`_     tf + torch **Yes** `+parametric`_  **Yes**            **Yes**     `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
 `SAC`_              tf + torch **Yes**                 **Yes**            **Yes**
+`SlateQ`_           torch      **Yes**                 No                 No
 `LinUCB`_, `LinTS`_ torch      **Yes** `+parametric`_  No                 **Yes**
 `AlphaZero`_        torch      **Yes** `+parametric`_  No                 No
 =================== ========== ======================= ================== =========== =============================================================
@@ -523,6 +524,24 @@ Cheetah-Run 640 ~800
    :start-after: __sphinx_doc_begin__
    :end-before: __sphinx_doc_end__
 
+.. _slateq:
+
+SlateQ
+-------
+|pytorch|
+`[paper] `__ `[implementation] `__
+
+SlateQ is a model-free RL method that builds on DQN and generates recommendation slates for recommender system environments. Because these environments have large combinatorial action spaces, SlateQ decomposes the Q-value of a slate into single-item Q-values and solves the decomposed objective with a combination of mixed integer programming and deep learning optimization. SlateQ can be evaluated on Google's RecSim `environment `__. `An RLlib wrapper for RecSim can be found here < `__.
+
+RecSim environment wrapper: `Google RecSim `__
+
+**SlateQ-specific configs** (see also `common configs `__):
+
+.. literalinclude:: ../../rllib/agents/slateq/slateq.py
+   :language: python
+   :start-after: __sphinx_doc_begin__
+   :end-before: __sphinx_doc_end__
+
 Derivative-free
 ~~~~~~~~~~~~~~~
 
diff --git a/doc/source/rllib-toc.rst b/doc/source/rllib-toc.rst
index f4331eb6b..04ef292f1 100644
--- a/doc/source/rllib-toc.rst
+++ b/doc/source/rllib-toc.rst
@@ -109,6 +109,8 @@ Algorithms
 
    - |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) `
 
+   - |pytorch| :ref:`Slate Q-Learning (SlateQ) `
+
 * Derivative-free
 
    - |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) `
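
The Q-value decomposition described in the SlateQ paragraph above can be sketched as follows. This is a minimal illustration, not RLlib's implementation. It assumes a conditional-logit user choice model, in which the probability of clicking an item is its score divided by the total score of the slate plus a no-click alternative. The names `slate_q_value`, `best_slate`, and `no_click_score` are hypothetical, and the integer-programming step is swapped for brute-force enumeration, which is only feasible for very small candidate sets.

```python
from itertools import combinations


def slate_q_value(slate, scores, item_q, no_click_score=1.0):
    """Decomposed slate value: sum over items of P(item | slate) * item_q[item].

    ASSUMPTION: a conditional-logit choice model, where
    P(item | slate) = scores[item] / (no_click_score + sum of slate scores).
    """
    total = no_click_score + sum(scores[i] for i in slate)
    return sum(scores[i] / total * item_q[i] for i in slate)


def best_slate(scores, item_q, slate_size, no_click_score=1.0):
    """Return the slate (tuple of item indices) with the highest decomposed value.

    Brute-force enumeration stands in for the optimization step here;
    it is exponential in the number of candidates.
    """
    candidates = range(len(item_q))
    return max(
        combinations(candidates, slate_size),
        key=lambda s: slate_q_value(s, scores, item_q, no_click_score),
    )


if __name__ == "__main__":
    # Hypothetical per-item choice scores and single-item Q-values.
    scores = [1.0, 2.0, 0.5, 3.0]
    item_q = [4.0, 1.0, 6.0, 2.0]
    slate = best_slate(scores, item_q, slate_size=2)
    print(slate, slate_q_value(slate, scores, item_q))
```

The point of the decomposition is that only one Q-value per candidate item has to be learned, rather than one per possible slate. The slate-level value is then assembled from those item values and the choice probabilities.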