From 67229bf3500ed64d7dfc28bdf83000e077ddf477 Mon Sep 17 00:00:00 2001
From: Michael Luo <michael.luo123456789@gmail.com>
Date: Sat, 9 Jan 2021 02:21:51 -0800
Subject: [PATCH] [RLlib] SlateQ Documentation (#13266)

---
 doc/source/rllib-algorithms.rst | 19 +++++++++++++++++++
 doc/source/rllib-toc.rst        |  2 ++
 2 files changed, 21 insertions(+)
diff --git a/doc/source/rllib-algorithms.rst b/doc/source/rllib-algorithms.rst
index dfa1bde00..f849cdf09 100644
--- a/doc/source/rllib-algorithms.rst
+++ b/doc/source/rllib-algorithms.rst
@@ -27,6 +27,7 @@ Algorithm           Frameworks Discrete Actions        Continuous Actions Multi-
 `PG`_               tf + torch **Yes** `+parametric`_  **Yes**            **Yes**     `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
 `PPO`_, `APPO`_     tf + torch **Yes** `+parametric`_  **Yes**            **Yes**     `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
 `SAC`_              tf + torch **Yes**                 **Yes**            **Yes**
+`SlateQ`_           torch      **Yes**                 No                 No
 `LinUCB`_, `LinTS`_ torch      **Yes** `+parametric`_  No                 **Yes**
 `AlphaZero`_        torch      **Yes** `+parametric`_  No                 No
 =================== ========== ======================= ================== =========== =============================================================
@@ -523,6 +524,24 @@ Cheetah-Run    640             ~800
    :start-after: __sphinx_doc_begin__
    :end-before: __sphinx_doc_end__
 
+.. _slateq:
+
+SlateQ
+-------
+|pytorch|
+`[paper] <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/9f91de1fa0ac351ecb12e4062a37afb896aa1463.pdf>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/rllib/agents/slateq/slateq.py>`__
+
+SlateQ is a model-free RL method that builds on DQN and generates recommendation slates for recommender system environments. Because these environments have large combinatorial action spaces, SlateQ decomposes the Q-value of a slate into single-item Q-values and solves the decomposed objective with a combination of mixed integer programming and deep learning optimization. SlateQ can be evaluated on Google's `RecSim environment <https://github.com/google-research/recsim>`__.
+
+RLlib wrapper for the RecSim environment: `recsim_wrapper.py <https://github.com/ray-project/ray/blob/master/rllib/env/wrappers/recsim_wrapper.py>`__
+
+**SlateQ-specific configs** (see also `common configs <rllib-training.html#common-parameters>`__):
+
+.. literalinclude:: ../../rllib/agents/slateq/slateq.py
+   :language: python
+   :start-after: __sphinx_doc_begin__
+   :end-before: __sphinx_doc_end__
+
 Derivative-free
 ~~~~~~~~~~~~~~~
 
diff --git a/doc/source/rllib-toc.rst b/doc/source/rllib-toc.rst
index f4331eb6b..04ef292f1 100644
--- a/doc/source/rllib-toc.rst
+++ b/doc/source/rllib-toc.rst
@@ -109,6 +109,8 @@ Algorithms
 
    -  |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) <sac>`
 
+   -  |pytorch| :ref:`Slate Q-Learning (SlateQ) <slateq>`
+
 *  Derivative-free
 
    -  |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) <ars>`