[RLlib] SlateQ Documentation (#13266)

Michael Luo
2021-01-09 02:21:51 -08:00
committed by GitHub
parent d434ba6518
commit 67229bf350
2 changed files with 21 additions and 0 deletions
@@ -27,6 +27,7 @@ Algorithm Frameworks Discrete Actions Continuous Actions Multi-
`PG`_ tf + torch **Yes** `+parametric`_ **Yes** **Yes** `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
`PPO`_, `APPO`_ tf + torch **Yes** `+parametric`_ **Yes** **Yes** `+RNN`_, `+LSTM auto-wrapping`_, `+Transformer`_, `+autoreg`_
`SAC`_ tf + torch **Yes** **Yes** **Yes**
`SlateQ`_ torch **Yes** No No
`LinUCB`_, `LinTS`_ torch **Yes** `+parametric`_ No **Yes**
`AlphaZero`_ torch **Yes** `+parametric`_ No No
=================== ========== ======================= ================== =========== =============================================================
@@ -523,6 +524,24 @@ Cheetah-Run 640 ~800
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
.. _slateq:
SlateQ
-------
|pytorch|
`[paper] <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/9f91de1fa0ac351ecb12e4062a37afb896aa1463.pdf>`__ `[implementation] <https://github.com/ray-project/ray/blob/master/rllib/agents/slateq/slateq.py>`__
SlateQ is a model-free RL method that builds on top of DQN and generates recommendation slates for recommender system environments. Because these environments have large combinatorial action spaces, SlateQ decomposes the Q-value of a slate into single-item Q-values and solves the decomposed objective by combining mixed-integer programming with deep learning optimization. SlateQ can be evaluated on Google's `RecSim environment <https://github.com/google-research/recsim>`__. An RLlib wrapper for RecSim can be found `here <https://github.com/ray-project/ray/blob/master/rllib/env/wrappers/recsim_wrapper.py>`__.
RecSim environment wrapper: `Google RecSim <https://github.com/ray-project/ray/blob/master/rllib/env/wrappers/recsim_wrapper.py>`__
**SlateQ-specific configs** (see also `common configs <rllib-training.html#common-parameters>`__):
.. literalinclude:: ../../rllib/agents/slateq/slateq.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
Derivative-free
~~~~~~~~~~~~~~~
@@ -109,6 +109,8 @@ Algorithms
- |pytorch| |tensorflow| :ref:`Soft Actor Critic (SAC) <sac>`
- |pytorch| :ref:`Slate Q-Learning (SlateQ) <slateq>`
* Derivative-free
- |pytorch| |tensorflow| :ref:`Augmented Random Search (ARS) <ars>`