Commit Graph

42 Commits

Author SHA1 Message Date
Sven Mika fe0bdb23ff [RLlib] Attention Net/Transformers docs improvement. 2020-08-17 13:07:17 -07:00
Sven Mika 66d204e078 [RLlib] Model documentation enhancements. (#10011) 2020-08-13 13:36:40 +02:00
Eric Liang be26a7b1b0 [rllib] Support for complex / variable-length observation spaces (#8393) 2020-06-06 12:22:19 +02:00
Sven Mika c74dc58f8b [RLlib] Fix use_lstm flag for ModelV2 (w/o ModelV1 wrapping) and add it for PyTorch. (#8734) 2020-06-05 15:40:30 +02:00
Sven Mika 2746fc0476 [RLlib] Auto-framework, retire use_pytorch in favor of framework=... (#8520) 2020-05-27 16:19:13 +02:00
Sven Mika 0422e9c5a8 [RLlib] Add 2 Transformer learning test cases on StatelessCartPole (PPO and IMPALA). (#8624) 2020-05-27 10:19:47 +02:00
Eric Liang 9a83908c46 [rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Sven Mika 796a834c48 [RLlib] Attention Net integration into ModelV2 and learning RL example. (#8371) 2020-05-18 17:26:40 +02:00
Sven Mika 42991d723f [RLlib] rllib/examples folder restructuring (#8250)
Cleans up of the rllib/examples folder by moving all example Envs into rllibexamples/env (so they can be used by other scripts and tests as well).
2020-05-01 22:59:34 +02:00
Eric Liang fbc545c03b [rllib] Support parallel, parameterized evaluation (#6981)
* eval api

* update

* sync eval filters

* sync fix

* docs

* update

* docs

* update

* link

* nit

* doc updates

* format
2020-02-01 22:12:12 -08:00
Eric Liang a229bdf272 [rllib] Deprecate custom preprocessors (#6833)
* deprecation warnings

* add log warn

* fix test
2020-01-18 23:30:09 -08:00
Shital Shah 670cb6374e Doc enhancement: use build.sh for ray, clarification on how rllib selects VisionNetwork, note on setup-dev.py for rllib. (#6092) 2019-12-02 22:19:01 -08:00
Olli Huotari 0916603e61 Fixed few broken links in docs (#5477)
* hyperband link changed

* tuned_examples link fix

* doc lstm link fix

* kubernetes example link fix
2019-08-19 14:22:25 -07:00
Eric Liang a1d2e17623 [rllib] Autoregressive action distributions (#5304) 2019-08-10 14:05:12 -07:00
Eric Liang 5d7afe8092 [rllib] Try moving RLlib to top level dir (#5324) 2019-08-05 23:25:49 -07:00
Eric Liang a62c5f40f6 [rllib] Document ModelV2 and clean up the models/ directory (#5277) 2019-07-27 02:08:16 -07:00
Eric Liang bf9199ad77 [rllib] ModelV2 support for pytorch (#5249) 2019-07-25 11:02:53 -07:00
Eric Liang 34d054ff19 [rllib] ModelV2 API (#4926) 2019-07-03 15:59:47 -07:00
Eric Liang 2dd0beb5bd [rllib] Allow access to batches prior to postprocessing (#4871) 2019-05-29 18:17:14 -07:00
Eric Liang 02583a8598 [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (#4819)
This implements some of the renames proposed in #4813
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.
2019-05-20 16:46:05 -07:00
Eric Liang 3807fb505b [rllib] TensorFlow 2 compatibility (#4802) 2019-05-16 22:12:07 -07:00
Eric Liang 37208216ae [rllib] Rename Agent to Trainer (#4556) 2019-04-07 00:36:18 -07:00
Eric Liang ba03048254 [rllib] TF model custom_loss() should actually allow access to full rollout data (#4220) 2019-03-02 22:57:51 -08:00
Robert Nishihara 4b89eebfc7 Move test folders under rllib/tune from test -> tests. (#4214) 2019-03-02 13:37:16 -08:00
Eric Liang d9da183c7d [rllib] Custom supervised loss API (#4083) 2019-02-24 15:36:13 -08:00
Megan Kawakami 346885068c [rllib] add torch pg (#3857)
* add torch pg

* add torch imports

* added torch pg

* working torch pg implementation

* add pg pytorch

* Update a3c.py

* Update a3c.py

* Update torch_policy_graph.py

* Update torch_policy_graph.py
2019-02-16 19:54:14 -08:00
Eric Liang ca864faece [rllib] Documentation for I/O API and multi-agent support / cleanup (#3650) 2019-01-03 15:15:36 +08:00
Eric Liang 47d36d7bd6 [rllib] Refactor pytorch custom model support (#3634) 2019-01-03 13:48:33 +08:00
Eric Liang d8205976e8 [rllib] Auto clip actions to Box space range; deprecate squash_to_range (#3426)
* fix clip

* tweak wording

* remove squash entirely

* Update rllib-models.rst

* fix argument order

* Apply suggestions from code review

Co-Authored-By: ericl <ekhliang@gmail.com>
2018-12-03 19:55:25 -08:00
Eric Liang 07d8cbf414 [rllib] Support batch norm layers (#3369)
* batch norm

* lint

* fix dqn/ddpg update ops

* bn model

* Update tf_policy_graph.py

* Update multi_gpu_impl.py

* Apply suggestions from code review

Co-Authored-By: ericl <ekhliang@gmail.com>
2018-11-29 13:33:39 -08:00
Eric Liang f0df97db6f [rllib] example and docs on how to use parametric actions with DQN / PG algorithms (#3384) 2018-11-27 23:35:19 -08:00
Eric Liang d90f365394 [rllib] Add self-supervised loss to model (#3291)
# What do these changes do?

Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.
2018-11-12 18:55:24 -08:00
Eric Liang a221f55b0d [rllib] Add custom value functions, fix up and document multi-agent variable sharing (#3151) 2018-10-29 19:37:27 -07:00
Eric Liang 59901a88a0 [rllib] Native support for Dict and Tuple spaces; fix Tuple action spaces; add prev a, r to LSTM (#3051) 2018-10-20 15:21:22 -07:00
Eric Liang a9e454f6fd [rllib] Include config dicts in the sphinx docs (#3064) 2018-10-16 15:55:11 -07:00
Eric Liang b45bed4bce [rllib] Propagate model options correctly in ARS / ES, to action dist of PPO (#2974)
* fix

* fix

* fix it

* propagate conf to action dist

* move carla example too

* rr

* Update policies.py

* wip

* lint
2018-10-01 12:49:39 -07:00
Eric Liang 814c35b7d7 [rllib] Simplify sample batch size and num envs config, n_step adjustment (#2995)
* simplify vec batch requirements

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-models.rst
2018-09-30 18:36:22 -07:00
Eric Liang fbe6c59f72 [rllib] Misc fixes, A2C (#2679)
A bunch of minor rllib fixes:

pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
move reward clipping to policy evaluator
add a2c variant of a3c
reduce vision network fc layer size to 256 units
switch to 84x84 images
doc tweaks
print timesteps in tune status
2018-08-20 15:28:03 -07:00
Eric Liang 5f430da180 [rllib] Provide internal access to episode state in compute_actions() and allow returning extra batches (#2559)
The goal of this PR is to allow custom policies to perform model-based rollouts. In the multi-agent setting, this requires access to not only policies of other agents, but also their current observations.
Also, you might want to return the model-based trajectories as part of the rollout for efficiency.

  compute_actions() now takes a new keyword arg episodes
  pull out internal episode class into a top-level file
  add function to return extra trajectories from an episode that will be appended to the sample batch
  documentation
2018-08-16 14:37:21 -07:00
Sergey Kolesnikov 05490b8cb9 [rllib] dqn/ddpg policy customization (#2445)
* dqn policy update - more customization

* docs for custom DQN graph

* Update rllib-training.rst

* Update rllib-models.rst

* Update rllib.rst

* Update rllib-training.rst

* Update rllib-concepts.rst

* yapf codestyle
2018-07-22 14:47:14 -07:00
Eric Liang d24f19fd1e [rllib] Fix stats collection and some docs bugs since the refactoring (#2361)
* fix

* fix pbt example

* fix

* fix

* single thread by default

* vec

* fix

* fix
2018-07-07 13:29:20 -07:00
Eric Liang 8aa56c12e6 [rllib] Document "v2" APIs (#2316)
* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* wip

* wip

* cast

* wip

* works

* fix a3c

* works

* lstm util test

* doc

* clean up

* update

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* envs

* vec

* doc prep

* models

* rl

* alg

* up

* clarify

* copy

* async sa

* fix

* comments

* fix a3c conf

* tune lstm

* fix reshape

* fix

* back to 16

* tuned a3c update

* update

* tuned

* optional

* merge

* wip

* fix up

* move pg class

* rename env

* wip

* update

* tip

* alg

* readme

* fix catalog

* readme

* doc

* context

* remove prep

* comma

* add env

* link to paper

* paper

* update

* rnn

* update

* wip

* clean up ev creation

* fix

* fix

* fix

* fix lint

* up

* no comma

* ma

* Update run_multi_node_tests.sh

* fix

* sphinx is stupid

* sphinx is stupid

* clarify torch graph

* no horizon

* fix config

* sb

* Update test_optimizers.py
2018-07-01 00:05:08 -07:00