496 Commits

Author SHA1 Message Date
Eric Liang 5d7afe8092 [rllib] Try moving RLlib to top level dir (#5324) 2019-08-05 23:25:49 -07:00
Eric Liang 955154a19d Reduce Ray / RLlib startup messages (#5368) 2019-08-05 13:23:54 -07:00
Kristian Hartikainen 13fb9fe3db [rllib] Feature/soft actor critic v2 (#5328)
* Add base for Soft Actor-Critic

* Pick changes from old SAC branch

* Update sac.py

* First implementation of sac model

* Remove unnecessary SAC imports

* Prune unnecessary noise and exploration code

* Implement SAC model and use that in SAC policy

* runs but doesn't learn

* clear state

* fix batch size

* Add missing alpha grads and vars

* -200 by 2k timesteps

* doc

* lazy squash

* one file

* ignore tfp

* revert done
2019-08-01 23:37:36 -07:00
Eric Liang 20450a4e82 [rllib] Add rock paper scissors multi-agent example (#5336) 2019-08-01 13:03:59 -07:00
jichan3751 bd6dfc994f [sgd] Replaced class Resources in sgd with use_gpu (#5252) 2019-08-01 01:03:10 -07:00
Jaroslaw Rzepecki b3c8091a35 Fix Tuple spaces in rollout.py (#5332)
Make sure that the initial action is also properly flattened.
2019-07-31 11:38:49 -07:00
Michael Luo 1337c98f02 [rllib] Importance Sampling and KL Loss for APPO (#5051) 2019-07-29 15:02:32 -07:00
Eric Liang 3bdd114282 [rllib] Better example rnn envs (#5300) 2019-07-28 14:07:18 -07:00
Eric Liang a62c5f40f6 [rllib] Document ModelV2 and clean up the models/ directory (#5277) 2019-07-27 02:08:16 -07:00
Richard Liaw 5e15b36d6e [tune] experiment_analysis split to Analysis (#5115) 2019-07-27 01:10:52 -07:00
Antoine Galataud 827618254a [rllib] Configure learner queue timeout (#5270)
* configure learner queue timeout

* lint

* use config

* fix method args order, add unit test

* fix wrong param name
2019-07-25 21:18:05 -07:00
Eric Liang bf9199ad77 [rllib] ModelV2 support for pytorch (#5249) 2019-07-25 11:02:53 -07:00
Eric Liang 60f59639c1 [rllib] Port DDPG to the build_tf_policy pattern (#5242) 2019-07-24 13:55:55 -07:00
Eric Liang 690b374581 [rllib] Add Keras LSTM example with ModelV2 (#5258) 2019-07-24 13:09:41 -07:00
Eric Liang 97c43284a6 [rllib] Fix trainer state restore (#5257) 2019-07-23 21:18:58 -07:00
Eric Liang f9043cc49a [rllib] Remove experimental eager support 2019-07-21 12:27:17 -07:00
Eric Liang d58b986858 [rllib] MultiCategorical shouldn't return array for kl or entropy (#5215)
* wip

* fix
2019-07-19 12:12:04 -07:00
Jones Wong da7676c925 Removed the implicit sync barrier at the end of each training iteration (#5217)
*  removed sync barrier at the end of each training iteration

*  formatted

*  modify the comment according to current semantics

*  lint check

* Update trainer.py
2019-07-18 22:59:52 -07:00
Eric Liang 28e5c5555d [rllib] Move some inline defs to avoid deserialization errors (#5228)
* fix bug

* move metrics too
2019-07-18 21:01:16 -07:00
Jones Wong 0af07bd493 Enable seeding actors for reproducible experiments (#5197)
*  enable graph-level worker-specific seed

*  lint checked

*  revised according to eric's suggestions

*  revised accordingly and added a test case

* formatted

* Update test_reproducibility.py

* Update trainer.py

* Update rollout_worker.py

* Update run_rllib_tests.sh

* Update worker_set.py
2019-07-17 23:31:34 -07:00
Jones Wong 81d297f87e Remove redundant scaler of l2 reg (#5172)
*  remove redundant scaler of l2 reg

*  lint formatted

* Update ddpg_policy.py
2019-07-17 15:11:27 -07:00
Jones Wong ae03c42dd6 Fixed inconsistent action placeholder (#5213) 2019-07-17 10:55:14 -07:00
Sam Toyer 214f09d969 [rllib] Make RLLib handle zero-length observation arrays (#5208)
* [rllib] Make _summarize handle zero-len arrays

Fixes #5207

* [rllib] Make aligned_array() handle empty arrays

* [rllib] Conform with old yapf
2019-07-16 22:37:57 -07:00
Eric Liang 4fa2a6006c [rllib] Remove nested import (#5204)
* remove nested import

* Update metrics.py
2019-07-16 10:52:56 -07:00
Eric Liang 047f4ccd61 [rllib] Fix rollout.py with tuple action space (#5201)
* fix it

* update doc too

* fix rollout
2019-07-16 10:52:35 -07:00
Jones Wong 5b13a7eb90 Keep parameter space noise consistent with action space noise (Fix 5173) (#5193)
*  make parameter space noise consistent with action space noise

*  modified according to lint check

*  indent
2019-07-14 12:20:35 -07:00
Richard Liaw 691c9733f9 [tune] Document trainable attributes and enable user-checkpoint… (#4868) 2019-07-10 18:51:11 -07:00
Eric Liang 5ab5017c67 [rllib] Fix impala stress test (#5101)
* add copy

* upgrade to tf 1.14

* update

* reduce count to workaround https://github.com/ray-project/ray/issues/5125

* Update impala.py

* placeholder

* comments

* update
2019-07-09 20:22:30 -07:00
Stefan Pantic dfc94ce7bc [rllib] Add entropy coeff decay (#5043) 2019-07-08 18:30:32 -07:00
Eric Liang 893744b3be [rllib] Revert "use make template" which seems to break DQN/Atari (#5134)
* Revert "use make template"

This reverts commit 291e9e0031c6e315fe24e5b4973dea375fe73918.

* debug vars
2019-07-07 19:51:26 -07:00
Eric Liang 932d6b2517 [rllib] Port IMPALA to ModelV2/build_tf_policy (#5130)
* port vtrace

* fix vf

* fix vs

* fix the example

* wip ddpg

* fix tests

* fix tests

* remove ddpg model

* comments

* set vf share layers True by default

* typo

* fix test
2019-07-07 15:06:41 -07:00
Eric Liang 445bcb29b0 [hotfix] fix backward compat with older yaml libraries 2019-07-06 20:41:28 -07:00
Eric Liang c15ed3ac55 [rllib] Shuffle RNN sequences in PPO as well (#5129)
* shuffle seq

* fix test
2019-07-06 20:40:49 -07:00
Brandon Bertelsen c04b69902c Updates for #5072 (#5091) 2019-07-06 16:05:50 -07:00
Aleksei Petrenko 09bde397c9 Multiagent experiment resume (#5102)
* Fixed problem with multiagent experiment resume

* Applied format script

* fix lint
2019-07-06 11:38:17 -07:00
Dušan Josipović e9b88dcbed [wingman -> tune] Add system performance tracking (#4924) 2019-07-06 00:57:35 -07:00
Eric Liang 34d054ff19 [rllib] ModelV2 API (#4926) 2019-07-03 15:59:47 -07:00
Kristian Hartikainen 9e0192bc0b [tune] Change the log syncing behavior (#4450)
* Change the log syncing behavior

* fix up abstractions for syncer

* Finished checkpoint syncing

* Code

* Set of changes to get things running

* Fixes for log syncing

* Fix parts

* Lint and other fixes

* fix some test

* Remove extra parsing functionality

* some test fixes

* Fix up cloud syncing

* Another thing to do

* Fix up tests and local sync

Changes LogSync into a mixin, and adds tests for different
functionalities.

* Fix up tests, start on local migration

* fix distributed migrations

* comments

* formatting

* Better checkpoint directory handling

* fix tests

* fix tests

* fix click

* comments

* formatting comments

* formatting and comments

* sync function deprecations

* syncfunction

* Add documentation for Syncing and Uploading

* nit

* BaseSyncer as base for Mixin in edge case

* more docs

* clean up assertions

* validate

* nit

* Update test_cluster.py

* betterdoc

* Update tune-usage.rst

* cleanup

* nit
2019-07-02 20:46:00 -07:00
Philipp Moritz bbe3e5b4ed [rllib] Give error if sample_async is used with pytorch for A3C (#5000)
* give error if sample_async is used with pytorch

* update

* Update a3c.py
2019-06-25 22:06:35 -07:00
Eric Liang aa5fc52e32 [rllib] Add QMIX mixer parameters to optimizer param list (#5014)
* add mixer params

* Update qmix_policy.py
2019-06-25 19:02:40 -07:00
Eric Liang 1d17125333 temp fix for build (#5006) 2019-06-20 18:07:44 -07:00
Eric Liang fa1d4c9807 [rllib] Fix DDPG example (#4973) 2019-06-13 15:07:46 -07:00
Eric Liang 4f8e100fe0 fix (#4950) 2019-06-10 10:20:55 +08:00
Eric Liang 77689d1116 [rllib] Port remainder of algorithms to build_trainer() pattern (#4920) 2019-06-07 16:45:36 -07:00
Eric Liang 9e328fbe6f [rllib] Add docs on how to use TF eager execution (#4927) 2019-06-07 16:42:37 -07:00
Eric Liang 7501ee51db [rllib] Rename PolicyEvaluator => RolloutWorker (#4820) 2019-06-03 06:49:24 +08:00
Eric Liang 99eae05cf6 [tune] Disallow setting resources_per_trial when it is already configured (#4880)
* disallow it

* import fix

* fix example

* fix test

* fix tests

* Update mock.py

* fix

* make less convoluted

* fix tests
2019-06-03 06:47:39 +08:00
Eric Liang 665d081fe9 [rllib] Rough port of DQN to build_tf_policy() pattern (#4823) 2019-06-02 14:14:31 +08:00
Eric Liang 9aa1cd613d [rllib] Allow Torch policies access to full action input dict in extra_action_out_fn (#4894)
* fix torch extra out

* preserve setitem

* fix docs
2019-06-01 16:58:49 +08:00
Eric Liang 1c073e92e4 [rllib] Fix documentation on custom policies (#4910)
* wip

* add docs

* lint

* todo sections

* fix doc
2019-06-01 16:13:21 +08:00