Commit Graph

124 Commits

Author SHA1 Message Date
butchcom 936bebef99 [rllib] Upgrade to OpenAI Gym 0.10.3 (#1601) 2018-03-06 00:31:02 -08:00
Eric Liang ecb811c26e [rllib] Ape-X implementation and DQN refactor to handle replay in policy optimizer (#1604)
* minimal apex checkin

* cleanup dqn options

* actor utils

* Sun Feb 25 17:39:54 PST 2018

* update

* compression refactor

* fix

* add test

* fix models

* Sun Feb 25 21:46:27 PST 2018

* Wed Feb 28 10:26:34 PST 2018

* Wed Feb 28 10:28:09 PST 2018

* Wed Feb 28 10:42:59 PST 2018

* refactor

* Wed Feb 28 11:17:19 PST 2018

* Wed Feb 28 11:42:08 PST 2018

* Wed Feb 28 11:42:13 PST 2018

* Wed Feb 28 11:59:02 PST 2018

* Wed Feb 28 11:59:58 PST 2018

* Wed Feb 28 12:00:08 PST 2018

* Wed Feb 28 12:02:19 PST 2018

* Wed Feb 28 13:44:31 PST 2018

* Wed Feb 28 17:01:20 PST 2018

* Sat Mar  3 14:55:59 PST 2018

* make optimizer construction explicit

* Sat Mar  3 18:23:08 PST 2018

* Sat Mar  3 18:24:28 PST 2018

* Sat Mar  3 18:49:28 PST 2018

* Sat Mar  3 18:50:42 PST 2018

* Sat Mar  3 18:56:10 PST 2018
2018-03-04 12:25:25 -08:00
Eric Liang 75293a0ba0 [rllib] Basic regression tests on CartPole (#1608)
* Sun Feb 25 21:36:22 PST 2018

* Sun Feb 25 21:42:09 PST 2018

* Sun Feb 25 21:44:30 PST 2018

* fix lint

* Wed Feb 28 12:41:49 PST 2018
2018-03-03 16:27:56 -08:00
Richard Liaw b79597dc00 [rllib] PPO Thread Limit (#1568) 2018-02-26 22:22:05 -08:00
Richard Liaw c2ad800cbf [rllib] Registry fix for DQN Replay Evaluators (#1593) 2018-02-25 22:30:11 -08:00
Eric Liang 1b596f7d3b [rllib] Rollout script needs to pipe in config and update states (#1566)
* Mon Feb 19 15:20:09 PST 2018

* fix it actually
2018-02-20 12:04:41 -08:00
Richard Liaw 0f766ae24b [rllib] Fix testGetFilters in A3C (#1557) 2018-02-19 22:44:14 -08:00
Eric Liang 4a6cfee887 [rllib] add tuned example for pendulum (#1552) 2018-02-18 00:46:42 -08:00
Robert Nishihara 61d8a17de0 [rllib] Change NotImplemented -> NotImplementedError. (#1535) 2018-02-16 17:08:25 -08:00
Eric Liang ca0f08d100 [tune] Recover experiments from last checkpoint (#1532) 2018-02-12 14:01:19 -08:00
Eric Liang 7e998db656 [rllib] Reduce concat memory usage, allow object store memory to be specified in init (#1529)
* c

* stop agents

* comment

* Sat Feb 10 02:33:30 PST 2018

* Sat Feb 10 02:33:39 PST 2018

* Update sample_batch.py

* Sun Feb 11 14:38:55 PST 2018

* add ppo config warn
2018-02-11 19:14:51 -08:00
eugenevinitsky 639df85fda updated multiagent docs (#1523)
* updated multiagent docs

* Update rllib.rst

* Update multiagent_mountaincar_env.py

* Update multiagent_pendulum_env.py
2018-02-11 16:35:03 -08:00
alvkao58 81a4be8f65 [rllib] Added vanilla policy gradient (#1497) 2018-02-10 13:54:51 -08:00
Eric Liang 0a9dbc84b5 Tue Feb 6 20:57:42 PST 2018 (#1521)
The test failure was unrelated
2018-02-06 23:11:31 -08:00
the-sea d0dd33e13c not share registered objects between _Regitry objects (#1508) 2018-02-03 15:03:52 -08:00
Eric Liang b948405532 [tune] clean up population based training prototype (#1478)
* patch up pbt

* Sat Jan 27 01:00:03 PST 2018

* Sat Jan 27 01:04:14 PST 2018

* Sat Jan 27 01:04:21 PST 2018

* Sat Jan 27 01:15:15 PST 2018

* Sat Jan 27 01:15:42 PST 2018

* Sat Jan 27 01:16:14 PST 2018

* Sat Jan 27 01:38:42 PST 2018

* Sat Jan 27 01:39:21 PST 2018

* add pbt

* Sat Jan 27 01:41:19 PST 2018

* Sat Jan 27 01:44:21 PST 2018

* Sat Jan 27 01:45:46 PST 2018

* Sat Jan 27 16:54:42 PST 2018

* Sat Jan 27 16:57:53 PST 2018

* clean up test

* Sat Jan 27 18:01:15 PST 2018

* Sat Jan 27 18:02:54 PST 2018

* Sat Jan 27 18:11:18 PST 2018

* Sat Jan 27 18:11:55 PST 2018

* Sat Jan 27 18:14:09 PST 2018

* review

* try out a ppo example

* some tweaks to ppo example

* add postprocess hook

* Sun Jan 28 15:00:40 PST 2018

* clean up custom explore fn

* Sun Jan 28 15:10:21 PST 2018

* Sun Jan 28 15:14:53 PST 2018

* Sun Jan 28 15:17:04 PST 2018

* Sun Jan 28 15:33:13 PST 2018

* Sun Jan 28 15:56:40 PST 2018

* Sun Jan 28 15:57:36 PST 2018

* Sun Jan 28 16:00:35 PST 2018

* Sun Jan 28 16:02:58 PST 2018

* Sun Jan 28 16:29:50 PST 2018

* Sun Jan 28 16:30:36 PST 2018

* Sun Jan 28 16:31:44 PST 2018

* improve tune doc

* concepts

* update humanoid

* Fri Feb  2 18:03:33 PST 2018

* fix example

* show error file
2018-02-02 23:03:12 -08:00
the-sea a936468f99 [tune] using None as the parameter default value instead of mutable dict (#1501)
* do not use dict as default parameter

* Update trial.py
2018-02-02 21:47:51 -08:00
eugenevinitsky 369773d3e8 [rllib] minor bug fix to shared model, model wasnt actually shared due to new scope (#1503) 2018-02-02 20:37:00 -08:00
Philipp Moritz 7550b628bf fix indentation for ES (#1484) 2018-01-31 17:22:38 -08:00
Eric Liang 35b1d6189b [tune] save error msg, cleanup after object checkpoints 2018-01-29 18:48:45 -08:00
Kaahan 7aa979a024 [tune] Added Population Based Training (#1355)
Adds a Population-Based Training (as described in https://arxiv.org/abs/1711.09846) scheduler to Ray.tune. Currently mutates hyperparameters according to either a user-defined list of possible values to mutate to (necessary if hyperparameters can only be certain values ex. sgd_batch_size), or by a factor of 0.8 or 1.2.
2018-01-25 21:38:37 -08:00
eugenevinitsky 0a01d3c71f [rllib] Mountaincar fix (#1472)
* Fix for gym version 0.9.5.

* fixed bug in reshaper that was causing discrete spaces to fail
2018-01-25 13:58:35 -08:00
Robert Nishihara f6c835e4b8 Fix for gym version 0.9.5. (#1471) 2018-01-25 13:58:15 -08:00
Eric Liang 173f1d629a [tune] Ray Tune API cleanup (#1454)
Remove rllib dep: trainable is now a standalone abstract class that can be easily subclassed.

Clean up hyperband: fix debug string and add an example.

Remove YAML api / ScriptRunner: this was never really used.

Move ray.init() out of run_experiments(): This provides greater flexibility and should be less confusing since there isn't an implicit init() done there. Note that this is a breaking API change for tune.
2018-01-24 16:55:17 -08:00
Eric Liang 1d2a28ab07 [rllib] test all combinations of {obs_space} x {action_space} (#1449) 2018-01-24 11:03:43 -08:00
Roy Fox 4b0ef5eb2c [rllib] Behavior Cloning (#1400)
* Behavior Cloning

* episode_reward_mean -> mean_loss

* removing vestigial code

* punctuation

* unnecessary

* Behavior Cloning

* Behavior Cloning

* Update __init__.py
2018-01-23 10:50:45 -08:00
Eric Liang ee36effd8e [rllib] Add n-step Q learning for DQN (#1439)
* n-step

* add sample adjustm

* Oops

* fix nstep

* metric adjustment

* Sat Jan 20 23:30:34 PST 2018

* Sun Jan 21 16:40:46 PST 2018

* Mon Jan 22 22:24:57 PST 2018
2018-01-23 10:31:19 -08:00
Eric Liang 424bd7f74d [rllib] improve custom env docs (#1447)
* env docs

* add env

* update env

* Fri Jan 19 18:55:34 PST 2018
2018-01-19 21:36:18 -08:00
Eric Liang e216766bbc [rllib] Update docs with api and components overview figures (#1443) 2018-01-19 10:08:45 -08:00
eugenevinitsky 37076a9ff8 Multiagent model using concatenated observations (#1416)
* working multi action distribution and multiagent model

* currently working but the splits arent done in the right place

* added shared models

* added categorical support and mountain car example

* now compatible with generalized advantage estimation

* working multiagent code with discrete and continuous example

* moved reshaper to utils

* code review changes made, ppo action placeholder moved to model catalog, all multiagent code moved out of fcnet

* added examples in

* added PEP8 compliance

* examples are mostly pep8 compliant

* removed all flake errors

* added examples to jenkins tests

* fixed custom options bug

* added lines to let docker file find multiagent tests

* shortened example run length

* corrected nits

* fixed flake errors
2018-01-18 19:51:31 -08:00
Peter Schafhalter 215d526e0d Load evaluation configuration from checkpoint (#1392) 2018-01-17 10:51:33 -08:00
Philipp Moritz 1290072764 [rllib] Expose PPO evaluator resource requirements (#1391) 2018-01-11 11:09:01 -08:00
Eric Liang c60ccbad46 [carla] [rllib] Add support for carla nav planner and scenarios from paper (#1382)
* wip

* Sat Dec 30 15:07:28 PST 2017

* log video

* video doesn't work well

* scenario integration

* Sat Dec 30 17:30:22 PST 2017

* Sat Dec 30 17:31:05 PST 2017

* Sat Dec 30 17:31:32 PST 2017

* Sat Dec 30 17:32:16 PST 2017

* Sat Dec 30 17:34:11 PST 2017

* Sat Dec 30 17:34:50 PST 2017

* Sat Dec 30 17:35:34 PST 2017

* Sat Dec 30 17:38:49 PST 2017

* Sat Dec 30 17:40:39 PST 2017

* Sat Dec 30 17:43:00 PST 2017

* Sat Dec 30 17:43:04 PST 2017

* Sat Dec 30 17:45:56 PST 2017

* Sat Dec 30 17:46:26 PST 2017

* Sat Dec 30 17:47:02 PST 2017

* Sat Dec 30 17:51:53 PST 2017

* Sat Dec 30 17:52:54 PST 2017

* Sat Dec 30 17:56:43 PST 2017

* Sat Dec 30 18:27:07 PST 2017

* Sat Dec 30 18:27:52 PST 2017

* fix train

* Sat Dec 30 18:41:51 PST 2017

* Sat Dec 30 18:54:11 PST 2017

* Sat Dec 30 18:56:22 PST 2017

* Sat Dec 30 19:05:04 PST 2017

* Sat Dec 30 19:05:23 PST 2017

* Sat Dec 30 19:11:53 PST 2017

* Sat Dec 30 19:14:31 PST 2017

* Sat Dec 30 19:16:20 PST 2017

* Sat Dec 30 19:18:05 PST 2017

* Sat Dec 30 19:18:45 PST 2017

* Sat Dec 30 19:22:44 PST 2017

* Sat Dec 30 19:24:41 PST 2017

* Sat Dec 30 19:26:57 PST 2017

* Sat Dec 30 19:40:37 PST 2017

* wip models

* reward bonus

* test prep

* Sun Dec 31 18:45:25 PST 2017

* Sun Dec 31 18:58:28 PST 2017

* Sun Dec 31 18:59:34 PST 2017

* Sun Dec 31 19:03:33 PST 2017

* Sun Dec 31 19:05:05 PST 2017

* Sun Dec 31 19:09:25 PST 2017

* fix train

* kill

* add tuple preprocessor

* Sun Dec 31 20:38:33 PST 2017

* Sun Dec 31 22:51:24 PST 2017

* Sun Dec 31 23:14:13 PST 2017

* Sun Dec 31 23:16:04 PST 2017

* Mon Jan  1 00:08:35 PST 2018

* Mon Jan  1 00:10:48 PST 2018

* Mon Jan  1 01:08:31 PST 2018

* Mon Jan  1 14:45:44 PST 2018

* Mon Jan  1 14:54:56 PST 2018

* Mon Jan  1 17:29:29 PST 2018

* switch to euclidean dists

* Mon Jan  1 17:39:27 PST 2018

* Mon Jan  1 17:41:47 PST 2018

* Mon Jan  1 17:44:18 PST 2018

* Mon Jan  1 17:47:09 PST 2018

* Mon Jan  1 20:31:02 PST 2018

* Mon Jan  1 20:39:33 PST 2018

* Mon Jan  1 20:40:55 PST 2018

* Mon Jan  1 20:55:06 PST 2018

* Mon Jan  1 21:05:52 PST 2018

* fix env path

* merge richards fix

* fix hash

* Mon Jan  1 22:04:00 PST 2018

* Mon Jan  1 22:25:29 PST 2018

* Mon Jan  1 22:30:42 PST 2018

* simplified reward function

* add framestack

* add env configs

* simplify speed reward

* Tue Jan  2 17:36:15 PST 2018

* Tue Jan  2 17:49:16 PST 2018

* Tue Jan  2 18:10:38 PST 2018

* add lane keeping simple mode

* Tue Jan  2 20:25:26 PST 2018

* Tue Jan  2 20:30:30 PST 2018

* Tue Jan  2 20:33:26 PST 2018

* Tue Jan  2 20:41:42 PST 2018

* ppo lane keep

* simplify discrete actions

* Tue Jan  2 21:41:05 PST 2018

* Tue Jan  2 21:49:03 PST 2018

* Tue Jan  2 22:12:23 PST 2018

* Tue Jan  2 22:14:42 PST 2018

* Tue Jan  2 22:20:59 PST 2018

* Tue Jan  2 22:23:43 PST 2018

* Tue Jan  2 22:26:27 PST 2018

* Tue Jan  2 22:27:20 PST 2018

* Tue Jan  2 22:44:00 PST 2018

* Tue Jan  2 22:57:58 PST 2018

* Tue Jan  2 23:08:51 PST 2018

* Tue Jan  2 23:11:32 PST 2018

* update dqn reward

* Thu Jan  4 12:29:40 PST 2018

* Thu Jan  4 12:30:26 PST 2018

* Update train_dqn.py

* fix
2018-01-05 21:32:41 -08:00
Eric Liang 6e6674a824 [rllib] Split docs into user and development guide (#1377)
* docs

* Update README.rst

* Sat Dec 30 15:23:49 PST 2017

* comments

* Sun Dec 31 23:33:30 PST 2017

* Sun Dec 31 23:33:38 PST 2017

* Sun Dec 31 23:37:46 PST 2017

* Sun Dec 31 23:39:28 PST 2017

* Sun Dec 31 23:43:05 PST 2017

* Sun Dec 31 23:51:55 PST 2017

* Sun Dec 31 23:52:51 PST 2017
2018-01-01 11:10:44 -08:00
Richard Liaw 3304099cc4 [rllib] Evaluators and Optimizers Refactoring (#1339) 2017-12-30 00:24:54 -08:00
Eric Liang 22c7c87e14 [rllib] [tune] Custom preprocessors and models, various fixes (#1372) 2017-12-28 13:19:04 -08:00
Richard Liaw 4bb5b6bd5b [rllib] A3C Configurations (#1370)
* initial introduction of a3c configs

* fix sample batch

* flake but need to check save

* save,resotre

* fix

* pickles

* entropy

* fix

* moving ppo

* results

* jenkins
2017-12-24 12:25:13 -08:00
Richard Liaw b217a5ef14 [rllib] Fix Pong-PPO tuned example Config (#1369) 2017-12-23 01:36:33 -08:00
Cathy Wu 772527caa4 [rllib] Support 1-dimensional action spaces (PPO) (#1347)
* Small fix for supporting custom preprocessors

* PEP8

* Remove squeeze from actions
2017-12-19 14:17:06 -08:00
Eric Liang 6724f57b03 [Examples] Add Carla test env (#1343)
* add carla example

* add reward

* set obs

* Sun Dec 17 16:06:00 PST 2017

* add spec

* fix measurement

* add train script

* resize to 80x80

* null

* initial small training run

* robustify env, clean up action space

* clean up vars

* switch to town2 which is faster

* tunify train.py

* add discrete mode

* update

* fix excessive brakinG

* fix the weather

* rename

* redirect output and from future import

* doc

* update

* fix rebase

* allow dqn gpu growht

* adjust dqn hyperparams

* better ppo parameters
2017-12-19 12:57:58 -08:00
Eric Liang 47b1f02d3e [rllib] Pull out multi-gpu optimizer as a generic class (#1313) 2017-12-17 15:59:57 -08:00
Cathy Wu 53e736fe01 [rllib] Small fix for supporting custom preprocessors (#1334)
* Small fix for supporting custom preprocessors

* PEP8

* fix test
2017-12-17 04:37:29 -08:00
Eric Liang fbf1806b8a [tune] Clean up result logging: move out of /tmp, add timestamp (#1297) 2017-12-15 14:19:08 -08:00
Richard Liaw c5c83a4465 [rllib] PPO and A3C unification (#1253) 2017-12-14 01:08:23 -08:00
Richard Liaw cabbd27c56 [rllib] Support Nested Configuration Merging (#1268) 2017-12-13 14:39:01 -08:00
Peter Schafhalter 20d6b74aa6 [rllib] Added evaluation script to RLLib (#1295) 2017-12-11 11:59:44 -08:00
Zongheng Yang 7e4a28f933 [rllib] Add tuned_examples/pong-ppo.yaml (#1302)
* Add tuned_examples/pong-ppo.yaml: 21 rew in ~3380sec

* Header comments
2017-12-09 01:20:22 -08:00
Richard Liaw 2e0eb0e4c7 [rllib] Adding dependencies (#1298) 2017-12-08 01:57:19 -08:00
Philipp Moritz 26125e1547 Fixing the jenkins tests (#1299)
* trying to fix jenkins tests

* comment out more tests

* remove pytorch stuff

* use non-monotonic clock (monotonic not supported on python 2.7)

* whitespace
2017-12-07 17:03:58 -08:00
Eric Liang 35f7398666 [rllib] Update RLlib docs and README (#1288)
Updates the rllib docs and README.
2017-12-06 18:17:51 -08:00