Commit Graph

465 Commits

Author SHA1 Message Date
Eric Liang 445bcb29b0 [hotfix] fix backward compat with older yaml libraries 2019-07-06 20:41:28 -07:00
Eric Liang c15ed3ac55 [rllib] Shuffle RNN sequences in PPO as well (#5129)
* shuffle seq

* fix test
2019-07-06 20:40:49 -07:00
Brandon Bertelsen c04b69902c Updates for #5072 (#5091) 2019-07-06 16:05:50 -07:00
Aleksei Petrenko 09bde397c9 Multiagent experiment resume (#5102)
* Fixed problem with multiagent experiment resume

* Applied format script

* fix lint
2019-07-06 11:38:17 -07:00
Dušan Josipović e9b88dcbed [wingman -> tune] Add system performance tracking (#4924) 2019-07-06 00:57:35 -07:00
Eric Liang 34d054ff19 [rllib] ModelV2 API (#4926) 2019-07-03 15:59:47 -07:00
Kristian Hartikainen 9e0192bc0b [tune] Change the log syncing behavior (#4450)
* Change the log syncing behavior

* fix up abstractions for syncer

* Finished checkpoint syncing

* Code

* Set of changes to get things running

* Fixes for log syncing

* Fix parts

* Lint and other fixes

* fix some test

* Remove extra parsing functionality

* some test fixes

* Fix up cloud syncing

* Another thing to do

* Fix up tests and local sync

Changes LogSync into a mixin, and adds tests for different
functionalities.

* Fix up tests, start on local migration

* fix distributed migrations

* comments

* formatting

* Better checkpoint directory handling

* fix tests

* fix tests

* fix click

* comments

* formatting comments

* formatting and comments

* sync function deprecations

* syncfunction

* Add documentation for Syncing and Uploading

* nit

* BaseSyncer as base for Mixin in edge case

* more docs

* clean up assertions

* validate

* nit

* Update test_cluster.py

* betterdoc

* Update tune-usage.rst

* cleanup

* nit
2019-07-02 20:46:00 -07:00
Philipp Moritz bbe3e5b4ed [rllib] Give error if sample_async is used with pytorch for A3C (#5000)
* give error if sample_async is used with pytorch

* update

* Update a3c.py
2019-06-25 22:06:35 -07:00
Eric Liang aa5fc52e32 [rllib] Add QMIX mixer parameters to optimizer param list (#5014)
* add mixer params

* Update qmix_policy.py
2019-06-25 19:02:40 -07:00
Eric Liang 1d17125333 temp fix for build (#5006) 2019-06-20 18:07:44 -07:00
Eric Liang fa1d4c9807 [rllib] Fix DDPG example (#4973) 2019-06-13 15:07:46 -07:00
Eric Liang 4f8e100fe0 fix (#4950) 2019-06-10 10:20:55 +08:00
Eric Liang 77689d1116 [rllib] Port remainder of algorithms to build_trainer() pattern (#4920) 2019-06-07 16:45:36 -07:00
Eric Liang 9e328fbe6f [rllib] Add docs on how to use TF eager execution (#4927) 2019-06-07 16:42:37 -07:00
Eric Liang 7501ee51db [rllib] Rename PolicyEvaluator => RolloutWorker (#4820) 2019-06-03 06:49:24 +08:00
Eric Liang 99eae05cf6 [tune] Disallow setting resources_per_trial when it is already configured (#4880)
* disallow it

* import fix

* fix example

* fix test

* fix tests

* Update mock.py

* fix

* make less convoluted

* fix tests
2019-06-03 06:47:39 +08:00
Eric Liang 665d081fe9 [rllib] Rough port of DQN to build_tf_policy() pattern (#4823) 2019-06-02 14:14:31 +08:00
Eric Liang 9aa1cd613d [rllib] Allow Torch policies access to full action input dict in extra_action_out_fn (#4894)
* fix torch extra out

* preserve setitem

* fix docs
2019-06-01 16:58:49 +08:00
Eric Liang 1c073e92e4 [rllib] Fix documentation on custom policies (#4910)
* wip

* add docs

* lint

* todo sections

* fix doc
2019-06-01 16:13:21 +08:00
Eric Liang 3f4d37cd0e [rllib] Fix Multidiscrete support (#4869) 2019-05-29 20:41:02 -07:00
Eric Liang 2dd0beb5bd [rllib] Allow access to batches prior to postprocessing (#4871) 2019-05-29 18:17:14 -07:00
Philipp Moritz 64eb7b322c Upgrade arrow to latest master (#4858) 2019-05-28 16:04:16 -07:00
Eric Liang d7be5a5d36 [rllib] Fix error getting kl when simple_optimizer: True in multi-agent PPO 2019-05-27 17:24:45 -07:00
Eric Liang a45c61e19b [rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section (#4821)
* wip

* fix index

* fix bugs

* todo

* add imports

* note on get ph

* note on get ph

* rename to building custom algs

* add rnn state info
2019-05-27 14:17:32 -07:00
Eric Liang 7237ea70c4 [rllib] [RFC] Deprecate Python 2 / RLlib (#4832) 2019-05-25 10:45:26 -07:00
Eric Liang 02583a8598 [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (#4819)
This implements some of the renames proposed in #4813
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.
2019-05-20 16:46:05 -07:00
Eric Liang 6cb5b90bd6 [rllib] [RFC] Dynamic definition of loss functions and modularization support (#4795)
* dynamic graph

* wip

* clean up

* fix

* document trainer

* wip

* initialize the graph using a fake batch

* clean up dynamic init

* wip

* spelling

* use builder for ppo pol graph

* add ppo graph

* fix naming

* order

* docs

* set class name correctly

* add torch builder

* add custom model support in builder

* cleanup

* remove underscores

* fix py2 compat

* Update dynamic_tf_policy_graph.py

* Update tracking_dict.py

* wip

* rename

* debug level

* rename policy_graph -> policy in new classes

* fix test

* rename ppo tf policy

* port appo too

* forgot grads

* default policy optimizer

* make default config optional

* add config to optimizer

* use lr by default in optimizer

* update

* comments

* remove optimizer

* fix tuple actions support in dynamic tf graph
2019-05-18 00:23:11 -07:00
Eric Liang 3807fb505b [rllib] TensorFlow 2 compatibility (#4802) 2019-05-16 22:12:07 -07:00
Eric Liang 7d5ef6d99c [rllib] Support continuous action distributions in IMPALA/APPO (#4771) 2019-05-16 22:05:07 -07:00
Jones Wong c5161a2c4d [rllib] fix clip by value issue as TF upgraded (#4697)
* fix clip_by_value issue

* fix typo
2019-05-13 15:39:25 -07:00
Eric Liang 69352e3302 [rllib] Implement learn_on_batch() in torch policy graph 2019-05-12 21:29:58 -07:00
Eric Liang 351753aae5 [rllib] Remove dependency on TensorFlow (#4764)
* remove hard tf dep

* add test

* comment fix

* fix test
2019-05-10 20:36:18 -07:00
Jacob Beck 28496c8b50 [rllib] Qmix padding patch (#4735)
* Qmix padding patch

* Update qmix_policy_graph.py

* lint errors

* more linting

* Update qmix_policy_graph.py
2019-05-08 14:07:29 -07:00
Eric Liang 71b2dec3b4 [rllib] Fix bounds of space returned by preprocessor.observation_space (#4736) 2019-05-05 18:25:38 -07:00
Richard Liaw f2faf5ce75 [tune] Contributor Guide and Design Page (#4716)
* Move setup script out

* some changes

* Finished Contributor guide

* some comments to the design

* move

* Apply suggestions from code review

Co-Authored-By: richardliaw <rliaw@berkeley.edu>

* sourcecode

* comments
2019-05-05 00:04:13 -07:00
Federico Fontana 78bb26286e Replaced discontinued rnn_cell.BasicLSTMCell with rnn_cell.LSTMCell (#4703)
* Fixed bug in Dirichlet (#4440)

* Replaced deprecated rnn_cell.BasicLSTMCell with rnn_cell.LSTMCell
2019-05-02 13:19:27 -07:00
Sam Toyer 663e92ab3f [rllib] TD3/DDPG improvements and MuJoCo benchmarks (#4694)
* [rllib] Separate optimisers for DDPG actor & crit.

* [rllib] Better names for DDPG variables & options

Config changes:

- noise_scale -> exploration_ou_noise_scale
- exploration_theta -> exploration_ou_theta
- exploration_sigma -> exploration_ou_sigma
- act_noise -> exploration_gaussian_sigma
- noise_clip -> target_noise_clip

* [rllib] Make DDPG less class-y

Used functions to replace three classes with only an __init__ method & a
handful of unrelated attributes.

* [rllib] Refactor DDPG noise

* [rllib] Unify DDPG exploration annealing

Added option "exploration_should_anneal" to enable linear annealing of
exploration noise. By default this is off, for consistency with DDPG &
TD3 papers. Also renamed "exploration_final_eps" to
"exploration_final_scale" (that name seems to have been carried over
from DQN, and doesn't really make sense here). Finally, tried to rename
"eps" to "noise_scale" wherever possible.
2019-04-26 17:49:53 -07:00
Andrew 06c768823c [rllib] train-eval loop implementation for rllib.Trainer class (#4647) 2019-04-21 12:08:04 -07:00
Vlad Firoiu 39a09fa457 Turn replay into a circular queue. (#4667) 2019-04-19 11:42:00 -07:00
Wang Qing 9d481cc2e6 [hotfix] Missing import breaks Travis builds 2019-04-18 23:12:44 -07:00
Eric Liang 5a562bbf12 [rllib] Fix num_gpus cast and raise error on large batch (#4652) 2019-04-18 15:23:29 -07:00
Eric Liang 6848dfd179 [rllib] Replace ray.get() with ray_get_and_free() to optimize memory usage (#4586) 2019-04-17 20:30:03 -04:00
Eric Liang 3fd9dea721 [rllib] Fix tune.run(Agent class) (#4630)
* update

* Update __init__.py
2019-04-15 09:12:23 -07:00
Vlad Firoiu f600591468 Cast MultiCategorical num_outputs to int. (#4629) 2019-04-14 19:51:37 -07:00
Eric Liang 6e7680bf21 [rllib] Clean up concepts documentation and policy optimizer creation (#4592) 2019-04-12 21:03:26 -07:00
cfan bb207a205b [rllib] Support torch device and distributions. (#4553) 2019-04-12 11:39:14 -07:00
justinwyang e88e706fcc Enforce quoting style in Travis. (#4589) 2019-04-11 14:24:26 -07:00
Vlad Firoiu 74fd3d7e21 [rllib] Support prev_state/prev_action in rollout and fix multiagent (#4565)
* Cleaner and more correct treatment of agent states in rollout.py

* support lstm_use_prev_action_reward in rollout.py

* Linter.

* appease flake8

* Use _DUMMY_AGENT_ID instead of 0.

* All agents have a policy_agent_mapping.
Reset the mapping cache at the start of each episode.

* Update rollout.py

* Fix rollout.py for single-agent envs.

* Use agent_id, not policy_id.
2019-04-10 00:01:25 -07:00
Eric Liang 4f46d3e9bf [rllib] Add multi-agent examples for hand-coded policy, centralized VF (#4554) 2019-04-09 00:36:49 -07:00
Jones Wong da5a471485 [rllib] validate observation in NoPreprocessor (#4546) 2019-04-07 16:11:50 -07:00