Eric Liang
445bcb29b0
[hotfix] fix backward compat with older yaml libraries
2019-07-06 20:41:28 -07:00
Eric Liang
c15ed3ac55
[rllib] Shuffle RNN sequences in PPO as well ( #5129 )
* shuffle seq
* fix test
2019-07-06 20:40:49 -07:00
Brandon Bertelsen
c04b69902c
Updates for #5072 ( #5091 )
2019-07-06 16:05:50 -07:00
Aleksei Petrenko
09bde397c9
Multiagent experiment resume ( #5102 )
* Fixed problem with multiagent experiment resume
* Applied format script
* fix lint
2019-07-06 11:38:17 -07:00
Dušan Josipović
e9b88dcbed
[wingman -> tune] Add system performance tracking ( #4924 )
2019-07-06 00:57:35 -07:00
Eric Liang
34d054ff19
[rllib] ModelV2 API ( #4926 )
2019-07-03 15:59:47 -07:00
Kristian Hartikainen
9e0192bc0b
[tune] Change the log syncing behavior ( #4450 )
* Change the log syncing behavior
* fix up abstractions for syncer
* Finished checkpoint syncing
* Code
* Set of changes to get things running
* Fixes for log syncing
* Fix parts
* Lint and other fixes
* fix some test
* Remove extra parsing functionality
* some test fixes
* Fix up cloud syncing
* Another thing to do
* Fix up tests and local sync
Changes LogSync into a mixin, and adds tests for different
functionalities.
* Fix up tests, start on local migration
* fix distributed migrations
* comments
* formatting
* Better checkpoint directory handling
* fix tests
* fix tests
* fix click
* comments
* formatting comments
* formatting and comments
* sync function deprecations
* syncfunction
* Add documentation for Syncing and Uploading
* nit
* BaseSyncer as base for Mixin in edge case
* more docs
* clean up assertions
* validate
* nit
* Update test_cluster.py
* betterdoc
* Update tune-usage.rst
* cleanup
* nit
2019-07-02 20:46:00 -07:00
Philipp Moritz
bbe3e5b4ed
[rllib] Give error if sample_async is used with pytorch for A3C ( #5000 )
* give error if sample_async is used with pytorch
* update
* Update a3c.py
2019-06-25 22:06:35 -07:00
Eric Liang
aa5fc52e32
[rllib] Add QMIX mixer parameters to optimizer param list ( #5014 )
* add mixer params
* Update qmix_policy.py
2019-06-25 19:02:40 -07:00
Eric Liang
1d17125333
temp fix for build ( #5006 )
2019-06-20 18:07:44 -07:00
Eric Liang
fa1d4c9807
[rllib] Fix DDPG example ( #4973 )
2019-06-13 15:07:46 -07:00
Eric Liang
4f8e100fe0
fix ( #4950 )
2019-06-10 10:20:55 +08:00
Eric Liang
77689d1116
[rllib] Port remainder of algorithms to build_trainer() pattern ( #4920 )
2019-06-07 16:45:36 -07:00
Eric Liang
9e328fbe6f
[rllib] Add docs on how to use TF eager execution ( #4927 )
2019-06-07 16:42:37 -07:00
Eric Liang
7501ee51db
[rllib] Rename PolicyEvaluator => RolloutWorker ( #4820 )
2019-06-03 06:49:24 +08:00
Eric Liang
99eae05cf6
[tune] Disallow setting resources_per_trial when it is already configured ( #4880 )
* disallow it
* import fix
* fix example
* fix test
* fix tests
* Update mock.py
* fix
* make less convoluted
* fix tests
2019-06-03 06:47:39 +08:00
Eric Liang
665d081fe9
[rllib] Rough port of DQN to build_tf_policy() pattern ( #4823 )
2019-06-02 14:14:31 +08:00
Eric Liang
9aa1cd613d
[rllib] Allow Torch policies access to full action input dict in extra_action_out_fn ( #4894 )
* fix torch extra out
* preserve setitem
* fix docs
2019-06-01 16:58:49 +08:00
Eric Liang
1c073e92e4
[rllib] Fix documentation on custom policies ( #4910 )
* wip
* add docs
* lint
* todo sections
* fix doc
2019-06-01 16:13:21 +08:00
Eric Liang
3f4d37cd0e
[rllib] Fix Multidiscrete support ( #4869 )
2019-05-29 20:41:02 -07:00
Eric Liang
2dd0beb5bd
[rllib] Allow access to batches prior to postprocessing ( #4871 )
2019-05-29 18:17:14 -07:00
Philipp Moritz
64eb7b322c
Upgrade arrow to latest master ( #4858 )
2019-05-28 16:04:16 -07:00
Eric Liang
d7be5a5d36
[rllib] Fix error getting kl when simple_optimizer: True in multi-agent PPO
2019-05-27 17:24:45 -07:00
Eric Liang
a45c61e19b
[rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section ( #4821 )
* wip
* fix index
* fix bugs
* todo
* add imports
* note on get ph
* note on get ph
* rename to building custom algs
* add rnn state info
2019-05-27 14:17:32 -07:00
Eric Liang
7237ea70c4
[rllib] [RFC] Deprecate Python 2 / RLlib ( #4832 )
2019-05-25 10:45:26 -07:00
Eric Liang
02583a8598
[rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ ( #4819 )
This implements some of the renames proposed in #4813
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.
2019-05-20 16:46:05 -07:00
Eric Liang
6cb5b90bd6
[rllib] [RFC] Dynamic definition of loss functions and modularization support ( #4795 )
* dynamic graph
* wip
* clean up
* fix
* document trainer
* wip
* initialize the graph using a fake batch
* clean up dynamic init
* wip
* spelling
* use builder for ppo pol graph
* add ppo graph
* fix naming
* order
* docs
* set class name correctly
* add torch builder
* add custom model support in builder
* cleanup
* remove underscores
* fix py2 compat
* Update dynamic_tf_policy_graph.py
* Update tracking_dict.py
* wip
* rename
* debug level
* rename policy_graph -> policy in new classes
* fix test
* rename ppo tf policy
* port appo too
* forgot grads
* default policy optimizer
* make default config optional
* add config to optimizer
* use lr by default in optimizer
* update
* comments
* remove optimizer
* fix tuple actions support in dynamic tf graph
2019-05-18 00:23:11 -07:00
Eric Liang
3807fb505b
[rllib] TensorFlow 2 compatibility ( #4802 )
2019-05-16 22:12:07 -07:00
Eric Liang
7d5ef6d99c
[rllib] Support continuous action distributions in IMPALA/APPO ( #4771 )
2019-05-16 22:05:07 -07:00
Jones Wong
c5161a2c4d
[rllib] fix clip by value issue as TF upgraded ( #4697 )
* fix clip_by_value issue
* fix typo
2019-05-13 15:39:25 -07:00
Eric Liang
69352e3302
[rllib] Implement learn_on_batch() in torch policy graph
2019-05-12 21:29:58 -07:00
Eric Liang
351753aae5
[rllib] Remove dependency on TensorFlow ( #4764 )
* remove hard tf dep
* add test
* comment fix
* fix test
2019-05-10 20:36:18 -07:00
Jacob Beck
28496c8b50
[rllib] Qmix padding patch ( #4735 )
* Qmix padding patch
* Update qmix_policy_graph.py
* lint errors
* more linting
* Update qmix_policy_graph.py
2019-05-08 14:07:29 -07:00
Eric Liang
71b2dec3b4
[rllib] Fix bounds of space returned by preprocessor.observation_space ( #4736 )
2019-05-05 18:25:38 -07:00
Richard Liaw
f2faf5ce75
[tune] Contributor Guide and Design Page ( #4716 )
* Move setup script out
* some changes
* Finished Contributor guide
* some comments to the design
* move
* Apply suggestions from code review
Co-Authored-By: richardliaw <rliaw@berkeley.edu>
* sourcecode
* comments
2019-05-05 00:04:13 -07:00
Federico Fontana
78bb26286e
Replaced discontinued rnn_cell.BasicLSTMCell with rnn_cell.LSTMCell ( #4703 )
* Fixed bug in Dirichlet (#4440)
* Replaced deprecated rnn_cell.BasicLSTMCell with rnn_cell.LSTMCell
2019-05-02 13:19:27 -07:00
Sam Toyer
663e92ab3f
[rllib] TD3/DDPG improvements and MuJoCo benchmarks ( #4694 )
* [rllib] Separate optimisers for DDPG actor & crit.
* [rllib] Better names for DDPG variables & options
Config changes:
- noise_scale -> exploration_ou_noise_scale
- exploration_theta -> exploration_ou_theta
- exploration_sigma -> exploration_ou_sigma
- act_noise -> exploration_gaussian_sigma
- noise_clip -> target_noise_clip
* [rllib] Make DDPG less class-y
Used functions to replace three classes with only an __init__ method & a
handful of unrelated attributes.
* [rllib] Refactor DDPG noise
* [rllib] Unify DDPG exploration annealing
Added option "exploration_should_anneal" to enable linear annealing of
exploration noise. By default this is off, for consistency with DDPG &
TD3 papers. Also renamed "exploration_final_eps" to
"exploration_final_scale" (that name seems to have been carried over
from DQN, and doesn't really make sense here). Finally, tried to rename
"eps" to "noise_scale" wherever possible.
2019-04-26 17:49:53 -07:00
Andrew
06c768823c
[rllib] train-eval loop implementation for rllib.Trainer class ( #4647 )
2019-04-21 12:08:04 -07:00
Vlad Firoiu
39a09fa457
Turn replay into a circular queue. ( #4667 )
2019-04-19 11:42:00 -07:00
Wang Qing
9d481cc2e6
[hotfix] Missing import breaks Travis builds
2019-04-18 23:12:44 -07:00
Eric Liang
5a562bbf12
[rllib] Fix num_gpus cast and raise error on large batch ( #4652 )
2019-04-18 15:23:29 -07:00
Eric Liang
6848dfd179
[rllib] Replace ray.get() with ray_get_and_free() to optimize memory usage ( #4586 )
2019-04-17 20:30:03 -04:00
Eric Liang
3fd9dea721
[rllib] Fix tune.run(Agent class) ( #4630 )
* update
* Update __init__.py
2019-04-15 09:12:23 -07:00
Vlad Firoiu
f600591468
Cast MultiCategorical num_outputs to int. ( #4629 )
2019-04-14 19:51:37 -07:00
Eric Liang
6e7680bf21
[rllib] Clean up concepts documentation and policy optimizer creation ( #4592 )
2019-04-12 21:03:26 -07:00
cfan
bb207a205b
[rllib] Support torch device and distributions. ( #4553 )
2019-04-12 11:39:14 -07:00
justinwyang
e88e706fcc
Enforce quoting style in Travis. ( #4589 )
2019-04-11 14:24:26 -07:00
Vlad Firoiu
74fd3d7e21
[rllib] Support prev_state/prev_action in rollout and fix multiagent ( #4565 )
* Cleaner and more correct treatment of agent states in rollout.py
* support lstm_use_prev_action_reward in rollout.py
* Linter.
* appease flake8
* Use _DUMMY_AGENT_ID instead of 0.
* All agents have a policy_agent_mapping.
Reset the mapping cache at the start of each episode.
* Update rollout.py
* Fix rollout.py for single-agent envs.
* Use agent_id, not policy_id.
2019-04-10 00:01:25 -07:00
Eric Liang
4f46d3e9bf
[rllib] Add multi-agent examples for hand-coded policy, centralized VF ( #4554 )
2019-04-09 00:36:49 -07:00
Jones Wong
da5a471485
[rllib] validate observation in NoPreprocessor ( #4546 )
2019-04-07 16:11:50 -07:00