Eric Liang
27cd6ea401
[rllib] Flip sign of A2C, IMPALA entropy coefficient; raise DeprecationWarning if negative ( #4374 )
2019-03-17 18:07:37 -07:00
Eric Liang
a45019d98c
[rllib] Add option to proceed even if some workers crashed ( #4376 )
2019-03-16 13:34:09 -07:00
Stefan Pantic
2202a81773
Fix multi discrete ( #4338 )
...
* Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)"
This reverts commit 3c41cb9b60.
* Fix a bug with log rhos for vtrace
* Reformat
* lint
2019-03-12 20:32:11 -07:00
Eric Liang
3c41cb9b60
Revert "[wingman -> rllib] IMPALA MultiDiscrete changes ( #3967 )" ( #4332 )
...
This reverts commit 962b17f567.
2019-03-11 22:51:26 -07:00
Stefan Pantic
36cbde651a
Add action space to model ( #4210 )
2019-03-09 19:23:12 -08:00
Eric Liang
c7f74dbdc7
[rllib] Add async remote workers ( #4253 )
2019-03-08 15:39:48 -08:00
Eric Liang
b0332551dd
[rllib] Fix APPO + continuous spaces, feed prev_rew/act to A3C properly ( #4286 )
2019-03-06 21:36:26 -08:00
Eric Liang
2781d74680
[rllib] Reserve CPUs for replay actors in apex ( #4217 )
2019-03-06 10:22:12 -08:00
Eric Liang
ba03048254
[rllib] TF model custom_loss() should actually allow access to full rollout data ( #4220 )
2019-03-02 22:57:51 -08:00
bjg2
962b17f567
[wingman -> rllib] IMPALA MultiDiscrete changes ( #3967 )
2019-03-01 19:47:06 -08:00
Eric Liang
b5799b5286
[rllib] Set PPO observation filter to NoFilter by default ( #4191 )
2019-03-01 13:19:33 -08:00
Eric Liang
d9da183c7d
[rllib] Custom supervised loss API ( #4083 )
2019-02-24 15:36:13 -08:00
Robert Nishihara
7b04ed059e
Move TensorFlowVariables to ray.experimental.tf_utils. ( #4145 )
2019-02-24 14:26:46 -08:00
Eric Liang
9896df7799
[rllib] Guard against PPO value function not training with RNN models ( #4037 )
...
* better lstm settings
* 1.0
* docs
* warn on truncate
* clarify
* Update ppo_policy_graph.py
* Update ppo_policy_graph.py
* Update ppo_policy_graph.py
2019-02-22 11:18:51 -08:00
Stefan Pantic
a54386e499
Added custom LSTM detection ( #4087 )
...
* Added autodetection of custom LSTM usage
* Reverted line separators
* Added check for LSTM
* Update vtrace_policy_graph.py
* Update appo_policy_graph.py
2019-02-21 21:07:48 -08:00
Jones Wong
acbe0b4e5f
Fix twin q bug ( #4108 )
2019-02-21 10:47:01 -08:00
Jones Wong
3ac8fd7ee8
Exploration with Parameter Space Noise ( #4048 )
...
* enable parameter space noise for exploration
* enable parameter space noise for exploration
* yapf formatted
* remove the usage of scipy softmax, which is available only in the latest version
* enable subclass that has no parameter_noise in the config
* run user specified callbacks and test parameter space noise in multi node setting
* formatted by yapf
* Update dqn.py
* lint
2019-02-20 22:35:18 -08:00
mika
64c95aea85
[rllib] Update README.md for qmix ( #4101 )
...
## What do these changes do?
Fixed PyMARL repository path.
## Related issue number
N/A
2019-02-20 10:21:08 -08:00
Philipp Moritz
f51969964d
Fix linting on master ( #4077 )
2019-02-17 13:55:40 -08:00
Megan Kawakami
346885068c
[rllib] add torch pg ( #3857 )
...
* add torch pg
* add torch imports
* added torch pg
* working torch pg implementation
* add pg pytorch
* Update a3c.py
* Update a3c.py
* Update torch_policy_graph.py
* Update torch_policy_graph.py
2019-02-16 19:54:14 -08:00
Eric Liang
0c0bd4d41c
[rllib] Use model.value_function() in MARWIL ( #4036 )
...
* fix marwil
* add ph
* fix
2019-02-14 19:35:21 -08:00
Eric Liang
2dccf383dd
[rllib] Basic infrastructure for off-policy estimation (IS, WIS) ( #3941 )
2019-02-13 16:25:05 -08:00
bjg2
0e37ac6d1d
[wingman -> rllib] Remote and entangled environments ( #3968 )
...
* added all our environment changes
* fixed merge request comments and remote env
* fixed remote check
* moved remote_worker_envs to correct config section
* lint
* auto wrap impl
* fix
* fixed the tests
2019-02-13 10:08:26 -08:00
Eric Liang
8df772867c
[rllib] rename compute_apply to learn_on_batch
2019-02-11 15:22:15 -08:00
Eric Liang
29322c7389
[rllib] Replay buffer for IMPALA should default to 0 slots. ( #3971 )
...
* disable replay
* make lq configurable
* leak test
* Update run_multi_node_tests.sh
2019-02-08 10:03:11 -08:00
Michael Luo
1a015e420b
Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented ( #3934 )
2019-02-02 22:10:58 -08:00
Eric Liang
0f81bc9a33
[rllib] on_train_result results do not get logged ( #3865 )
2019-02-01 20:32:07 -08:00
Tianming Xu
1302fafc0b
[Tune] Add export_formats option to export policy graphs ( #3868 )
...
Earlier PRs (#3585 and #3637) introduced export_policy_model and export_policy_checkpoint so that users can export the TensorFlow model and checkpoint.
For Ray Tune users, these APIs are not accessible through YAML configurations.
This pull request adds an export_formats option that lets users choose the desired export format.
2019-01-31 17:07:27 -08:00
Eric Liang
152375aa8a
[rllib] Add evaluation option to DQN agent ( #3835 )
...
* add eval
* interval
* multiagent minor fix
* Update rllib.rst
* Update ddpg.py
* Update qmix.py
2019-01-29 21:19:53 -08:00
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples ( #3815 )
...
* wip
* lint
* wip
* up
* wip
* update examples
* wip
* remove carla
* update
* improve envspec
* link to custom
* Update rllib-env.rst
* update
* fix
* fn
* lint
* ds
* ssd games
* desc
* fix up docs
* fix
2019-01-29 21:06:09 -08:00
Eric Liang
04ec47cbd4
[rllib] annotate public vs developer vs private APIs ( #3808 )
2019-01-23 21:27:26 -08:00
Eric Liang
aad48ee5a5
[tune] Fully deprecate raw function literals in Tune ( #3788 )
...
Related: https://github.com/ray-project/ray/issues/3785
2019-01-19 17:09:36 -08:00
Michael Luo
16f7ca45e4
Appo ( #3779 )
...
* Deleted old fork, updated new ray and moved PPO-impala to APPO in ppo folder
* Deleted unnecessary vtrace.py file
* Update pong-impala.yaml
* Cleaned PPO Code
* Update pong-impala.yaml
* Update pong-impala.yaml
* wip
* new file
* refactor
* add vtrace off option
* revert
* support any space
* docs
* fix comment
* remove kl
* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Jones Wong
319c1340cb
[rllib] Develop MARWIL ( #3635 )
...
* add marwil policy graph
* fix typo
* add offline optimizer and enable running marwil
* fix loss function
* add maintaining the moving average of advantage norm
* use sync replay optimizer for unifying
* remove offline optimizer and use sync replay optimizer
* format by yapf
* add imitation learning objective
* fix according to eric's review
* format by yapf
* revise
* add test data
* marwil
2019-01-16 19:00:43 -08:00
Eric Liang
401e656b95
[rllib] Sync filters at end of iteration not start; hierarchical docs ( #3769 )
2019-01-15 16:25:25 -08:00
Eric Liang
e78562b2e8
[rllib] Misc fixes: set lr for PG, better error message for LSTM/PPO, fix multi-agent/APEX ( #3697 )
...
* fix
* update test
* better error
* compute
* eps fix
* add get_policy() api
* Update agent.py
* better err msg
* fix
* pass in rew
2019-01-06 19:37:35 -08:00
Eric Liang
03fe760616
[rllib] Model self loss isn't included in all algorithms ( #3679 )
2019-01-04 22:30:35 -08:00
Eric Liang
7db1f3be2a
[tune] resume=False by default but print a tip to set resume="prompt" + jenkins fix ( #3681 )
2019-01-04 17:23:19 -08:00
Eric Liang
ca864faece
[rllib] Documentation for I/O API and multi-agent support / cleanup ( #3650 )
2019-01-03 15:15:36 +08:00
opherlieber
2177e2f410
[rllib] Agent: Allow unknown subkeys for custom_resources_per_worker ( #3639 )
...
* RLLib Agent: Allow unknown subkeys for custom_resources_per_worker
* Update agent.py
2019-01-03 14:19:59 +08:00
Eric Liang
47d36d7bd6
[rllib] Refactor pytorch custom model support ( #3634 )
2019-01-03 13:48:33 +08:00
Tianming Xu
b4f61dfd50
[rllib] Export policy model checkpoint ( #3637 )
...
* Export policy model checkpoint
* update comment
2018-12-27 08:43:06 +09:00
Tianming Xu
deb26b954e
[rllib] Export tensorflow model of policy graph ( #3585 )
...
* Export tensorflow model of policy graph
* Add tests,examples,pydocs and infer extra signatures from existing methods
* Add example usage in export_policy_model comment
* Fix lint error
* Fix lint error
* Fix lint error
2018-12-22 17:35:25 +09:00
Eric Liang
6bb1103930
[rllib] Avoid sample wastage with bad PPO configurations ( #3552 )
...
## What do these changes do?
Previously we logged a warning if the PPO configuration would waste many samples. However, this didn't apply in the case of long episodes in `complete_episodes` batch mode, and the amount of waste can be up to 2x in common cases.
This PR:
- Estimates the number of sampling tasks needed to avoid over-sampling.
- Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower, so a config flag is added to re-enable the legacy behavior of discarding samples.
## Related issue number
Closes: https://github.com/ray-project/ray/issues/3549
2018-12-20 10:50:44 -08:00
Eric Liang
303883a3b6
[rllib] [rfc] add contrib module and guideline for merging ( #3565 )
...
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. Also, clean up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
adoda
cf0c4745f4
[rllib] Support running older TensorFlow versions (< 1.5.0) ( #3571 )
2018-12-19 20:27:24 -08:00
Eric Liang
ffa6ee3ec8
[rllib] streaming minibatching for IMPALA ( #3402 )
...
* mb impala
* fix
* paropt
* update
* cpu warn
* on cpu
* fix mb
* doc
* docs
* comment
* larger num
* early release
* remove grad clip
* only check loader count in multi gpu mode
* revert bad multigpu changes
* num sgd iter
* comment
* reuse optimizer
* add test
* par load test
* loosen test
* Update run_multi_node_tests.sh
* fix local mode
* Update agent.py
2018-12-19 02:23:29 -08:00
Alexey Tumanov
c4cba98c75
Remove deprecation warnings when running actor tests ( #3563 )
...
* remove deprecation warnings when running actor tests
* replacing logger.warn with logger.warning
* Update worker.py
* Update policy_client.py
* Update compression.py
2018-12-18 17:04:51 -08:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) ( #3548 )
2018-12-18 10:40:01 -08:00
Eric Liang
32473cf22e
[rllib] Basic Offline Data IO API ( #3473 )
2018-12-12 13:57:48 -08:00