131 Commits

Author SHA1 Message Date
Eric Liang 27cd6ea401 [rllib] Flip sign of A2C, IMPALA entropy coefficient; raise DeprecationWarning if negative (#4374) 2019-03-17 18:07:37 -07:00
Eric Liang a45019d98c [rllib] Add option to proceed even if some workers crashed (#4376) 2019-03-16 13:34:09 -07:00
Stefan Pantic 2202a81773 Fix multi discrete (#4338)
* Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)"

This reverts commit 3c41cb9b60.

* Fix a bug with log rhos for vtrace

* Reformat

* lint
2019-03-12 20:32:11 -07:00
Eric Liang 3c41cb9b60 Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)
This reverts commit 962b17f567.
2019-03-11 22:51:26 -07:00
Stefan Pantic 36cbde651a Add action space to model (#4210) 2019-03-09 19:23:12 -08:00
Eric Liang c7f74dbdc7 [rllib] Add async remote workers (#4253) 2019-03-08 15:39:48 -08:00
Eric Liang b0332551dd [rllib] Fix APPO + continuous spaces, feed prev_rew/act to A3C properly (#4286) 2019-03-06 21:36:26 -08:00
Eric Liang 2781d74680 [rllib] Reserve CPUs for replay actors in apex (#4217) 2019-03-06 10:22:12 -08:00
Eric Liang ba03048254 [rllib] TF model custom_loss() should actually allow access to full rollout data (#4220) 2019-03-02 22:57:51 -08:00
bjg2 962b17f567 [wingman -> rllib] IMPALA MultiDiscrete changes (#3967) 2019-03-01 19:47:06 -08:00
Eric Liang b5799b5286 [rllib] Set PPO observation filter to NoFilter by default (#4191) 2019-03-01 13:19:33 -08:00
Eric Liang d9da183c7d [rllib] Custom supervised loss API (#4083) 2019-02-24 15:36:13 -08:00
Robert Nishihara 7b04ed059e Move TensorFlowVariables to ray.experimental.tf_utils. (#4145) 2019-02-24 14:26:46 -08:00
Eric Liang 9896df7799 [rllib] Guard against PPO value function not training with RNN models (#4037)
* better lstm settings

* 1.0

* docs

* warn on truncate

* clarify

* Update ppo_policy_graph.py

* Update ppo_policy_graph.py

* Update ppo_policy_graph.py
2019-02-22 11:18:51 -08:00
Stefan Pantic a54386e499 Added custom LSTM detection (#4087)
* Added autodetection of custom LSTM usage

* Reverted line separators

* Added check for LSTM

* Update vtrace_policy_graph.py

* Update appo_policy_graph.py
2019-02-21 21:07:48 -08:00
Jones Wong acbe0b4e5f Fix twin q bug (#4108) 2019-02-21 10:47:01 -08:00
Jones Wong 3ac8fd7ee8 Exploration with Parameter Space Noise (#4048)
*  enable parameter space noise for exploration

*  enable parameter space noise for exploration

*  yapf formatted

*  remove the usage of scipy softmax available in the latest version only

*  enable subclass that has no parameter_noise in the config

*  run user specified callbacks and test parameter space noise in multi node setting

*  formatted by yapf

* Update dqn.py

* lint
2019-02-20 22:35:18 -08:00
mika 64c95aea85 [rllib] Update README.md for qmix (#4101)
## What do these changes do?

Fixed PyMARL repository path.

## Related issue number

N/A
2019-02-20 10:21:08 -08:00
Philipp Moritz f51969964d Fix linting on master (#4077) 2019-02-17 13:55:40 -08:00
Megan Kawakami 346885068c [rllib] add torch pg (#3857)
* add torch pg

* add torch imports

* added torch pg

* working torch pg implementation

* add pg pytorch

* Update a3c.py

* Update a3c.py

* Update torch_policy_graph.py

* Update torch_policy_graph.py
2019-02-16 19:54:14 -08:00
Eric Liang 0c0bd4d41c [rllib] Use model.value_function() in MARWIL (#4036)
* fix marwil

* add ph

* fix
2019-02-14 19:35:21 -08:00
Eric Liang 2dccf383dd [rllib] Basic infrastructure for off-policy estimation (IS, WIS) (#3941) 2019-02-13 16:25:05 -08:00
bjg2 0e37ac6d1d [wingman -> rllib] Remote and entangled environments (#3968)
* added all our environment changes

* fixed merge request comments and remote env

* fixed remote check

* moved remote_worker_envs to correct config section

* lint

* auto wrap impl

* fix

* fixed the tests
2019-02-13 10:08:26 -08:00
Eric Liang 8df772867c [rllib] rename compute_apply to learn_on_batch 2019-02-11 15:22:15 -08:00
Eric Liang 29322c7389 [rllib] Replay buffer for IMPALA should default to 0 slots. (#3971)
* disable replay

* make lq configurable

* leak test

* Update run_multi_node_tests.sh
2019-02-08 10:03:11 -08:00
Michael Luo 1a015e420b Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented (#3934) 2019-02-02 22:10:58 -08:00
Eric Liang 0f81bc9a33 [rllib] on_train_result results do not get logged (#3865) 2019-02-01 20:32:07 -08:00
Tianming Xu 1302fafc0b [Tune] Add export_formats option to export policy graphs (#3868)
Earlier PRs (#3585 and #3637) introduced export_policy_model and export_policy_checkpoint so that users can export a TensorFlow model and checkpoint.

For Ray Tune users, these APIs are not accessible through YAML configurations.

This pull request adds an export_formats option that lets users choose the desired export format.
2019-01-31 17:07:27 -08:00
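As an illustration of the option described in the commit above, here is a minimal sketch of an experiment spec written as a plain Python dict. Only the `export_formats` key comes from the commit message; the experiment name, algorithm, environment, stop condition, and the two format values are assumptions chosen for the example, not a confirmed description of the Tune API.

```python
# Hypothetical experiment spec. Only the `export_formats` key is named in the
# commit message; every other name and value below is an assumption.
experiment_spec = {
    "cartpole-ppo-export": {  # hypothetical experiment name
        "run": "PPO",  # assumed algorithm
        "env": "CartPole-v0",  # assumed environment
        "stop": {"training_iteration": 1},  # assumed stop condition
        # Assumed values, mirroring the model and checkpoint exports
        # mentioned in the commit message.
        "export_formats": ["model", "checkpoint"],
    }
}

# The formats requested for this experiment.
requested = experiment_spec["cartpole-ppo-export"]["export_formats"]
print(requested)
```

A dict of this shape maps directly onto a YAML experiment file, which is the use case the commit message mentions.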
Eric Liang 152375aa8a [rllib] Add evaluation option to DQN agent (#3835)
* add eval

* interval

* multiagent minor fix

* Update rllib.rst

* Update ddpg.py

* Update qmix.py
2019-01-29 21:19:53 -08:00
Eric Liang fb73cedf70 [rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815)
* wip

* lint

* wip

* up

* wip

* update examples

* wip

* remove carla

* update

* improve envspec

* link to custom

* Update rllib-env.rst

* update

* fix

* fn

* lint

* ds

* ssd games

* desc

* fix up docs

* fix
2019-01-29 21:06:09 -08:00
Eric Liang 04ec47cbd4 [rllib] annotate public vs developer vs private APIs (#3808) 2019-01-23 21:27:26 -08:00
Eric Liang aad48ee5a5 [tune] Fully deprecate raw function literals in Tune (#3788)
Related: https://github.com/ray-project/ray/issues/3785
2019-01-19 17:09:36 -08:00
Michael Luo 16f7ca45e4 Appo (#3779)
* Deleted old fork, updated new ray and moved PPO-impala to APPO in ppo folder

* Deleted unnecessary vtrace.py file

* Update pong-impala.yaml

* Cleaned PPO Code

* Update pong-impala.yaml

* Update pong-impala.yaml

* wip

* new file

* refactor

* add vtrace off option

* revert

* support any space

* docs

* fix comment

* remove kl

* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Jones Wong 319c1340cb [rllib] Develop MARWIL (#3635)
*  add marwil policy graph

*  fix typo

*  add offline optimizer and enable running marwil

*  fix loss function

*  add maintaining the moving average of advantage norm

*  use sync replay optimizer for unifying

*  remove offline optimizer and use sync replay optimizer

*  format by yapf

*  add imitation learning objective

*  fix according to eric's review

*  format by yapf

* revise

* add test data

* marwil
2019-01-16 19:00:43 -08:00
Eric Liang 401e656b95 [rllib] Sync filters at end of iteration not start; hierarchical docs (#3769) 2019-01-15 16:25:25 -08:00
Eric Liang e78562b2e8 [rllib] Misc fixes: set lr for PG, better error message for LSTM/PPO, fix multi-agent/APEX (#3697)
* fix

* update test

* better error

* compute

* eps fix

* add get_policy() api

* Update agent.py

* better err msg

* fix

* pass in rew
2019-01-06 19:37:35 -08:00
Eric Liang 03fe760616 [rllib] Model self loss isn't included in all algorithms (#3679) 2019-01-04 22:30:35 -08:00
Eric Liang 7db1f3be2a [tune] resume=False by default but print a tip to set resume="prompt" + jenkins fix (#3681) 2019-01-04 17:23:19 -08:00
Eric Liang ca864faece [rllib] Documentation for I/O API and multi-agent support / cleanup (#3650) 2019-01-03 15:15:36 +08:00
opherlieber 2177e2f410 [rllib] Agent: Allow unknown subkeys for custom_resources_per_worker (#3639)
* RLLib Agent: Allow unknown subkeys for custom_resources_per_worker

* Update agent.py
2019-01-03 14:19:59 +08:00
Eric Liang 47d36d7bd6 [rllib] Refactor pytorch custom model support (#3634) 2019-01-03 13:48:33 +08:00
Tianming Xu b4f61dfd50 [rllib] Export policy model checkpoint (#3637)
* Export policy model checkpoint

* update comment
2018-12-27 08:43:06 +09:00
Tianming Xu deb26b954e [rllib] Export tensorflow model of policy graph (#3585)
* Export tensorflow model of policy graph

* Add tests,examples,pydocs and infer extra signatures from existing methods

* Add example usage in export_policy_model comment

* Fix lint error

* Fix lint error

* Fix lint error
2018-12-22 17:35:25 +09:00
Eric Liang 6bb1103930 [rllib] Avoid sample wastage with bad PPO configurations (#3552)
## What do these changes do?

Previously we logged a warning if the PPO configuration would waste many samples. However, this didn't apply in the case of long episodes in `complete_episodes` batch mode, and also the amount of waste is up to 2x in common cases.

This PR:
- Estimates the number of sampling tasks needed to avoid over-sampling.
- Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower, so a config flag is added to re-enable the legacy behavior.

## Related issue number

Closes: https://github.com/ray-project/ray/issues/3549
2018-12-20 10:50:44 -08:00
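The estimate described in the commit above can be sketched as a ceiling division. This is a simplified illustration, not RLlib's implementation: the function names, the parameter names, and the assumption that every sampling task returns the same number of samples are all hypothetical.

```python
import math


def estimate_sampling_tasks(train_batch_size: int, samples_per_task: int) -> int:
    """Smallest number of equal-sized sampling tasks that covers the batch.

    Hypothetical helper: assumes each task returns exactly
    `samples_per_task` samples.
    """
    if train_batch_size <= 0 or samples_per_task <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(train_batch_size / samples_per_task)


def surplus_samples(train_batch_size: int, samples_per_task: int, num_tasks: int) -> int:
    """Samples collected beyond the requested batch size (never negative)."""
    return max(0, samples_per_task * num_tasks - train_batch_size)


# Example with assumed numbers: a 4000-sample batch and 1500 samples per task.
tasks = estimate_sampling_tasks(4000, 1500)
extra = surplus_samples(4000, 1500, tasks)
print(tasks, extra)
```

With these assumed numbers, three tasks suffice and 500 extra samples are collected. Launching one task on each of, say, eight workers would instead collect 12000 samples for a 4000-sample batch, which is the kind of over-sampling the commit aims to avoid.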
Eric Liang 303883a3b6 [rllib] [rfc] add contrib module and guideline for merging (#3565)
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. Also, clean up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
adoda cf0c4745f4 [rllib] support running older version tensorflow(version < 1.5.0) (#3571) 2018-12-19 20:27:24 -08:00
Eric Liang ffa6ee3ec8 [rllib] streaming minibatching for IMPALA (#3402)
* mb impala

* fix

* paropt

* update

* cpu warn

* on cpu

* fix mb

* doc

* docs

* comment

* larger num

* early release

* remove grad clip

* only check loader count in multi gpu mode

* revert bad multigpu changes

* num sgd iter

* comment

* reuse optimizer

* add test

* par load test

* loosen test

* Update run_multi_node_tests.sh

* fix local mode

* Update agent.py
2018-12-19 02:23:29 -08:00
Alexey Tumanov c4cba98c75 Remove deprecation warnings when running actor tests (#3563)
* remove deprecation warnings when running actor tests

* replacing logger.warn with logger.warning

* Update worker.py

* Update policy_client.py

* Update compression.py
2018-12-18 17:04:51 -08:00
Eric Liang db0dee573e [rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548) 2018-12-18 10:40:01 -08:00
Eric Liang 32473cf22e [rllib] Basic Offline Data IO API (#3473) 2018-12-12 13:57:48 -08:00