wassname/ray

mirror of https://github.com/wassname/ray.git synced 2026-06-28 12:28:10 +08:00

Author	SHA1	Message	Date
Sven Mika	99ae7bae05	[RLlib] JAXPolicy prep. PR #1 . (#13077 )	2020-12-26 20:14:18 -05:00
Sven Mika	99c81c6795	[RLlib] Attention Net prep PR #3 . (#12450 )	2020-12-07 13:08:17 +01:00
Sven Mika	fb318addcb	[RLlib] Curiosity exploration module: tf/tf2.x/tf-eager support. (#11945 )	2020-11-29 12:31:24 +01:00
Sven Mika	0df55a139c	[RLlib] Attention Net prep PR #1 : Smaller cleanups. (#12447 ) * WIP. * Fix. * Fix. * Fix.	2020-11-27 16:25:47 -08:00
Sven Mika	62c7ab5182	[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747 )	2020-11-12 16:27:34 +01:00
Sven Mika	291c172d83	[RLlib] Support Simplex action spaces for SAC (torch and tf). (#11909 )	2020-11-11 18:45:28 +01:00
dHannasch	8346dedc3a	Fix the linter failure. (#11755 )	2020-11-02 18:02:15 +01:00
bcahlit	26176ec570	[RLlib] Fix epsilon_greedy on nested_action_spaces only in pytorch (#11453 ) * [RLlib] Fix epsilon_greedy on nested_action_spaces only in pytorch * epsilon_greedy on Continuous action * formatt * Fix error * fix format * fix bug * increase speed * Update rllib/utils/exploration/epsilon_greedy.py * Update rllib/utils/exploration/epsilon_greedy.py * Update rllib/utils/exploration/epsilon_greedy.py Co-authored-by: Sven Mika <sven@anyscale.io>	2020-11-02 12:22:33 +01:00
Sven Mika	2aec77e305	[RLlib] Fix two test cases that only fail on Travis. (#11435 )	2020-10-16 13:53:30 -05:00
Sven Mika	414041c6dd	[RLlib] Do not create env on driver iff num_workers > 0. (#11307 )	2020-10-15 18:21:30 +02:00
Sven Mika	8ea1bc5ff9	[RLlib] Allow for more than 2^31 policy timesteps. (#11301 )	2020-10-12 13:49:11 -07:00
Sven Mika	199e5d0f75	[RLlib] Exploration class type annotations. (#11251 )	2020-10-07 21:59:14 +02:00
Sven Mika	ce96b03b07	[RLlib] MB-MPO cleanup (comments, docstrings, type annotations). (#11033 )	2020-10-06 20:28:16 +02:00
Sven Mika	c17169dc11	[RLlib] Fix all example scripts to run on GPUs. (#11105 )	2020-10-02 23:07:44 +02:00
Sven Mika	f91c455527	[RLlib] Curiosity documentation. (#11066 )	2020-09-29 09:39:22 +02:00
Thomas Lecat	504da45e69	fix(rllib): allow explore=False with tuple action distributions (#10443 )	2020-09-10 15:03:02 -07:00
Sven Mika	244aafdcf8	[RLlib] Curiosity enhancements. (#10373 )	2020-09-05 13:14:24 +02:00
Sven Mika	715ee8dfc9	[RLlib] Issue 10469: Callbacks should receive env idx ... (#10477 )	2020-09-03 17:27:05 +02:00
Eric Liang	deea1861ab	[rllib] Try fixing torch GPU and masking errors (#10168 )	2020-08-25 18:34:19 -07:00
Sven Mika	2cbe29a7fa	[RLlib] Curiosity minor fixes, do-overs, and testing. (#10143 )	2020-08-19 17:49:50 +02:00
Sven Mika	2256047876	[RLlib] Rename rllib.utils.types into typing to match built-in python module's name. (#10114 )	2020-08-15 13:24:22 +02:00
Tanay Wakhare	1826b29757	[RLlib] Curiosity (intrinsic motivation) Exploration module. (#9912 )	2020-08-13 20:14:16 +02:00
Barak Michener	8e76796fd0	ci: Redo `format.sh --all` script & backfill lint fixes (#9956 )	2020-08-07 16:49:49 -07:00
Michael Luo	4d7bd8c892	[RLlib] Implementation of "Model-based Meta Policy Optimization" (MB MPO) (#9409 )	2020-08-02 18:12:09 +02:00
Sven Mika	ff9c1dac88	[RLlib] Issue 9667 DDPG Torch bugs and enhancements. (#9680 )	2020-07-28 14:15:03 +02:00
Sven Mika	78dfed2683	[RLlib] Issue 8384: QMIX doesn't learn anything. (#9527 )	2020-07-17 12:14:34 +02:00
Sven Mika	935d8308fb	[RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). (#9497 )	2020-07-16 14:55:50 +02:00
Sven Mika	fcdf410ae1	[RLlib] Tf2.x native. (#8752 )	2020-07-11 22:06:35 +02:00
Sven Mika	01125b8fcf	[RLlib] DQN rainbow eager-mode (keras style NoisyLayer) (preparation for native tf2.x support). (#9304 )	2020-07-09 10:44:10 +02:00
Sven Mika	4da0e542d5	[RLlib] DDPG and SAC eager support (preparation for tf2.x) (#9204 )	2020-07-08 16:12:20 +02:00
Sven Mika	5b2a97597b	[RLlib] Retire `try_import_tree` (should be installed along with other requirements). (#9211 ) - Retire try_import_tree. - Stabilize test_supported_multi_agent.py.	2020-07-02 13:06:34 +02:00
Sven Mika	43043ee4d5	[RLlib] Tf2x preparation; part 2 (upgrading `try_import_tf()`). (#9136 ) * WIP. * Fixes. * LINT. * WIP. * WIP. * Fixes. * Fixes. * Fixes. * Fixes. * WIP. * Fixes. * Test * Fix. * Fixes and LINT. * Fixes and LINT. * LINT.	2020-06-30 10:13:20 +02:00
Sven Mika	4fd8977eaf	[RLlib] Minor cleanup in preparation to tf2.x support. (#9130 ) * WIP. * Fixes. * LINT. * Fixes. * Fixes and LINT. * WIP.	2020-06-25 19:01:32 +02:00
Eric Liang	831b2fe51d	[rllib] Set framework to tf by default and remove import checks; "Auto" option (#8748 ) * tf by default * Update rllib/agents/trainer.py Co-authored-by: Sven Mika <sven@anyscale.io> * remove it * fix * remove * fix * lint Co-authored-by: Sven Mika <sven@anyscale.io>	2020-06-08 23:04:50 -07:00
Sven Mika	2746fc0476	[RLlib] Auto-framework, retire `use_pytorch` in favor of `framework=...` (#8520 )	2020-05-27 16:19:13 +02:00
Sven Mika	6d196197bc	[RLlib] utils/spaces ... (#8608 )	2020-05-27 10:21:30 +02:00
Sven Mika	d7eaacb5fe	[RLlib] Issue 8319 DDPG (MA or num_envs_per_worker > 1) broken. (#8324 )	2020-05-08 08:26:32 +02:00
Sven Mika	5f278c6411	[RLlib] Examples folder restructuring (models) part 1 (#8353 )	2020-05-08 08:20:18 +02:00
Sven Mika	6c2b9a4cfa	[RLlib] Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304 )	2020-05-04 23:53:38 +02:00
Sven Mika	a00144f746	[RLlib] Fix issue 8135 (DDPG inf actions when using [-inf,inf] action space). (#8302 )	2020-05-04 22:27:30 +02:00
Sven Mika	b95e28faea	[RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288 )	2020-05-04 09:36:27 +02:00
Sven Mika	166bb5d690	[RLlib] IMPALA PyTorch (#8287 ) This PR adds an IMPALA PyTorch implementation. - adds compilation tests for LSTM and w/o LSTM. - adds learning test for CartPole.	2020-05-03 13:44:25 +02:00
Sven Mika	76e1a4df9e	Fix TD3 torch via GaussianNoise torch bug. (#8276 )	2020-05-02 08:12:21 +02:00
Sven Mika	1775e89f26	[RLlib] Remove TupleActions and support arbitrarily nested action spaces. (#8143 ) Deprecate TupleActions and support arbitrarily nested action spaces. Closes issue #8143.	2020-04-28 14:59:16 +02:00
Eric Liang	2298f6fb40	[rllib] Port DQN/Ape-X to training workflow api (#8077 )	2020-04-23 12:39:19 -07:00
Sven Mika	d0fab84e4d	[RLlib] DDPG PyTorch version. (#7953 ) The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib. This PR: - Depends on the re-factor PR for DDPG (Functional Algorithm API). - Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch) - Updates the documentation to reflect that DDPG and TD3 now support PyTorch. * Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20% than tf). * Fix GPU target model problem.	2020-04-16 10:20:01 +02:00
Sven Mika	428516056a	[RLlib] SAC Torch (incl. Atari learning) (#7984 ) * Policy-classes cleanup and torch/tf unification. - Make Policy abstract. - Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch). - Move some methods and vars to base Policy (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more. * Fix `clip_action` import from Policy (should probably be moved into utils altogether). * - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy). - Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces). * Add `config` to c'tor call to TFPolicy. * Add missing `config` to c'tor call to TFPolicy in marvil_policy.py. * Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract). * Fix LINT errors in Policy classes. * Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py. * policy.py LINT errors. * Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases). * policy.py - Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented). - Fix docstring of `num_state_tensors`. * Make QMIX torch Policy a child of TorchPolicy (instead of Policy). * QMixPolicy add empty implementations of abstract Policy methods. * Store Policy's config in self.config in base Policy c'tor. * - Make only compute_actions in base Policy's an abstractmethod and provide pass implementation to all other methods if not defined. - Fix state_batches=None (most Policies don't have internal states). * Cartpole tf learning. * Cartpole tf AND torch learning (in ~ same ts). * Cartpole tf AND torch learning (in ~ same ts). 2 * Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3 * Cartpole tf AND torch learning (in ~ same ts). 4 * Cartpole tf AND torch learning (in ~ same ts). 5 * Cartpole tf AND torch learning (in ~ same ts). 6 * Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning. * WIP. * WIP. * SAC torch learning Pendulum. * WIP. * SAC torch and tf learning Pendulum and Cartpole after cleanup. * WIP. * LINT. * LINT. * SAC: Move policy.target_model to policy.device as well. * Fixes and cleanup. * Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default). * Fixes and LINT. * Fixes and LINT. * Fix and LINT. * WIP. * Test fixes and LINT. * Fixes and LINT. Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>	2020-04-15 13:25:16 +02:00
Sven Mika	1b31c11806	[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. (#7934 )	2020-04-09 14:04:21 -07:00
Sven Mika	22ccc43670	[RLlib] DQN torch version. (#7597 ) * Fix. * Rollback. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * Fix. * Fix. * Fix. * Fix. * WIP. * WIP. * Fix. * Test case fixes. * Test case fixes and LINT. * Test case fixes and LINT. * Rollback. * WIP. * WIP. * Test case fixes. * Fix. * Fix. * Fix. * Add regression test for DQN w/ param noise. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Comment * Regression test case. * WIP. * WIP. * LINT. * LINT. * WIP. * Fix. * Fix. * Fix. * LINT. * Fix (SAC does currently not support eager). * Fix. * WIP. * LINT. * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * WIP. * Fix. * LINT. * LINT. * Fix and LINT. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Fix. * Fix and LINT. * Update rllib/utils/exploration/exploration.py * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Fixes. * WIP. * LINT. * Fixes and LINT. * LINT and fixes. * LINT. * Move action_dist back into torch extra_action_out_fn and LINT. * Working SimpleQ learning cartpole on both torch AND tf. * Working Rainbow learning cartpole on tf. * Working Rainbow learning cartpole on tf. * WIP. * LINT. * LINT. * Update docs and add torch to APEX test. * LINT. * Fix. * LINT. * Fix. * Fix. * Fix and docstrings. * Fix broken RLlib tests in master. * Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier). * Fix error_outputs option in BAZEL for RLlib regression tests. * Fix. * Tune param-noise tests. * LINT. * Fix. * Fix. * test * test * test * Fix. * Fix. * WIP. * WIP. * WIP. * WIP. * LINT. * WIP. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-04-06 11:56:16 -07:00
Sven Mika	82c2d9faba	[RLlib] Fix broken RLlib tests in master. (#7894 )	2020-04-05 09:34:23 -07:00


66 Commits