Commit Graph

79 Commits

Author SHA1 Message Date
Sven Mika c524f86785 [RLlib] BC/MARWIL/recurrent nets minor cleanups and bug fixes. (#13064) 2020-12-27 09:46:03 -05:00
Sven Mika 99ae7bae05 [RLlib] JAXPolicy prep. PR #1. (#13077) 2020-12-26 20:14:18 -05:00
Michael Luo 4bcd475671 [RLlib] Improved Documentation for PPO, DDPG, and SAC (#12943) 2020-12-24 09:31:35 -05:00
Sven Mika d5604eaba3 [RLlib] Attention nets PyTorch support and cleanup (using traj. view API). (#12029) 2020-12-21 18:38:34 -08:00
roireshef ef95db51e1 [RLlib] Arbitrary input to value() when not using GAE (#12941) 2020-12-21 12:19:33 -05:00
Sven Mika b2bcab711d [RLlib] Attention Nets: tf (#12753) 2020-12-20 20:22:32 -05:00
Sven Mika e40b14d255 [RLlib] Batch-size for truncate_episode batch_mode should be configurable in agent-steps (rather than env-steps), if needed. (#12420) 2020-12-08 16:41:45 -08:00
Sven Mika 99c81c6795 [RLlib] Attention Net prep PR #3. (#12450) 2020-12-07 13:08:17 +01:00
Sven Mika 19c8033df2 [RLlib] Fix most remaining RLlib algos for running with trajectory view API. (#12366)
* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* LINT and fixes.
MB-MPO and MAML not working yet.

* wip

* update

* update

* remove

* remove dep

* higher

* Update requirements_rllib.txt

* Update requirements_rllib.txt

* relpos

* no mbmpo

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-12-01 17:41:10 -08:00
Sven Mika 0df55a139c [RLlib] Attention Net prep PR #1: Smaller cleanups. (#12447)
* WIP.

* Fix.

* Fix.

* Fix.
2020-11-27 16:25:47 -08:00
Sven Mika 6475297bd3 [RLlib] Torch LR schedule not working. Fix and added test case. (#12396) 2020-11-26 13:14:11 +01:00
Sven Mika 592c161032 [RLlib] Issue 12118: LSTM prev-a/r should be separately configurable. Fix missing prev-a one-hot encoding. (#12397)
* WIP.

* Fix and LINT.
2020-11-25 11:27:46 -08:00
Sven Mika dab241dcc6 [RLlib] Fix inconsistency wrt batch size in SampleCollector (traj. view API). Makes DD-PPO work with traj. view API. (#12063) 2020-11-19 19:01:14 +01:00
Sven Mika 62c7ab5182 [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747) 2020-11-12 16:27:34 +01:00
Sven Mika d9f1874e34 [RLlib] Minor fixes (torch GPU bugs + some cleanup). (#11609) 2020-10-27 10:00:24 +01:00
Philsik Chang ede9347127 [rllib] Add torch_distributed_backend flag for DDPPO (#11362) (#11425) 2020-10-21 18:30:42 -07:00
Sven Mika c17169dc11 [RLlib] Fix all example scripts to run on GPUs. (#11105) 2020-10-02 23:07:44 +02:00
Sven Mika 36bda8432b [RLlib] Trajectory view API: Simple List Collector (on by default for PPO); LSTM-agnostic (#11056) 2020-10-01 16:57:10 +02:00
Michael Luo ba5a3ae9e2 Enable vtrace by default (#10962) 2020-09-22 22:18:21 -07:00
Sven Mika 805dad3bc4 [RLlib] SAC algo cleanup. (#10825) 2020-09-20 11:27:02 +02:00
Sumanth Ratna 9da7bdcc8e Use master for links to docs in source (#10866) 2020-09-19 00:30:45 -07:00
desktable 4ccfd07a61 [RLlib] Add docstrings for agents/dqn (#10710) 2020-09-15 12:37:07 +02:00
Sven Mika ef18893fb5 [RLlib] PPO, APPO, and DD-PPO code cleanup. (#10420) 2020-09-02 14:03:01 +02:00
Sven Mika e968b52cb7 [RLlib] Trajectory view API - 03 Fast LSTM + prev actions/rewards (#9950) 2020-08-21 12:35:16 +02:00
Sven Mika d14b501692 [RLlib] First attempt at cleaning up algo code in RLlib: PG. (#10115) 2020-08-20 17:05:57 +02:00
Sven Mika 2cbe29a7fa [RLlib] Curiosity minor fixes, do-overs, and testing. (#10143) 2020-08-19 17:49:50 +02:00
Chua Cheow Huan ea51e94729 [rllib] Learning rate schedule for DDPPO. (#10006)
* Get shared metrics, increment counter & set global vars for remote workers.

* Add unit test to test lr_schedule for DDPPO.

* Broadcast the local set of global vars to remote workers instead of independently setting the global vars on each rollout worker.
2020-08-15 00:51:45 -07:00
Barak Michener 8e76796fd0 ci: Redo format.sh --all script & backfill lint fixes (#9956) 2020-08-07 16:49:49 -07:00
Sven Mika 57690a3a9f [RLlib] Trajectory view API - 02 actual API scaffold (#9753) 2020-08-06 10:54:20 +02:00
Sven Mika b0b0463161 [RLlib] Trajectory View API (preparatory cleanup and enhancements). (#9678) 2020-07-29 21:15:09 +02:00
Sven Mika 935d8308fb [RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). (#9497) 2020-07-16 14:55:50 +02:00
Tanay Wakhare 3536d8e4b3 Masking error. With t*valid_mask, we get the error np.inf*0 = np.inf (#9407) 2020-07-12 22:59:35 +02:00
Sven Mika fcdf410ae1 [RLlib] Tf2.x native. (#8752) 2020-07-11 22:06:35 +02:00
Sven Mika 43043ee4d5 [RLlib] Tf2x preparation; part 2 (upgrading try_import_tf()). (#9136)
* WIP.

* Fixes.

* LINT.

* WIP.

* WIP.

* Fixes.

* Fixes.

* Fixes.

* Fixes.

* WIP.

* Fixes.

* Test

* Fix.

* Fixes and LINT.

* Fixes and LINT.

* LINT.
2020-06-30 10:13:20 +02:00
Sven Mika 5c6d5d4ab1 This PR fixes the currently broken lstm_use_prev_action_reward flag for default lstm models (model.use_lstm=True). (#8970) 2020-06-27 20:50:01 +02:00
Sven Mika 4fd8977eaf [RLlib] Minor cleanup in preparation to tf2.x support. (#9130)
* WIP.

* Fixes.

* LINT.

* Fixes.

* Fixes and LINT.

* WIP.
2020-06-25 19:01:32 +02:00
Sven Mika 7008902cff [RLlib] Minor rllib.utils cleanup. (#8932) 2020-06-16 08:52:20 +02:00
Sven Mika 4ed796a7d6 [RLlib] Add testing Policy.compute_single_action() for all agents. (#8903) 2020-06-13 17:51:50 +02:00
Sven Mika 2746fc0476 [RLlib] Auto-framework, retire use_pytorch in favor of framework=... (#8520) 2020-05-27 16:19:13 +02:00
Jan Blumenkamp d6f78f58dc Fix missing learning rate and entropy coeff schedule for torch PPO (#8572) 2020-05-23 10:54:18 -07:00
Eric Liang 9a83908c46 [rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Eric Liang 7ce138a6dc [rllib] Support free_log_std in ModelV2 (#8380)
* update

* factor

* update

* fix test failures

* fix torch net
2020-05-12 10:14:05 -07:00
Eric Liang 9d012626e5 [rllib] Distributed exec workflow for impala (#8321) 2020-05-11 20:24:43 -07:00
Sven Mika 754290daad [RLlib] Add light-weight Trainer.compute_action() tests for all Algos. (#8356) 2020-05-08 16:31:31 +02:00
Sven Mika 5f278c6411 [RLlib] Examples folder restructuring (models) part 1 (#8353) 2020-05-08 08:20:18 +02:00
Sven Mika 166bb5d690 [RLlib] IMPALA PyTorch (#8287)
This PR adds an IMPALA PyTorch implementation.

- adds compilation tests for LSTM and w/o LSTM.
- adds learning test for CartPole.
2020-05-03 13:44:25 +02:00
Eric Liang baadbdf8d4 [rllib] Execute PPO using training workflow (#8206)
* wip

* add kl

* kl

* works now

* doc update

* reorg

* add ddppo

* add stats

* fix fetch

* comment

* fix learner stat regression

* test fixes

* fix test
2020-04-30 01:18:09 -07:00
Sven Mika bf25aee392 [RLlib] Deprecate all Model(v1) usage. (#8146)
Deprecate all Model(v1) usage.
2020-04-29 12:12:59 +02:00
Sven Mika eb91619175 Fix release 0.8.5 tests for PPO torch Breakout. (#8226) 2020-04-29 10:36:41 +02:00
Sven Mika 499ad5fbe4 [RLlib] PyTorch version of APPO. (#8120)
- Translate all vtrace functionality to torch and added torch to the framework_iterator-loop in all existing vtrace test cases.
- Add learning test cases for APPO torch (both w/ and w/o v-trace).
- Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).
2020-04-23 09:11:12 +02:00