14 Commits

Author SHA1 Message Date
Eric Liang 6848dfd179 [rllib] Replace ray.get() with ray_get_and_free() to optimize memory usage (#4586) 2019-04-17 20:30:03 -04:00
Eric Liang 6e7680bf21 [rllib] Clean up concepts documentation and policy optimizer creation (#4592) 2019-04-12 21:03:26 -07:00
Jones Wong fe7763e786 [rllib] replace the assertion in SyncReplayOptimizer by a warning (#4534) 2019-04-02 01:43:22 -07:00
Eric Liang 09b2961750 [rllib] Ensure stats are consistently reported across all algos (#4445) 2019-03-27 15:40:15 -07:00
Eric Liang 8df772867c [rllib] rename compute_apply to learn_on_batch 2019-02-11 15:22:15 -08:00
Eric Liang db0dee573e [rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548) 2018-12-18 10:40:01 -08:00
Eric Liang 8b5827b9da [rllib] Better document which methods are abstract and which ones are overrides (#3480) 2018-12-08 16:28:58 -08:00
Eric Liang d864f299d7 [rllib] fixes from dogfooding multi-agent (#3456)
auto wrap multi-agent dict and tuple spaces by keeping a policy -> preprocessor in the sampler
add some Q-learning debug stats
report min, max of custom metrics
better errors
2018-12-05 23:31:45 -08:00
Eric Liang f1c55497ce [rllib] Fix edge case in n-step calculation and non-apex replay prioritization (#2929)
* fix

* lint
2018-09-28 15:22:33 -07:00
Jones Wong 982cde664f [rllib] Add noisy network and distributional Q-learning to implement Rainbow (#2737)
* add noisy network

* distributional q-learning in dev

* add distributional q-learning

* validated rainbow module

* add some comments

* supply some comments

* remove redundant argument to pass CI test

* async replay optimizer does NOT need annealing beta

* ignore rainbow specific arguments for DDPG and Apex

* formatted by yapf

* Update dqn_policy_graph.py

* Update dqn_policy_graph.py
2018-08-25 14:17:14 -07:00
Eric Liang fbe6c59f72 [rllib] Misc fixes, A2C (#2679)
A bunch of minor rllib fixes:

pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
move reward clipping to policy evaluator
add a2c variant of a3c
reduce vision network fc layer size to 256 units
switch to 84x84 images
doc tweaks
print timesteps in tune status
2018-08-20 15:28:03 -07:00
Eric Liang d01dc9e22d [rllib] format with yapf (#2427)
* initial yapf

* manual fix yapf bugs
2018-07-19 15:30:36 -07:00
Eric Liang 8aa56c12e6 [rllib] Document "v2" APIs (#2316)
* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* wip

* wip

* cast

* wip

* works

* fix a3c

* works

* lstm util test

* doc

* clean up

* update

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* envs

* vec

* doc prep

* models

* rl

* alg

* up

* clarify

* copy

* async sa

* fix

* comments

* fix a3c conf

* tune lstm

* fix reshape

* fix

* back to 16

* tuned a3c update

* update

* tuned

* optional

* merge

* wip

* fix up

* move pg class

* rename env

* wip

* update

* tip

* alg

* readme

* fix catalog

* readme

* doc

* context

* remove prep

* comma

* add env

* link to paper

* paper

* update

* rnn

* update

* wip

* clean up ev creation

* fix

* fix

* fix

* fix lint

* up

* no comma

* ma

* Update run_multi_node_tests.sh

* fix

* sphinx is stupid

* sphinx is stupid

* clarify torch graph

* no horizon

* fix config

* sb

* Update test_optimizers.py
2018-07-01 00:05:08 -07:00
Eric Liang 44f5f0520b [rllib] Rename optimizers for clarity (#2303)
* rename

* fix

* update

* mgpu

* Update a3c.py

* Update bc.py

* Update a3c.py

* Update test_optimizers.py

* Update a3c.py
2018-06-27 02:30:15 -07:00