wassname/ray

mirror of https://github.com/wassname/ray.git synced 2026-07-03 22:06:45 +08:00

Author	SHA1	Message	Date
Eric Liang	27cd6ea401	[rllib] Flip sign of A2C, IMPALA entropy coefficient; raise DeprecationWarning if negative (#4374 )	2019-03-17 18:07:37 -07:00
Eric Liang	a45019d98c	[rllib] Add option to proceed even if some workers crashed (#4376 )	2019-03-16 13:34:09 -07:00
Stefan Pantic	2202a81773	Fix multi discrete (#4338 ) * Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)" This reverts commit `3c41cb9b60`. * Fix a bug with log rhos for vtrace * Reformat * lint	2019-03-12 20:32:11 -07:00
Eric Liang	3c41cb9b60	Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967 )" (#4332 ) This reverts commit `962b17f567`.	2019-03-11 22:51:26 -07:00
Stefan Pantic	36cbde651a	Add action space to model (#4210 )	2019-03-09 19:23:12 -08:00
Eric Liang	c7f74dbdc7	[rllib] Add async remote workers (#4253 )	2019-03-08 15:39:48 -08:00
Eric Liang	b0332551dd	[rllib] Fix APPO + continuous spaces, feed prev_rew/act to A3C properly (#4286 )	2019-03-06 21:36:26 -08:00
Eric Liang	2781d74680	[rllib] Reserve CPUs for replay actors in apex (#4217 )	2019-03-06 10:22:12 -08:00
Eric Liang	ba03048254	[rllib] TF model custom_loss() should actually allow access to full rollout data (#4220 )	2019-03-02 22:57:51 -08:00
bjg2	962b17f567	[wingman -> rllib] IMPALA MultiDiscrete changes (#3967 )	2019-03-01 19:47:06 -08:00
Eric Liang	b5799b5286	[rllib] Set PPO observation filter to NoFilter by default (#4191 )	2019-03-01 13:19:33 -08:00
Eric Liang	d9da183c7d	[rllib] Custom supervised loss API (#4083 )	2019-02-24 15:36:13 -08:00
Robert Nishihara	7b04ed059e	Move TensorFlowVariables to ray.experimental.tf_utils. (#4145 )	2019-02-24 14:26:46 -08:00
Eric Liang	9896df7799	[rllib] Guard against PPO value function not training with RNN models (#4037 ) * better lstm settings * 1.0 * docs * warn on truncate * clarify * Update ppo_policy_graph.py * Update ppo_policy_graph.py * Update ppo_policy_graph.py	2019-02-22 11:18:51 -08:00
Stefan Pantic	a54386e499	Added custom LSTM detection (#4087 ) * Added autodetection of custom LSTM usage * Reverted line separators * Added check for LSTM * Update vtrace_policy_graph.py * Update appo_policy_graph.py	2019-02-21 21:07:48 -08:00
Jones Wong	acbe0b4e5f	Fix twin q bug (#4108 )	2019-02-21 10:47:01 -08:00
Jones Wong	3ac8fd7ee8	Exploration with Parameter Space Noise (#4048 ) * enable parameter space noise for exploration * enable parameter space noise for exploration * yapf formatted * remove the usage of scipy softmax avialable in the latest version only * enable subclass that has no parameter_noise in the config * run user specified callbacks and test parameter space noise in multi node setting * formatted by yapf * Update dqn.py * lint	2019-02-20 22:35:18 -08:00
mika	64c95aea85	[rllib] Update README.md for qmix (#4101 ) ## What do these changes do? Fixed PyMARL repository path. ## Related issue number N/A	2019-02-20 10:21:08 -08:00
Philipp Moritz	f51969964d	Fix linting on master (#4077 )	2019-02-17 13:55:40 -08:00
Megan Kawakami	346885068c	[rllib] add torch pg (#3857 ) * add torch pg * add torch imports * added torch pg * working torch pg implementation * add pg pytorch * Update a3c.py * Update a3c.py * Update torch_policy_graph.py * Update torch_policy_graph.py	2019-02-16 19:54:14 -08:00
Eric Liang	0c0bd4d41c	[rllib] Use model.value_function() in MARWIL (#4036 ) * fix marwil * add ph * fix	2019-02-14 19:35:21 -08:00
Eric Liang	2dccf383dd	[rllib] Basic infrastructure for off-policy estimation (IS, WIS) (#3941 )	2019-02-13 16:25:05 -08:00
bjg2	0e37ac6d1d	[wingman -> rllib] Remote and entangled environments (#3968 ) * added all our environment changes * fixed merge request comments and remote env * fixed remote check * moved remote_worker_envs to correct config section * lint * auto wrap impl * fix * fixed the tests	2019-02-13 10:08:26 -08:00
Eric Liang	8df772867c	[rllib] rename compute_apply to learn_on_batch	2019-02-11 15:22:15 -08:00
Eric Liang	29322c7389	[rllib] Replay buffer for IMPALA should default to 0 slots. (#3971 ) * disable replay * make lq configurable * leak test * Update run_multi_node_tests.sh	2019-02-08 10:03:11 -08:00
Michael Luo	1a015e420b	Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented (#3934 )	2019-02-02 22:10:58 -08:00
Eric Liang	0f81bc9a33	[rllib] on_train_result results do not get logged (#3865 )	2019-02-01 20:32:07 -08:00
Tianming Xu	1302fafc0b	[Tune] Add export_formats option to export policy graphs (#3868 ) In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced for users to export TensorFlow model and checkpoint. For Ray Tune users, these APIs are not accessible through YAML configurations. In this pull request, export_formats option is provided to enable users to choose the desired export format.	2019-01-31 17:07:27 -08:00
Eric Liang	152375aa8a	[rllib] Add evaluation option to DQN agent (#3835 ) * add eval * interval * multiagent minor fix * Update rllib.rst * Update ddpg.py * Update qmix.py	2019-01-29 21:19:53 -08:00
Eric Liang	fb73cedf70	[rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815 ) * wip * lint * wip * up * wip * update examples * wip * remove carla * update * improve envspec * link to custom * Update rllib-env.rst * update * fix * fn * lint * ds * ssd games * desc * fix up docs * fix	2019-01-29 21:06:09 -08:00
Eric Liang	04ec47cbd4	[rllib] annotate public vs developer vs private APIs (#3808 )	2019-01-23 21:27:26 -08:00
Eric Liang	aad48ee5a5	[tune] Fully deprecate raw function literals in Tune (#3788 ) Related: https://github.com/ray-project/ray/issues/3785	2019-01-19 17:09:36 -08:00
Michael Luo	16f7ca45e4	Appo (#3779 ) * Deleted old fork, updated new ray and moved PPO-impala to APPO in ppo folder * Deleted unneccesary vtrace.py file * Update pong-impala.yaml * Cleaned PPO Code * Update pong-impala.yaml * Update pong-impala.yaml * wip * new ifle * refactor * add vtrace off option * revert * support any space * docs * fix comment * remove kl * Update cartpole-appo-vtrace.yaml	2019-01-18 13:40:26 -08:00
Jones Wong	319c1340cb	[rllib] Develop MARWIL (#3635 ) * add marvil policy graph * fix typo * add offline optimizer and enable running marwil * fix loss function * add maintaining the moving average of advantage norm * use sync replay optimizer for unifying * remove offline optimizer and use sync replay optimizer * format by yapf * add imitation learning objective * fix according to eric's review * format by yapf * revise * add test data * marwil	2019-01-16 19:00:43 -08:00
Eric Liang	401e656b95	[rllib] Sync filters at end of iteration not start; hierarchical docs (#3769 )	2019-01-15 16:25:25 -08:00
Eric Liang	e78562b2e8	[rllib] Misc fixes: set lr for PG, better error message for LSTM/PPO, fix multi-agent/APEX (#3697 ) * fix * update test * better error * compute * eps fix * add get_policy() api * Update agent.py * better err msg * fix * pass in rew	2019-01-06 19:37:35 -08:00
Eric Liang	03fe760616	[rllib] Model self loss isn't included in all algorithms (#3679 )	2019-01-04 22:30:35 -08:00
Eric Liang	7db1f3be2a	[tune] resume=False by default but print a tip to set resume="prompt" + jenkins fix (#3681 )	2019-01-04 17:23:19 -08:00
Eric Liang	ca864faece	[rllib] Documentation for I/O API and multi-agent support / cleanup (#3650 )	2019-01-03 15:15:36 +08:00
opherlieber	2177e2f410	[rllib] Agent: Allow unknown subkeys for custom_resources_per_worker (#3639 ) * RLLib Agent: Allow unknown subkeys for custom_resources_per_worker * Update agent.py	2019-01-03 14:19:59 +08:00
Eric Liang	47d36d7bd6	[rllib] Refactor pytorch custom model support (#3634 )	2019-01-03 13:48:33 +08:00
Tianming Xu	b4f61dfd50	[rllib] Export policy model checkpoint (#3637 ) * Export policy model checkpoint * update comment	2018-12-27 08:43:06 +09:00
Tianming Xu	deb26b954e	[rllib] Export tensorflow model of policy graph (#3585 ) * Export tensorflow model of policy graph * Add tests,examples,pydocs and infer extra signatures from existing methods * Add example usage in export_policy_model comment * Fix lint error * Fix lint error * Fix lint error	2018-12-22 17:35:25 +09:00
Eric Liang	6bb1103930	[rllib] Avoid sample wastage with bad PPO configurations (#3552 ) ## What do these changes do? Previously we logged a warning if the PPO configuration would waste many samples. However, this didn't apply in the case of long episodes in `complete_episodes` batch mode, and also the amount of waste is up to 2x in common cases. This pr: - Estimates the number of sampling tasks needed to avoid over-sampling. - Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower. Add a config flag to enable this legacy behavior. ## Related issue number Closes: https://github.com/ray-project/ray/issues/3549	2018-12-20 10:50:44 -08:00
Eric Liang	303883a3b6	[rllib] [rfc] add contrib module and guideline for merging (#3565 ) This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. Also, clean up the agent import code to make registration easier.	2018-12-20 10:44:34 -08:00
adoda	cf0c4745f4	[rllib] support running older version tensorflow(version < 1.5.0) (#3571 )	2018-12-19 20:27:24 -08:00
Eric Liang	ffa6ee3ec8	[rllib] streaming minibatching for IMPALA (#3402 ) * mb impala * fix * paropt * update * cpu warn * on cpu * fix mb * doc * docs * comment * larger num * early release * remove grad clip * only check loader count in multi gpu mode * revert bad multigpu changes * num sgd iter * comment * reuse optimizer * add test * par load test * loosen test * Update run_multi_node_tests.sh * fix local mode * Update agent.py	2018-12-19 02:23:29 -08:00
Alexey Tumanov	c4cba98c75	Remove deprecation warnings when running actor tests (#3563 ) * remove deprecation warnings when running actor tests * replacing logger.warn with logger.warning * Update worker.py * Update policy_client.py * Update compression.py	2018-12-18 17:04:51 -08:00
Eric Liang	db0dee573e	[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548 )	2018-12-18 10:40:01 -08:00
Eric Liang	32473cf22e	[rllib] Basic Offline Data IO API (#3473 )	2018-12-12 13:57:48 -08:00

131 Commits