Commit Graph

696 Commits

Author SHA1 Message Date
Eric Liang 9a479b3a63 [rllib] Document creating an ensemble of envs; also add vector_index attribute to env config (#2513)
This also removes the async resetting code in VectorEnv. While that improves benchmark performance slightly, it substantially complicates env configuration and probably isn't worth it for most envs.

This makes it easy to efficiently support setups like Joint PPO: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/retro-contest/gotta_learn_fast_report.pdf

For example, for 188 envs, you could do something like num_envs: 10, num_envs_per_worker: 19.
2018-08-01 16:29:27 -07:00
Eric Liang a630e332f3 [rllib] Don't use get_gpu_ids() in ppo
This lets the num_gpus config work properly even when not using tune, since the gpu ids won't be set by ray in that case.
2018-08-01 16:25:11 -07:00
Eric Liang d9a36c4e39 [rllib] Document auto-concat in a3c (#2533)
* docs

* update hyperparm docs
2018-08-01 15:11:30 -07:00
Melih Elibol 89f60e39f3 Override user-specified name tag. (#2480)
Override user-specified name tag.
2018-08-01 14:16:57 -04:00
Robert Nishihara 909d7172b1 Introduce constant for ID_SIZE in python code. (#2517) 2018-07-31 12:40:53 -07:00
Eric Liang 38d00986a5 [rllib] Cleanups: deep merge configs properly; enforce min iter time on APEX (#2500)
The dict merge prevents crashes when tune is trying to get resource requests for agents and you override a config subkey. The min iter time prevents iterations from getting too small, incurring high overhead. This is easy to run into on Ape-X since throughput can get very high.
2018-07-30 13:25:35 -07:00
Eric Liang 62a52ee989 [rllib] Fix corner case in rnn episode handling
We should use episode ids instead of the timestep to determine when sequences should be cut, since when batches are concatenated, increasing t does not guarantee we are part of the same episode.
2018-07-30 13:24:43 -07:00
Philipp Moritz 696a229ece Fix text verbosity in python 2.7 by running tests with pytest (#2470) 2018-07-30 11:04:06 -07:00
Robert Nishihara 3f3514c2b3 Deprecate PYTHON_MODE more gracefully. (#2487) 2018-07-29 16:25:46 -07:00
Steve Severance f1b4ea69a3 Prevent hasher from running out of memory on large files (#2451)
* Prevent hasher from running out of memory on large files

* dump out keys

* only print if failed

* remove debugging

* Fix lint error. Reverse adding newline.
2018-07-28 23:29:09 -07:00
Eric Liang 90a3ea9443 [xray] Fix heartbeat subscription for autoscaler (#2498) 2018-07-28 13:34:55 -07:00
Peter Schafhalter e10377567c Add benchmark for ray.get (#2499) 2018-07-28 09:09:21 -07:00
Peter Schafhalter ccb9a27393 Add benchmarks for ray.put (#2489) 2018-07-27 17:49:21 -07:00
Peter Schafhalter 302510ada0 [asv] Add actor benchmarks (#2469)
* Add actor benchmarks

* Fix bug

* Address comments and refactor

* Update benchmark_actor.py
2018-07-27 17:40:02 -07:00
Eric Liang 24649726dc [rllib] Use batch.count in async samples optimizer (#2488)
Using the actual batch size reduces the risk of mis-accounting. Here, we under-counted samples since in truncate_episodes mode we were doubling the batch size by accident in policy_evaluator.
2018-07-27 16:44:21 -07:00
Peter Schafhalter 1e6b130b90 [asv] Add benchmarks for tasks (#2471)
* Add benchmarks for tasks

* Fixes

* Add timeout
2018-07-27 13:59:55 -07:00
Stephanie Wang 6675361684 [xray] Track ray.get calls as task dependencies (#2362) 2018-07-27 11:59:17 -07:00
Yuhong Guo 46351957bb Fix MAC_WHEELS=1 (#2477) 2018-07-25 14:57:28 -07:00
Richard Liaw 7edc677304 [rllib] Extra Changes for Usability (#2363) 2018-07-24 20:51:22 -07:00
Sergey Kolesnikov 05490b8cb9 [rllib] dqn/ddpg policy customization (#2445)
* dqn policy update - more customization

* docs for custom DQN graph

* Update rllib-training.rst

* Update rllib-models.rst

* Update rllib.rst

* Update rllib-training.rst

* Update rllib-concepts.rst

* yapf codestyle
2018-07-22 14:47:14 -07:00
Eric Liang 68660453e4 [rllib] Better support and add two-trainer example for multiagent (#2443)
This adds a simple DQN+PPO example for multi-agent. We don't do anything fancy here, just syncing weights between two separate trainers. This potentially is wasting some compute, but is very simple to set up.

It might be nice to share experience collection between the top-level trainers in the future.
2018-07-22 05:09:25 -07:00
Shuo 99d0d96aef Use different serialization context for each driver. (#2406) 2018-07-20 23:42:49 -07:00
Hao Chen 05f485e274 Allow Ray API to be used from multiple threads (#2422) 2018-07-20 15:39:01 -07:00
Peter Schafhalter 400a3e5705 Add queue size and __len__ methods (#2432) 2018-07-19 17:04:42 -07:00
Peter Schafhalter 4225ac5081 Add benchmark using queue (#2431) 2018-07-19 16:43:22 -07:00
Eric Liang 8e75d150f7 [rllib] Apex crash when compress_observations: False (#2426)
We shouldn't try to decompress uncompressed data.

Also, fix resource requests for ddpg + GPU.
2018-07-19 15:58:09 -07:00
Eric Liang d01dc9e22d [rllib] format with yapf (#2427)
* initial yapf

* manual fix yapf bugs
2018-07-19 15:30:36 -07:00
Robert Nishihara 24eb140e07 Remove redundant reconstruct call. (#2421) 2018-07-19 11:22:02 -07:00
Robert Nishihara 991d0911d1 Move profile data flushing to background thread on workers. (#2415)
* Move profile data flushing to background thread on workers.

* Remove outdated comment.
2018-07-18 12:34:53 -07:00
Eric Liang f31a6ca965 [rllib] Count actual sample batch size instead of configured batch size in A3C. (#2399)
This fixes a metrics accounting bug where the sample count is not reported correctly.
2018-07-18 08:59:52 +02:00
Richard Liaw 8e8c733696 [tune] Fix Categorical Space + Add Keras Example (#2401)
Previously did not properly resolve categorical variables for HyperOpt.
2018-07-17 23:52:52 +02:00
Eric Liang 0cecf6b79c [rllib] Cleanup RNN support and make it work with multi-GPU optimizer (#2394)
Cleanup: TFPolicyGraph now automatically adds loss input entries for state_in_*, so that graph sub-classes don't need to worry about it.

Multi-GPU support:

Allow setting up model tower replicas with existing state input tensors

Truncate the per-device minibatch slices so that they are always a multiple of max_seq_len.
2018-07-17 06:55:46 +02:00
Peter Schafhalter f5c46c7765 Add queue data structures (#2261) 2018-07-16 16:26:20 -07:00
Hao Chen 8a3e180156 Move profiling code to a new file and fix thread safety (#2397) 2018-07-15 18:09:52 -07:00
Eric Liang 7865dbab84 [tune] Raise error if incorrect key used in config (#2400) 2018-07-15 00:25:19 +02:00
Eric Liang 62f84d2f07 [rllib] Restore TF soft placement config to fix multi-GPU optimizer (#2395) 2018-07-13 10:34:37 +02:00
Hao Chen d6af50785e move import_thread to a separate file (#2349)
* move import_thread to a separate file

* sort imports

* group imports regardless of `from`

* re-organize imoprts based on google style

* Update import_thread.py

* fix event_type names in profile statement

* unify duplicate code
2018-07-12 21:26:24 -07:00
Robert Nishihara 515da7721a Change ray.worker.cleanup -> ray.shutdown and improve API documentation. (#2374)
* Change ray.worker.cleanup -> ray.shutdown and improve API documentation.

* Deprecate ray.worker.cleanup() gracefully.

* Fix linting
2018-07-12 12:00:00 -07:00
Eric Liang b316afeb43 [rllib] Add debug info back to PPO and fix optimizer compatibility (#2366) 2018-07-12 19:22:46 +02:00
Richard Liaw 5188b1d080 [autoscaler] Bug for file mounts for tilde (#2382) 2018-07-12 19:18:47 +02:00
Richard Liaw 0048e77093 [rllib] RLlib CLI (#2375) 2018-07-12 19:12:04 +02:00
Richard Liaw 55d5e28872 [core] Better Actor Representation (#2369) 2018-07-09 11:20:21 -07:00
Richard Liaw 4d7da9f668 [rllib] Remove "Common", cleanup some code (#2348) 2018-07-08 13:03:53 -07:00
Robert Nishihara 35f4a3070c Update 0.4.0 to 0.5.0 in autoscaler and installation examples. (#2352) 2018-07-07 14:34:20 -07:00
Eric Liang d24f19fd1e [rllib] Fix stats collection and some docs bugs since the refactoring (#2361)
* fix

* fix pbt example

* fix

* fix

* single thread by default

* vec

* fix

* fix
2018-07-07 13:29:20 -07:00
Eric Liang 9a6e329325 [rllib] Move repeat field to asv script (#2367) 2018-07-07 12:10:06 -07:00
Richard Liaw e32aed8717 [rllib] more user-friendly Optimizer signature + compute_apply (#2335)
* Move signature of optimizers

* fix

* expose compute_apply for policy_graphs

* dictionaries and such

* test for multiagent
2018-07-07 12:08:49 -07:00
Robert Nishihara e3534c46df [xray] Re-enable some stress tests and convert stress_tests to pytest. (#2285)
* Fix one of the stress tests, fix ray.global_state.client_table when called early on.

* Re-enable testWait.

* Convert stress_tests.py to pytest.

* Fix
2018-07-06 23:21:00 -07:00
Robert Nishihara 3a972893ee Bump version to 0.5.0. (#2351) 2018-07-06 22:31:33 -07:00
Devin Petersohn 4185aaed10 Dataframe deprecation (#2353) 2018-07-06 00:16:22 -07:00