723 Commits

Author SHA1 Message Date
Eric Liang e56eb354eb [tune] Remove hack to serve pin requests off thread (#2680)
* nopin

* fix
2018-08-18 13:19:52 -07:00
Wang Qing 06a58016d8 [multi-language part 2] Change the command line arguments to start raylet (#2670) 2018-08-16 21:59:44 -07:00
Eric Liang 6670880f03 [rllib] Workaround actor creation hang edge case for ape-X (#2661)
* apex hang

* fix

* move pyt to end
2018-08-16 18:03:50 -07:00
Eric Liang 5f430da180 [rllib] Provide internal access to episode state in compute_actions() and allow returning extra batches (#2559)
The goal of this PR is to allow custom policies to perform model-based rollouts. In the multi-agent setting, this requires access to not only policies of other agents, but also their current observations.
Also, you might want to return the model-based trajectories as part of the rollout for efficiency.

  compute_actions() now takes a new keyword arg episodes
  pull out internal episode class into a top-level file
  add function to return extra trajectories from an episode that will be appended to the sample batch
  documentation
2018-08-16 14:37:21 -07:00
Eric Liang 127cf291a3 Delete __init__.py (#2668) 2018-08-16 02:01:21 -07:00
Eric Liang 079c4e482a ray exec and ray attach commands (#2560)
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]

Example:
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop

In one command, this creates a cluster and runs the command on it in a screen session. The screen can later be attached to via ray attach. After the command finishes, the cluster workers are terminated and the head node is stopped.
2018-08-15 14:31:50 -07:00
Eric Liang 53f9755594 [rllib] Fix support for mixed discrete and continuous action spaces, add to regression test (#2655)
* fix

* lint

* fix
2018-08-15 10:19:41 -07:00
Yuhong Guo eeb15771ba Add ray.internal.free (#2542) 2018-08-14 22:01:23 -07:00
Mitar 493585574a Updating documentation. (#2643) 2018-08-13 19:18:12 -07:00
efang96 baba624373 updated agent.compute_action to return rnn state (#2581)
* updated agent.compute_action to return rnn state

* updated compute_action method, added case for state=None

* fixing lint
2018-08-13 18:04:42 -07:00
Mitar 8769b8ac32 Fixing docstring. (#2638) 2018-08-13 16:19:32 -07:00
Eric Liang 9559873d13 [rllib] tuple space shouldn't assume elements are all the same size (#2637)
* fix

* lint
2018-08-11 10:57:40 -07:00
Peter Schafhalter 230b9ab33b [asv] Add benchmark for ray.wait (#2625)
* Add benchmarks for ray.wait

* Fix bug
2018-08-10 17:52:36 -07:00
Jones Wong 007208d2bb Support older version TF and Support RMSProp in Impala (#2590)
to support TF version < 1.5
to support rmsprop optimizer in Impala

Before TF 1.5, tf.reduce_sum() and tf.reduce_max() take an argument keep_dims, which was renamed to keepdims in later versions.

The original Impala paper uses the RMSProp algorithm to optimize the model, so we should support it as well to let users reproduce those experiments. Without any tuning, i.e. using the same hyper-parameters as AdamOptimizer, it reaches "episode_reward_mean": 19.083333333333332 in Pong after consuming 3,610,350 samples.
2018-08-09 19:51:32 -07:00
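The keep_dims / keepdims rename described in #2590 can be handled by choosing the keyword name from the installed TF version. The following is only a minimal sketch, not the code from the PR: the helper names `keepdims_kwarg` and `reduce_kwargs` are hypothetical, and the 1.5 cutoff is taken from the commit message above.

```python
def keepdims_kwarg(tf_version):
    """Return the keyword name for the 'keep reduced dimensions' flag.

    Assumes the rename happened in TF 1.5, as the commit message states.
    """
    # Only the major and minor components matter, e.g. "1.4.1" -> (1, 4).
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return "keepdims" if (major, minor) >= (1, 5) else "keep_dims"


def reduce_kwargs(tf_version, keep=True):
    """Build kwargs that can be passed on to tf.reduce_sum / tf.reduce_max."""
    return {keepdims_kwarg(tf_version): keep}
```

With TensorFlow imported as tf, a call site could then look like tf.reduce_sum(x, axis=1, **reduce_kwargs(tf.__version__)).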
Melih Elibol 8ae82180b4 [xray] Adds a driver table. (#2289)
This PR adds a driver table for the new GCS, which enables cleanup functionality associated with monitoring driver death.

Some testing in `monitor_test.py` is restored, but redis sharding for xray is needed to enable remaining tests.
2018-08-08 23:41:40 -07:00
Eric Liang 64053278aa [tune] Support lambda functions in hyperparameters / tune rllib multiagent support (#2568)
* update

* func

* Update registry.py

* revert
2018-08-07 16:29:21 -07:00
Richard Liaw bb44456f6f [rllib, tune] TrainingResult -> Dict, Removes C408 from flake8 (#2565) 2018-08-07 12:17:44 -07:00
Philipp Moritz a3202f581c [xray] Add flag to start raylet in valgrind (#2582) 2018-08-07 11:25:21 -07:00
Yuhong Guo 9825da7233 Change training tasks to xray for Jenkins tests (#2567) 2018-08-06 13:35:26 -07:00
Eric Liang 981d9818c1 [rllib] Support the timesteps_per_batch in simple optimizer PPO mode (#2558)
* support ts

* doc

* Update sync_samples_optimizer.py
2018-08-06 12:10:59 -07:00
Richard Liaw 914a433e3f [tune] Split Search from Scheduling (#2452)
Introduces SearchAlgorithm concept, separate from schedulers in Tune. Moves HyperOpt under this concept.
2018-08-04 21:27:39 -07:00
Eric Liang 9449d07eca [rllib] Fix crash when setting horizon in multiagent
If a horizon is set, an env may terminate without done=True.
2018-08-03 16:37:56 -07:00
Philipp Moritz d5dda1ebf2 copy all files when installing pyarrow (#2547) 2018-08-02 17:06:37 -07:00
Peter Schafhalter 7a5f25248e [rllib] Improve conv_filters documentation (#2540)
* Improve conv_filters documentation

* Update catalog.py

* Update catalog.py
2018-08-02 14:29:40 -07:00
Eric Liang f7ec292360 [rllib] Support agent.get_action in multiagent (#2543)
* support get action on policy id

* comment

* grammar fixes

* Update rllib-algorithms.rst
2018-08-02 13:35:53 -07:00
Yuhong Guo d2ebe4d9a3 Fix frequent failure of Jenkins CI. (#2490) 2018-08-02 10:28:28 -07:00
Eric Liang 9ea57c2a93 [rllib] Basic IMPALA implementation (using deepmind's reference vtrace.py) (#2504)
  Rename AsyncSamplesOptimizer -> AsyncReplayOptimizer
  Add AsyncSamplesOptimizer that implements the IMPALA architecture
  integrate V-trace with a3c policy graph
  audit V-trace integration
  benchmark comparison vs A3C and with V-trace on/off
PongNoFrameskip-v4 on IMPALA scales from 16 to 128 workers, solving Pong in <10 min. For reference, solving this env takes ~40 minutes for Ape-X and several hours for A3C.
2018-08-01 20:53:53 -07:00
Eric Liang 9a479b3a63 [rllib] Document creating an ensemble of envs; also add vector_index attribute to env config (#2513)
This also removes the async resetting code in VectorEnv. While that improves benchmark performance slightly, it substantially complicates env configuration and probably isn't worth it for most envs.

This makes it easy to efficiently support setups like Joint PPO: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/retro-contest/gotta_learn_fast_report.pdf

For example, for 188 envs, you could do something like num_envs: 10, num_envs_per_worker: 19.
2018-08-01 16:29:27 -07:00
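The arithmetic behind the 188-env example in #2513 can be sketched as follows. This assumes the first number in the message (10) is the number of groups the envs are spread across; the function name `envs_per_worker` is hypothetical and not part of RLlib.

```python
import math


def envs_per_worker(total_envs, num_groups):
    """Smallest per-group env count that covers total_envs."""
    return math.ceil(total_envs / num_groups)


# 188 envs over 10 groups needs 19 envs per group (190 slots, 2 spare).
per_group = envs_per_worker(188, 10)
spare_slots = per_group * 10 - 188
```

Rounding up means a couple of env slots go unused whenever the total does not divide evenly.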
Eric Liang a630e332f3 [rllib] Don't use get_gpu_ids() in ppo
This lets the num_gpus config work properly even when not using tune, since the gpu ids won't be set by ray in that case.
2018-08-01 16:25:11 -07:00
Eric Liang d9a36c4e39 [rllib] Document auto-concat in a3c (#2533)
* docs

* update hyperparm docs
2018-08-01 15:11:30 -07:00
Melih Elibol 89f60e39f3 Override user-specified name tag. (#2480)
Override user-specified name tag.
2018-08-01 14:16:57 -04:00
Robert Nishihara 909d7172b1 Introduce constant for ID_SIZE in python code. (#2517) 2018-07-31 12:40:53 -07:00
Eric Liang 38d00986a5 [rllib] Cleanups: deep merge configs properly; enforce min iter time on APEX (#2500)
The dict merge prevents crashes when tune is trying to get resource requests for agents and you override a config subkey. The min iter time prevents iterations from getting too small, incurring high overhead. This is easy to run into on Ape-X since throughput can get very high.
2018-07-30 13:25:35 -07:00
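The deep config merge mentioned in #2500 can be illustrated with a small recursive helper. This is a sketch of the general technique, not RLlib's actual implementation; `deep_merge` and the config keys below are made up for illustration.

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict where nested dicts are merged key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: recurse so sibling subkeys survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical config: overriding one subkey keeps the other defaults.
defaults = {"num_workers": 2, "optimizer": {"lr": 0.01, "momentum": 0.9}}
config = deep_merge(defaults, {"optimizer": {"lr": 0.001}})
```

A shallow dict.update() would instead replace the whole "optimizer" subdict and drop "momentum", which is the kind of missing-key problem a deep merge avoids.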
Eric Liang 62a52ee989 [rllib] Fix corner case in rnn episode handling
We should use episode ids instead of the timestep to determine when sequences should be cut, since when batches are concatenated, increasing t does not guarantee we are part of the same episode.
2018-07-30 13:24:43 -07:00
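Cutting sequences on episode ids rather than timesteps, as described in the RNN fix above, can be sketched like this. It is a simplified illustration, not the RLlib code: `sequence_lengths` is a hypothetical helper that only computes the length of each chunk.

```python
def sequence_lengths(episode_ids, max_seq_len):
    """Split a batch into runs that never span two episodes.

    A new sequence starts when the episode id changes or when the
    current sequence already holds max_seq_len steps.
    """
    lengths = []
    current_len = 0
    previous_id = None
    for eps_id in episode_ids:
        boundary = eps_id != previous_id or current_len == max_seq_len
        if current_len > 0 and boundary:
            lengths.append(current_len)
            current_len = 0
        current_len += 1
        previous_id = eps_id
    if current_len > 0:
        lengths.append(current_len)
    return lengths
```

Because the check uses the episode id, two concatenated batches whose timesteps happen to keep increasing are still split at the episode boundary.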
Philipp Moritz 696a229ece Fix text verbosity in python 2.7 by running tests with pytest (#2470) 2018-07-30 11:04:06 -07:00
Robert Nishihara 3f3514c2b3 Deprecate PYTHON_MODE more gracefully. (#2487) 2018-07-29 16:25:46 -07:00
Steve Severance f1b4ea69a3 Prevent hasher from running out of memory on large files (#2451)
* Prevent hasher from running out of memory on large files

* dump out keys

* only print if failed

* remove debugging

* Fix lint error. Reverse adding newline.
2018-07-28 23:29:09 -07:00
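One common way to keep a file hasher from exhausting memory on large files is to feed the digest in fixed-size chunks. The sketch below shows that general approach only; it is an assumption that #2451 works this way, and `hash_file`, the SHA-1 choice, and the 1 MiB chunk size are illustrative.

```python
import hashlib


def hash_file(path, chunk_size=1 << 20):
    """Hash a file without loading it into memory all at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays bounded by chunk_size regardless of file size, and the result matches hashing the whole file in one call.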
Eric Liang 90a3ea9443 [xray] Fix heartbeat subscription for autoscaler (#2498) 2018-07-28 13:34:55 -07:00
Peter Schafhalter e10377567c Add benchmark for ray.get (#2499) 2018-07-28 09:09:21 -07:00
Peter Schafhalter ccb9a27393 Add benchmarks for ray.put (#2489) 2018-07-27 17:49:21 -07:00
Peter Schafhalter 302510ada0 [asv] Add actor benchmarks (#2469)
* Add actor benchmarks

* Fix bug

* Address comments and refactor

* Update benchmark_actor.py
2018-07-27 17:40:02 -07:00
Eric Liang 24649726dc [rllib] Use batch.count in async samples optimizer (#2488)
Using the actual batch size reduces the risk of mis-accounting. Here, we under-counted samples since in truncate_episodes mode we were doubling the batch size by accident in policy_evaluator.
2018-07-27 16:44:21 -07:00
Peter Schafhalter 1e6b130b90 [asv] Add benchmarks for tasks (#2471)
* Add benchmarks for tasks

* Fixes

* Add timeout
2018-07-27 13:59:55 -07:00
Stephanie Wang 6675361684 [xray] Track ray.get calls as task dependencies (#2362) 2018-07-27 11:59:17 -07:00
Yuhong Guo 46351957bb Fix MAC_WHEELS=1 (#2477) 2018-07-25 14:57:28 -07:00
Richard Liaw 7edc677304 [rllib] Extra Changes for Usability (#2363) 2018-07-24 20:51:22 -07:00
Sergey Kolesnikov 05490b8cb9 [rllib] dqn/ddpg policy customization (#2445)
* dqn policy update - more customization

* docs for custom DQN graph

* Update rllib-training.rst

* Update rllib-models.rst

* Update rllib.rst

* Update rllib-training.rst

* Update rllib-concepts.rst

* yapf codestyle
2018-07-22 14:47:14 -07:00
Eric Liang 68660453e4 [rllib] Better support and add two-trainer example for multiagent (#2443)
This adds a simple DQN+PPO example for multi-agent. We don't do anything fancy here, just syncing weights between two separate trainers. This potentially wastes some compute, but it is very simple to set up.

It might be nice to share experience collection between the top-level trainers in the future.
2018-07-22 05:09:25 -07:00
Shuo 99d0d96aef Use different serialization context for each driver. (#2406) 2018-07-20 23:42:49 -07:00
Hao Chen 05f485e274 Allow Ray API to be used from multiple threads (#2422) 2018-07-20 15:39:01 -07:00