111 Commits

Author SHA1 Message Date
Richard Liaw 72542c9016 [tune] Fix Pausing and Error Propagation (#2815)
* add new tests

* Try-catch errors from ray get

* longer pbt run

* Update pbt_example.py

* Split trial and result and fix tests
2018-09-04 15:22:11 -07:00
Eric Liang df4788e501 [rllib/tune] Add test for fractional gpu support in xray mode; add rllib support for fractional gpu (#2768)
* frac gpu

* doc

* Update rllib-training.rst

* yapf

* remove xray
2018-09-03 11:12:23 -07:00
wangyiguang 3813ae34b3 [tune] Add AutoMLBoard: Monitoring UI (experimental) (#2574) 2018-08-31 00:26:44 -07:00
Richard Liaw 0347e6418b [tune] Add PyTorch MNIST Example + Misc. Tweaks (#2708) 2018-08-30 16:18:56 -07:00
Praveen Palanisamy 357c0d6156 [tune] Adds option to checkpoint at end of trials (#2754)
* Added checkpoint_at_end option. To fix #2740

* Added ability to checkpoint at the end of trials if the option is set to True

* checkpoint_at_end option added; Consistent with Experience and Trial runner

* checkpoint_at_end option mentioned in the tune usage guide

* Moved the redundant checkpoint criteria check out of the if-elif

* Added note that checkpoint_at_end is enabled only when checkpoint_freq is not 0

* Added test case for checkpoint_at_end

* Made checkpoint_at_end have an effect regardless of checkpoint_freq

* Removed comment from the test case

* Fixed the indentation

* Fixed pep8 E231

* Handled cases when trainable does not have _save implemented

* Constrained test case to a particular exp using the MockAgent

* Revert "Constrained test case to a particular exp using the MockAgent"

This reverts commit e965a9358ec7859b99a3aabb681286d6ba3c3906.

* Revert "Handled cases when trainable does not have _save implemented"

This reverts commit 0f5382f996ff0cbf3d054742db866c33494d173a.

* Simpler test case for checkpoint_at_end

* Preserved bools from losing their actual value

* Revert "Moved the redundant checkpoint criteria check out of the if-elif"

This reverts commit 783005122902240b0ee177e9e206e397356af9c5.

* Fix linting error.
2018-08-29 13:14:17 -07:00
Eric Liang 69d1354016 [rllib] Document ARS & rainbow (#2744)
* wip

* rainbow doc too

* e not used

* fix ppo doc

* clean list

* use same title
2018-08-28 18:13:36 -07:00
Michael Tu d16b6f6a32 [tune] Rename 'repeat' to 'num_samples' (#2698)
Deprecates the `repeat` argument and introduces `num_samples`. Also updates docs accordingly.
2018-08-24 15:05:24 -07:00
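A rename such as the one in d16b6f6a32 is commonly handled with a small deprecation shim. The following is a minimal sketch, not Tune's actual code: `resolve_num_samples`, the dict-shaped `spec`, and the default of one sample are hypothetical and introduced only for illustration.

```python
import warnings


def resolve_num_samples(spec):
    """Return the sample count from a hypothetical experiment spec dict.

    Accepts the deprecated `repeat` key but prefers `num_samples`.
    """
    if "repeat" in spec:
        warnings.warn(
            "`repeat` is deprecated; use `num_samples` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Only fall back to the old key when the new one is absent.
        return spec.get("num_samples", spec["repeat"])
    # Assumed default of one sample when neither key is given.
    return spec.get("num_samples", 1)
```

Old configurations keep working and emit a `DeprecationWarning`, while new configurations use `num_samples` directly.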
old-bear 4be324efc3 [tune] Support infinity value in report result (#2693)
* + Compatibility fix under py2 on ray.tune

* + Revert changes on master branch

* + Use default JsonEncoder in ray.tune.logger

* + Add UT for infinity support
2018-08-22 13:09:14 -07:00
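Regarding 4be324efc3: Python's standard `json` module already round-trips infinite floats with its default settings, which may be why the default encoder suffices. A minimal sketch with made-up metric names, not necessarily how `ray.tune.logger` is written:

```python
import json
import math

# Hypothetical result dict containing infinite values.
result = {"mean_loss": float("inf"), "episode_reward": float("-inf")}

# The default encoder (allow_nan=True) writes the JavaScript-style
# tokens Infinity and -Infinity, which are not strict JSON.
encoded = json.dumps(result, sort_keys=True)

# The default decoder accepts those tokens and restores float infinities.
decoded = json.loads(encoded)
assert math.isinf(decoded["mean_loss"])
assert decoded["episode_reward"] < 0
```

Consumers that require strict JSON would reject these tokens, so this only helps when both writer and reader are lenient.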
joyyoj 38867eea4e [tune] Cross-Framework Compatibility (#2646)
This commit is a first pass at restructuring the Trial execution logic to support running on multiple frameworks.
2018-08-22 10:55:45 -07:00
Eric Liang fbe6c59f72 [rllib] Misc fixes, A2C (#2679)
A bunch of minor rllib fixes:

pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
move reward clipping to policy evaluator
add a2c variant of a3c
reduce vision network fc layer size to 256 units
switch to 84x84 images
doc tweaks
print timesteps in tune status
2018-08-20 15:28:03 -07:00
old-bear 230ac7aa80 [tune] Compatibility fix under py2 on str condition (#2673)
* * Compatibility fix under py2 on ray.tune

* + Fix compatibility

* + Use package six to achieve str compatibility
2018-08-19 20:43:03 -07:00
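For context on 230ac7aa80: `six.string_types` is `(basestring,)` on Python 2 and `(str,)` on Python 3. The sketch below reproduces that idea with the standard library only, so it runs without `six` installed; `is_string` is a hypothetical helper, not a function from ray.tune.

```python
import sys

# Equivalent of six.string_types: `basestring` on Python 2, `str` on Python 3.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (basestring,)  # noqa: F821 (name exists on Python 2 only)


def is_string(value):
    """Return True for any text-like value on either Python version."""
    return isinstance(value, string_types)
```

Checking against `string_types` instead of `str` means a Python 2 `unicode` value is also treated as a string.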
Richard Liaw 62d0698097 [tune] Tune Facelift (#2472)
This PR introduces the following changes:

 * Ray Tune -> Tune 
 * [breaking] Creation of `schedulers/`, moving PBT, HyperBand into a submodule
 * [breaking] Search Algorithms now must take in experiment configurations via `add_configurations` rather than through initialization
 * Support `"run": (function | class | str)` with automatic registering of trainable
 * Documentation Changes
2018-08-19 11:00:55 -07:00
Eric Liang e56eb354eb [tune] Remove hack to serve pin requests off thread (#2680)
* nopin

* fix
2018-08-18 13:19:52 -07:00
Eric Liang 64053278aa [tune] Support lambda functions in hyperparameters / tune rllib multiagent support (#2568)
* update

* func

* Update registry.py

* revert
2018-08-07 16:29:21 -07:00
Richard Liaw bb44456f6f [rllib, tune] TrainingResult -> Dict, Removes C408 from flake8 (#2565) 2018-08-07 12:17:44 -07:00
Richard Liaw 914a433e3f [tune] Split Search from Scheduling (#2452)
Introduces SearchAlgorithm concept, separate from schedulers in Tune. Moves HyperOpt under this concept.
2018-08-04 21:27:39 -07:00
Richard Liaw 7edc677304 [rllib] Extra Changes for Usability (#2363) 2018-07-24 20:51:22 -07:00
Richard Liaw 8e8c733696 [tune] Fix Categorical Space + Add Keras Example (#2401)
Previously did not properly resolve categorical variables for HyperOpt.
2018-07-17 23:52:52 +02:00
Eric Liang 7865dbab84 [tune] Raise error if incorrect key used in config (#2400) 2018-07-15 00:25:19 +02:00
Robert Nishihara 515da7721a Change ray.worker.cleanup -> ray.shutdown and improve API documentation. (#2374)
* Change ray.worker.cleanup -> ray.shutdown and improve API documentation.

* Deprecate ray.worker.cleanup() gracefully.

* Fix linting
2018-07-12 12:00:00 -07:00
Richard Liaw 0048e77093 [rllib] RLlib CLI (#2375) 2018-07-12 19:12:04 +02:00
Eric Liang d24f19fd1e [rllib] Fix stats collection and some docs bugs since the refactoring (#2361)
* fix

* fix pbt example

* fix

* fix

* single thread by default

* vec

* fix

* fix
2018-07-07 13:29:20 -07:00
Richard Liaw d75b39f6df [tune] Return error trials (#2292) 2018-06-28 20:23:38 -07:00
Eric Liang 737f3e3cf2 [tune] Fix registering trainable twice (#2293)
* register twice

* isolate

* Update registry.py

* Update registry.py
2018-06-27 16:29:39 -07:00
Richard Liaw e657497225 [xray] Fix tune tests (#2305)
* fix xray tests

* yapf

* unleash tests
2018-06-26 23:56:23 -07:00
Eric Liang a9a26b7560 [rllib] Part 2 of multiagent support (#2286)
* wip

* cls

* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* imports

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* fix obs filter

* pass thru worker index

* fix

* fix log action

* debug name

* fix sphinx
2018-06-25 22:33:57 -07:00
Eric Liang 9c3bab5c42 [tune] Support all serializable objects in config (#2287)
* wip

* order

* lint
2018-06-23 16:13:46 -07:00
Richard Liaw 4acb77a5c3 [tune] Update Trainable doc to expose interface (#2272) 2018-06-20 13:40:45 -07:00
Eric Liang 30f7c08ca7 [rllib] Remove need to pass around registry (#2250)
* remove registry

* fix

* too many _

* fix

* cloudpickle

* Update registry.py

* yapf

* fix test

* fix kv check
2018-06-19 22:47:00 -07:00
Eric Liang 71eb558eb0 [rllib] Refactor rllib to have a common sample collection pathway (#2149) 2018-06-09 00:21:35 -07:00
Eric Liang 31046f7e06 Autoscaler Python 2 queue fix (#2205) 2018-06-07 18:43:07 -07:00
Kristian Hartikainen 74dc14d1fc [autoscaler] GCP node provider (#2061)
* Google Cloud Platform scaffolding

* Add minimal gcp config example

* Add googleapiclient discoveries, update gcp.config constants

* Rename and update gcp.config key pair name function

* Implement gcp.config._configure_project

* Fix the create project get project flow

* Implement gcp.config._configure_iam_role

* Implement service account iam binding

* Implement gcp.config._configure_key_pair

* Implement rsa key pair generation

* Implement gcp.config._configure_subnet

* Save work-in-progress gcp.config._configure_firewall_rules.

These are likely to be not needed at all. Saving them if we happen to
need them later.

* Remove unnecessary firewall configuration

* Update example-minimal.yaml configuration

* Add new wait_for_compute_operation, rename old wait_for_operation

* Temporarily rename autoscaler tags due to gcp incompatibility

* Implement initial gcp.node_provider.nodes

* Still missing filter support

* Implement initial gcp.node_provider.create_node

* Implement another compute wait
  operation (wait_for_compute_zone_operation). TODO: figure out if we
  can remove the function.

* Implement initial gcp.node_provider._node and node status functions

* Implement initial gcp.node_provider.terminate_node

* Implement node tagging and ip getter methods for nodes

* Temporarily rename tags due to gcp incompatibility

* Tiny tweaks for autoscaler.updater

* Remove unused config from gcp node_provider

* Add new example-full example to gcp, update load_gcp_example_config

* Implement label filtering for gcp.node_provider.nodes

* Revert unnecessary change in ssh command

* Revert "Temporarily rename tags due to gcp incompatibility"

This reverts commit e2fe634c5d11d705c0f5d3e76c80c37394bb23fb.

* Revert "Temporarily rename autoscaler tags due to gcp incompatibility"

This reverts commit c938ee435f4b75854a14e78242ad7f1d1ed8ad4b.

* Refactor autoscaler tagging to support multiple tag specs

* Remove missing cryptography imports

* Update quote function import

* Fix threading issue in gcp.config with the compute discovery object

* Add gcs support for log_sync

* Fix the labels/tags naming discrepancy

* Add expanduser to file_mounts hashing

* Fix gcp.node_provider.internal_ip

* Add uuid to node name

* Remove 'set -i' from updater ssh command

* Also add TODO with the context and reason for the change.

* Update ssh key creation in autoscaler.gcp.config

* Fix wait_for_compute_zone_operation's threading issue

Google discovery api's compute object is not thread safe, and thus
needs to be recreated for each thread. This moves the
`wait_for_compute_zone_operation` under `autoscaler.gcp.config`, and
adds compute as its argument.

* Address pr feedback from @ericl

* Expand local file mount paths in NodeUpdater

* Add ssh_user name to key names

* Update updater ssh to attempt 'set -i' and fall back if that fails

* Update gcp/example-full.yaml

* Fix wait crm operation in gcp.config

* Update gcp/example-minimal.yaml to match aws/example-minimal.yaml

* Fix gcp/example-full.yaml comment indentation

* Add gcp/example-full.yaml to setup files

* Update example-full.yaml command

* Revert "Refactor autoscaler tagging to support multiple tag specs"

This reverts commit 9cf48409ca2e5b66f800153853072c706fa502f6.

* Update tag spec to only use characters [0-9a-z_-]

* Change the tag values to conform to gcp spec

* Add project_id in the ssh key name

* Replace '_' with '-' in autoscaler tag names

* Revert "Update updater ssh to attempt 'set -i' and fall back if that fails"

This reverts commit 23a0066c5254449e49746bd5e43b94b66f32bfb4.

* Revert "Remove 'set -i' from updater ssh command"

This reverts commit 5fa034cdf79fa7f8903691518c0d75699c630172.

* Add fallback to `set -i` in force_interactive command

* Update autoscaler tests to match current implementation

* Update GCPNodeProvider.create_node to include hash in instance name

* Add support for creating multiple instance on one create_node call

* Clean TODOs

* Update styles

* Replace single quotes with double quotes
* Some minor indentation fixes etc.

* Remove unnecessary comment. Fix indentation.

* Yapfify files that fail flake8 test

* Yapfify more files

* Update project_id handling in gcp node provider

* temporary yapf mod

* Revert "temporary yapf mod"

This reverts commit b6744e4e15d4d936d1a14f4bf155ed1d3bb14126.

* Fix autoscaler/updater.py lint error, remove unused variable
2018-05-31 09:00:03 -07:00
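Commit 74dc14d1fc notes that the Google discovery API's compute object is not thread safe and must be recreated for each thread; the commit handles this by passing compute as an argument. A thread-local cache is another common way to get one object per thread. The sketch below shows that alternative with a stand-in factory; `get_client` and `factory` are hypothetical names, and no Google API call is made.

```python
import threading

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_client(factory):
    """Return the calling thread's client, creating it on first use."""
    if not hasattr(_local, "client"):
        _local.client = factory()
    return _local.client


# Demonstration: three threads each receive a distinct client object.
clients = []


def worker():
    clients.append(get_client(object))


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In real use, `factory` would be whatever callable builds the non-thread-safe client.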
Alok Singh f795173b51 Use flake8-comprehensions (#1976)
* Add flake8 to Travis

* Add flake8-comprehensions

[flake8 plugin](https://github.com/adamchainz/flake8-comprehensions) that
checks for useless constructions.

* Use generators instead of lists where appropriate

A lot of the builtins can take in generators instead of lists.

This commit applies `flake8-comprehensions` to find them.

* Fix lint error

* Fix some string formatting

The rest can be fixed in another PR

* Fix compound literals syntax

This should probably be merged after #1963.

* dict() -> {}

* Use dict literal syntax

dict(...) -> {...}

* Rewrite nested dicts

* Fix hanging indent

* Add missing import

* Add missing quote

* fmt

* Add missing whitespace

* rm duplicate pip install

This is already installed in another file.

* Fix indent

* move `merge_dicts` into utils

* Bring up to date with `master`

* Add automatic syntax upgrade

* rm pyupgrade

In case users want to still use it on their own, the upgrade-syn.sh script was
left in the `.travis` dir.
2018-05-20 16:15:06 -07:00
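The kinds of rewrites described in f795173b51 (generators instead of lists, `dict(...)` to `{...}`) look roughly like the following. The variable names and values are illustrative and not taken from the Ray codebase.

```python
values = [3, 1, 2]

# A list comprehension builds a temporary list before sum() sees it.
total_from_list = sum([v * v for v in values])

# A generator expression feeds sum() one item at a time.
total_from_generator = sum(v * v for v in values)

# dict(...) with keyword arguments versus a dict literal.
config_call = dict(lr=0.01, batch_size=32)
config_literal = {"lr": 0.01, "batch_size": 32}

assert total_from_list == total_from_generator
assert config_call == config_literal
```

Both forms produce the same values; the second of each pair avoids an intermediate list or a function call.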
Alok Singh 9a8f29e571 YAPF, take 3 (#2098)
* Use pep8 style

The original style file is actually just pep8 style, but with everything
spelled out. It's easier to use the `based_on_style` feature. Any overrides are
clearer that way.

* Improve yapf script

1. Do formatting in parallel
2. Lint RLlib
3. Use .style.yapf file

* Pull out expressions into variables

* Don't format rllib

* Don't allow splits in dicts

* Apply yapf

* Disallow single line if-statements

* Use arithmetic comparison

* Simplify checking for changed files

* Pull out expr into var
2018-05-19 16:07:28 -07:00
Melih Elibol bea97b425b Fix python linting (#2076) 2018-05-16 15:04:31 -07:00
Alok Singh c7f3b8c4d3 Fix typo in tune. (#2046)
Fix typo in tune.
2018-05-12 09:36:45 -07:00
Robert Nishihara 77c8aa7627 Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (#2007)
* Make ActorHandles pickleable, also make proper ActorHandle and ActorClass classes.

* Fix bug.

* Fix actor test bug.

* Update __ray_terminate__ usage.

* Fix most linting, add documentation, and small cleanups.

* Handle forking and pickling differently for actor handles. Fix linting.

* Fixes for named actors via pickling.

* Generate actor handle IDs deterministically in the pickling case.
2018-05-08 19:19:07 -07:00
Kristian Hartikainen 2048b546ff Expand local_dir in Trial init (#2013)
* Fix the case where Trial logs into wrong paths when `local_dir`
argument starts with tilde (~), by expanding the `local_dir` argument
* Add test case for checking that the tilde gets expanded
2018-05-07 21:44:28 -07:00
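The tilde expansion described in 2048b546ff can be sketched with `os.path.expanduser`. This is a minimal illustration, not the code from the commit; `resolve_local_dir` and the example paths are hypothetical.

```python
import os


def resolve_local_dir(local_dir):
    """Expand a leading tilde (~) to the user's home directory."""
    return os.path.expanduser(local_dir)


# A path starting with "~" is rewritten; other paths pass through unchanged.
expanded = resolve_local_dir("~/tune_output")
unchanged = resolve_local_dir("/tmp/tune_output")
```

Without the expansion, a literal directory named `~` would be created relative to the current working directory.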
Alok Singh cdf94c18a4 Clean up syntax for supported Python versions. (#1963)
* Use set/dict literal syntax

Ran code through [pyupgrade](https://github.com/asottile/pyupgrade). This is
supported in every Python version 2.7+.

* Drop unnecessary string format specification

No need to specify 0, 1, ... if parameters are passed in order.

* Revert "Drop unnecessary string format specification"

This reverts commit efa5ec85d30ff69f34e5ed93e31343fea7647bcb.

* Undo changes to cloudpickle

Drop use of set literal until cloudpickle uses it.

* Reformat code with YAPF

We need to set up a git pre-push hook to automatically run this stuff.
2018-05-03 07:45:11 -07:00
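The set/dict literal syntax mentioned in cdf94c18a4 replaces constructor calls with equivalent literals. A short sketch with illustrative values:

```python
# Constructor-call style.
old_set = set([1, 2, 3])
old_dict = dict([("a", 1), ("b", 2)])

# Literal syntax, available in Python 2.7 and later.
new_set = {1, 2, 3}
new_dict = {"a": 1, "b": 2}

assert old_set == new_set
assert old_dict == new_dict
```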
Alok Singh 06a0898af7 [rllib] Fix PyTorch initialization (#1961)
* Fix typo

* Fix A3C PyTorch agent initialization

`registry` needs to be passed as an argument or else the `super` init will
fail.
2018-05-01 18:39:01 -07:00
Richard Liaw f833e4da37 [tune] Polishing docs (#1846) 2018-04-17 09:57:35 -07:00
Eric Liang 7ab890f4a1 [tune] [rllib] Automatically determine RLlib resources and add queueing mechanism for autoscaling (#1848) 2018-04-16 16:58:15 -07:00
Eric Liang ed8c0f1a38 [tune] Allow fetching pinned objects from trainable functions (#1895)
* updates

* lint

* Update util.py

* Update function_runner.py

* updates
2018-04-16 15:54:38 -07:00
Philipp Moritz 74162d1492 Lint Python files with Yapf (#1872) 2018-04-11 10:11:35 -07:00
Eric Liang e6c00b2b5e [tune] Add util function to broadcast objects (#1845)
* add util

* Fri Apr  6 15:09:20 PDT 2018

* doc

* Fri Apr  6 15:21:42 PDT 2018

* Fri Apr  6 15:28:07 PDT 2018

* Fri Apr  6 15:28:26 PDT 2018

* Update tune-config.rst

* Update tune-config.rst
2018-04-07 11:37:14 -07:00
Richard Liaw bc8f62c947 [tune] Fix Median Stopping Rule Verbosity (#1833) 2018-04-06 22:58:13 -07:00
Richard Liaw 888e70f1be [tune] HyperOpt Support (v2) (#1763) 2018-04-04 11:08:26 -07:00
Eric Liang 4116c64698 [tune] Remove rllib dep again, and add a test (#1792)
* tune should not depend on rllib

* fix dep test

* Tue Mar 27 16:55:41 PDT 2018

* f401
2018-03-29 15:36:49 -07:00
Eric Liang 7c4afa4b04 [tune] Fix linting error (#1777) 2018-03-25 23:44:14 -07:00
Yan Facai (颜发才) 6b1e592d5c [tune] Added pbt with keras on cifar10 dataset example (#1729)
* [tune] Added pbt with keras on cifar10 dataset example

* ENH: add gpu resources

* CLN: requires 4 GPUs resource

* CLN: use single quotes

* CLN: don't save model by default
2018-03-25 15:57:23 -07:00