11 Commits

Author SHA1 Message Date
Richard Liaw 9ce7ad17fd [tune] remove some bottlenecks in trialrunner (#12476) 2020-11-30 14:54:25 -08:00
Richard Liaw 98df612010 [tune] option to raise on error (#10030) 2020-08-11 09:59:04 -07:00
Richard Liaw ca6eabc9cb [tune] Fail Fast (#7528)
* pytest

* init cancel

* testing

* Update python/ray/tune/tests/test_tune_server.py

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

* change-test

* Apply suggestions from code review

* Apply suggestions from code review

* finished

* set_finished

* tune

* fix

Co-authored-by: ijrsvt <ian.rodney@gmail.com>
2020-03-26 00:04:09 -07:00
Ujval Misra 6022eb53c4 [tune] Use newest checkpoint in normal operation (#7563)
* Use persistent checkpoint for failures

* Fix test

* Add unpause test

* move test

* Fix tests

* remove debug statement

* Mark test as flaky
2020-03-12 22:21:42 -07:00
Ujval Misra 023d4c02a9 [tune] Prevent deletion of checkpoint from user-initiated resto… (#7501)
* Fix restore bug

* Add test

* Lint

* Indent
2020-03-09 15:53:10 -07:00
Ujval Misra 98a07fe37e [tune] Asynchronous saves (#6912)
* Support asynchronous saves

* Fix merge issues

* Add test, fix existing tests

* More informative warning

* Lint, remove print statements

* Address comments, add checkpoint.is_resolved fn

* Add more detailed comments
2020-02-09 12:17:45 -08:00
Ujval Misra 1558307ac4 [tune] Prevent MEMORY checkpoints from breaking trial FT (#6691)
* Prevent MEMORY checkpoints from breaking FT

* Add save/pause/resume/restore test

* change checkpoint return value based on status

* Fix test_checkpoint_manager_tests.

* Fix test + checkpoint manager bug

* lint

* Add docstring

* Add docstring to checkpoint_manager constructor

* Change variable name for clarity

* Revert on_checkpoint docstring wording

* Break after success

* nit: more informative warning

* Quarantine test
2020-01-22 23:17:09 -08:00
Sven 60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Ujval Misra ca651af1d7 [tune] Async restores and S3/GCP-capable trial FT (#6376)
* Initial commit for asynchronous save/restore

* Set stage for cloud checkpointable trainable.

* Refactor log_sync and sync_client.

* Add durable trainable impl.

* Support delete in cmd based client

* Fix some tests and such

* Cleanup, comments.

* Use upload_dir instead.

* Revert files belonging to other PR in split.

* Pass upload_dir into trainable init.

* Pickle checkpoint at driver, more robust checkpoint_dir discovery.

* Cleanup trainable helper functions, fix tests.

* Addressed comments.

* Fix bugs from cluster testing, add parameterized cluster tests.

* Add trainable util test

* package_ref

* pbt_address

* Fix bug after running pbt example (_save returning dir).

* get cluster tests running, other bug fixes.

* raise_errors

* Fix deleter bug, add durable trainable example.

* Fix cluster test bugs.

* filelock

* save/restore bug fixes

* .

* Working cluster tests.

* Lint, revert to tracking memory checkpoints.

* Documentation, cleanup

* fixinitialsync

* fix_one_test

* Fix cluster test bug

* nit

* lint

* Revert tune md change

* Fix basename bug for directories.

* lint

* fix_tests

* nit_fix

* Add __init__ file.

* Move to utils package

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-02 20:40:53 -08:00
Robert Nishihara 480206eef8 Remove some Python 2 compatibility code. (#6624) 2019-12-31 17:14:58 -08:00
Eric Liang 304b4f0d3d Shard unit tests into medium sized files for test stability (#6398) 2019-12-09 13:15:29 -08:00