519 Commits

Author SHA1 Message Date
Tomasz Wrona aff7f19360 [tune] Added logger_config field (#8521)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-18 11:10:22 -07:00
Amog Kamsetty d3bac298d5 [Tune] PBT Error if metric not available (#9957) 2020-08-17 16:12:14 -07:00
krfricke 8f0f7371a0 [tune] Added Kubernetes syncer and sync client (#10097)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-16 14:09:28 -07:00
Amog Kamsetty f87a4aa45d [Tune] Pbt Function API (#9958)
* adding function convnet example

* add unit test

* update test

* update example

* wip

* move error from experiment to tune

* wip

* Fix checkpoint deletion

* updating code

* adding smoke test

* updating pbt guide

* formatting

* fix build

* add best checkpoint analysis util

* update test

* add comments

* remove class api

* fix example

* add setup and teardown to tests

* formatting

* Update python/ray/tune/tests/test_trial_scheduler_pbt.py

Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-14 17:52:30 -07:00
Amog Kamsetty 5898248645 [Tune] Update PBT Transformer Test (#10081) 2020-08-13 12:23:03 -07:00
krfricke 16486a8df3 [tune] Add OptunaSearcher wrapper around Optuna samplers (#10044)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-12 16:13:22 -07:00
Richard Liaw 7a8b922841 [tune] hotfix log_once (#10069) 2020-08-12 12:40:22 -07:00
Simon Mo f1ede1099f [Hotfix] Pin opencv-python-headless==4.3.0.36 (#10049) 2020-08-11 15:58:18 -07:00
krfricke 221fdc0774 [tune] fix flaky PBT replay test (#10047)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-11 14:17:31 -07:00
Richard Liaw 98df612010 [tune] option to raise on error (#10030) 2020-08-11 09:59:04 -07:00
Amog Kamsetty 856d4a0533 [Tune] Better error when using checkpoint_freq (#9998) 2020-08-10 13:52:46 -07:00
Richard Liaw be8e63d477 [tune] support resume for search algorithms (#9972) 2020-08-10 13:43:14 -07:00
krfricke 7301733a1f [tune] Close logfile contexts (#10026)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-10 12:40:40 -07:00
Richard Liaw a438cbd1e8 [tune] reorder transformer example save (#9994)
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-08-07 20:29:04 -07:00
krfricke 0ef8224446 [tune] PBT replay utility scheduler (#9953)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-07 12:41:49 -07:00
Amog Kamsetty 5af7d24f66 [Tune] Transformer blog example (#9789)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-04 22:05:01 -07:00
krfricke ef717ecda6 [tune] Prevent leak of magic keys in trial config (#9903)
Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-04 11:24:01 -07:00
Richard Liaw c6404e8cf6 [tune] Search alg checkpointing during training (#9803)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-08-03 15:07:31 -07:00
Richard Liaw b5068d08bf [tune] Fix restoration for function API PBT (#9853) 2020-08-03 12:36:17 -07:00
krfricke c741d1cf9c [tune] stdout/stderr logging redirection (#9817)
* Add `log_to_file` parameter, pass to Trainable config, redirect stdout/stderr.

* Add logging handler to root ray logger

* Added test for `log_to_file` parameter

* Added logs, reuse test

* Revert debug change

* Update logdir on reset, flush streams after each train() step

* Remove magic keys from visible config

Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-03 11:18:34 -07:00
Richard Liaw a47121476f [tune] Remove accidentally added files (#9835) 2020-07-30 21:47:27 -07:00
krfricke 619e44e54a [tune] Added WandbLogger (#9725)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-30 13:09:03 -07:00
Richard Liaw 0c3b9ebeef [tune/sgd] Document func_trainable and add checkpoint context (#9739)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-07-30 09:46:37 -07:00
Richard Liaw f3fdb5c5db [tune] distributed torch wrapper (#9550)
* changes

* add-working

* checkpoint

* ccleanu

* fix

* ok

* formatting

* ok

* tests

* some-good-stuff

* fix-torch

* ddp-torch

* torch-test

* sessions

* add-small-test

* fix

* remove

* gpu-working

* update-tests

* ok

* try-test

* formgat

* ok

* ok
2020-07-26 09:37:22 -07:00
krfricke 9f3570828a [tune] move jenkins tests to travis (#9609)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-24 21:22:54 -07:00
krfricke ad0219b80d [tune] fix pbt checkpoint_freq (#9517)
* Only delete old checkpoint if it is not the same as the new one

* Return early if old checkpoint value coincides with new checkpoint value

Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-18 00:58:16 -07:00
Richard Liaw ed476be4ad quickfix (#9552) 2020-07-17 20:54:03 -07:00
Tom cf719dd470 [Tune] Copy default_columns in new ProgressReporter instances (#9537) 2020-07-17 15:44:38 -07:00
krfricke 87630cf024 [tune] Unflattened lookup for ProgressReporter (#9525)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-17 13:52:54 -07:00
krfricke 5a40299d42 [tune] extend PTL template (GPU, typing fixes, tensorboard) (#9451)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-15 10:30:20 -07:00
Michael Mui e93cde8c66 [tune] Issue 8821: ExperimentAnalysis doesn't expand user (#9461) 2020-07-14 13:53:37 -07:00
krfricke deba082cb4 [tune] PyTorch CIFAR10 example (#9338)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-13 23:16:05 -07:00
Amog Kamsetty 4454d05bcf [Tune] Trainable documentation fix (#9448) 2020-07-13 13:15:01 -07:00
Nicolaus93 b5a6c57295 [tune] handling nan values (#9381) 2020-07-12 17:08:36 -07:00
Hao Chen d49dadf891 Change Python's ObjectID to ObjectRef (#9353) 2020-07-10 17:49:04 +08:00
Richard Liaw 139d21e068 [tune] Docs for tune-sklearn (#9129)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-07-06 15:35:10 -07:00
Richard Liaw b71c912da7 [tune] Fix up examples (#9201) 2020-07-05 01:16:20 -07:00
David Fidalgo c0ba337fe0 [tune] Add np.bool8 and np.int to allowed HPARAMS types (#9297) 2020-07-03 18:34:45 -07:00
Richard Liaw d35f0e40d0 [tune] Use public methods for trainable (#9184) 2020-07-01 11:00:00 -07:00
krfricke e0b6984dce [tune] pytorch lightning template and walkthrough (#9151)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-06-29 16:52:07 -07:00
mehrdadn 898e472425 Make test_utils.py use pipes to avoid file access conflicts on Windows (#9072)
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-27 22:33:45 +02:00
krfricke 22ea8dde84 [Tune] Added XGBoost tutorial and template (#9060)
* Added XGBoost tutorial and template

* XGBoost tutorial: Cut some clutter

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Added XGboost logo

* Fixed further references

Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-06-25 15:59:54 -07:00
mehrdadn 07655036d2 [tune] os.replace() instead of os.rename() for cross-platform (#9141)
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-25 12:30:51 -07:00
Ian Rodney b942bcd798 [Tune] remove whitelist from deep_copy (#8997) 2020-06-22 15:02:27 -07:00
Vishnu Deva 432ce1be50 [tune] fix for sync_on_checkpoint bug (#9057)
* #9056 fix for sync_on_checkpoint bug

* fix for failing checks

* update help string
2020-06-21 01:07:11 -07:00
Richard Liaw e6ee39a6a3 [tune] checkpoint_dir test (#8024) 2020-06-20 17:56:24 -07:00
mehrdadn 981f67bfb0 Fix more Windows issues (#9011)
Co-authored-by: Mehrdad <noreply@github.com>
2020-06-19 18:51:45 -07:00
Richard Liaw 6c49c01837 [tune] Function API checkpointing (#8471)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-06-15 10:42:54 -07:00
Jack Carreira 19cc1ae781 [docs] Tune Search: Wrong parameter name (#8927) 2020-06-13 18:01:22 -07:00
Eli Meirom 5c56760fac [tune] np.array compat for logger (#8918)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-06-12 16:39:01 -07:00