40 Commits

Author SHA1 Message Date
Amog Kamsetty 5d3c9c8861 [Tune] Mlflow Integration (#12840)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-19 00:40:02 -08:00
Richard Liaw 4dc16730a7 [tune] with-params fix (#12522) 2020-12-01 16:47:03 -08:00
Kai Fricke b94bfdfa99 [tune] use default anonymous metric _metric if at least a mode is set (#12159)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-23 20:09:33 -08:00
Kai Fricke 6d11fb8bc6 [tune] validate function callable in tune.with_parameters() (#11504) 2020-10-20 16:03:24 -07:00
Kai Fricke b450cb030a [tune] reuse actors for function API (#11230)
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-10-08 16:15:02 -07:00
scottwedge 732cd9901b Fix spelling of occurred (#10792) 2020-10-08 10:55:52 -07:00
Kai Fricke 508cfa3540 [tune] Support yield and return statements (#10857)
* Support `yield` and `return` statements in Tune trainable functions

* Support anonymous metric with ``tune.report(value)``

* Raise on invalid return/yield value

* Fix end to end reporter test
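
A minimal sketch of how `yield`, `return`, and anonymous values could be normalized into result dicts. This is not Tune's implementation: `iter_results` and `normalize` are hypothetical helper names, and the `_metric` key is borrowed from a later entry in this log.

```python
import inspect
import numbers

# Assumption: key used for anonymous (unnamed) metric values.
ANONYMOUS_METRIC = "_metric"


def normalize(value):
    """Turn a yielded/returned value into a result dict, or raise."""
    if isinstance(value, dict):
        return value
    if isinstance(value, numbers.Number):
        # A bare number is treated as the anonymous metric.
        return {ANONYMOUS_METRIC: value}
    raise ValueError(f"Invalid return or yield value: {value!r}")


def iter_results(trainable, config):
    """Hypothetical driver: iterate over the results of a trainable function."""
    output = trainable(config)
    if inspect.isgenerator(output):
        # Generator-style trainable: one result per `yield`.
        for value in output:
            yield normalize(value)
    elif output is not None:
        # Plain function: the return value is the single result.
        yield normalize(output)


def train_with_yield(config):
    for step in range(config["steps"]):
        yield step * 0.5


def train_with_return(config):
    return {"loss": 0.25}


print(list(iter_results(train_with_yield, {"steps": 2})))
print(list(iter_results(train_with_return, {})))
```

Any value that is neither a dict nor a number raises `ValueError`, mirroring the "raise on invalid return/yield value" item above.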
2020-09-17 20:18:35 -07:00
Kai Fricke 7eaf063f29 [tune] wrapper function to pass arbitrary objects through the object store to trainables (#10679) 2020-09-10 17:39:44 -07:00
Richard Liaw 5851e893ee [tune] More robust resolution/detection of signature (#10365)
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-09-08 11:38:16 -07:00
Richard Liaw 6bd5458bef [tune] cleanup error messaging/diagnose_serialization helper (#10210) 2020-08-22 11:50:49 -07:00
Richard Liaw b5068d08bf [tune] Fix restoration for function API PBT (#9853) 2020-08-03 12:36:17 -07:00
krfricke 619e44e54a [tune] Added WandbLogger (#9725)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-30 13:09:03 -07:00
Richard Liaw 0c3b9ebeef [tune/sgd] Document func_trainable and add checkpoint context (#9739)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-07-30 09:46:37 -07:00
Richard Liaw f3fdb5c5db [tune] distributed torch wrapper (#9550)
* changes

* add-working

* checkpoint

* cleanup

* fix

* ok

* formatting

* ok

* tests

* some-good-stuff

* fix-torch

* ddp-torch

* torch-test

* sessions

* add-small-test

* fix

* remove

* gpu-working

* update-tests

* ok

* try-test

* format

* ok

* ok
2020-07-26 09:37:22 -07:00
Richard Liaw d35f0e40d0 [tune] Use public methods for trainable (#9184) 2020-07-01 11:00:00 -07:00
Richard Liaw 6c49c01837 [tune] Function API checkpointing (#8471)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-06-15 10:42:54 -07:00
Richard Liaw 67c01455fe [tune] tune.track -> tune.report (#8388) 2020-05-16 12:55:08 -07:00
Richard Liaw ea10cd212c [tune] add accessible trial_info (#7378)
* add accessible trial_info

* trial name and info

* doc

* fix

* Update doc/source/tune-package-ref.rst

* Apply suggestions from code review

* fix

* trial

* fixtest

* testfix
2020-03-17 23:44:18 -07:00
Eric Liang 1ea05a2c08 [tune] Fix a number of reporter regressions and add end-to-end tests (#7274) 2020-02-25 14:31:56 -08:00
Sven 60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Robert Nishihara 39a3459886 Remove (object) from class declarations. (#6658) 2020-01-02 17:42:13 -08:00
Robert Nishihara 480206eef8 Remove some Python 2 compatibility code. (#6624) 2019-12-31 17:14:58 -08:00
Richard Liaw aa7b861332 [minor][tune] Support Type Hinting for py3 (#6571)
* fullargspec for new pyversion

* fi
2019-12-25 08:15:33 +01:00
Richard Liaw 1eaa57c98f [tune] Distributed example + walkthrough (#5157) 2019-08-02 09:17:20 -07:00
Sam Toyer 7ad854d4c6 [tune] Use traceback.format_tb() (fixes #5135) (#5136) 2019-07-08 01:13:06 -07:00
Noah Golmant 1ef9c0729d [tune] Initial track integration (#4362)
Introduces a minimally invasive utility for logging experiment results. A broad requirement for this tool is that it should integrate seamlessly with Tune execution.
2019-05-17 11:34:05 -07:00
Richard Liaw 828dc08ac8 [tune] Fix tests for Function API for better consistency (#4421) 2019-03-20 22:31:38 -07:00
gehring 7c3274e65b [tune] Make the logging of the function API consistent and predictable (#4011)
## What do these changes do?

This is a re-implementation of the `FunctionRunner` which enforces some synchronicity between the thread running the training function and the thread running the Trainable which logs results. The main purpose is to make logging consistent across APIs in anticipation of a new function API which will be generator based (through `yield` statements). Without these changes, it will be impossible for the (possibly soon to be) deprecated reporter based API to behave the same as the generator based API.

This new implementation provides additional guarantees to prevent results from being dropped. This makes the logging behavior more intuitive and consistent with how results are handled in custom subclasses of Trainable.

New guarantees for the tune function API:

- Every reported result, i.e., `reporter(**kwargs)` calls, is forwarded to the appropriate loggers instead of being dropped if not enough time has elapsed since the last results.
- The wrapped function only runs if the `FunctionRunner` expects a result, i.e., when `FunctionRunner._train()` has been called. This removes the possibility that a result will be generated by the function but never logged.
- The wrapped function is not called until the first `_train()` call. Currently, the wrapped function is started during the setup phase which could result in dropped results if the trial is cancelled between `_setup()` and the first `_train()` call.
- Exceptions raised by the wrapped function won't be propagated until all results are logged to prevent dropped results.
- The thread running the wrapped function is explicitly stopped when the `FunctionRunner` is stopped with `_stop()`.
- If the wrapped function terminates without reporting `done=True`, a duplicate of the last reported result with `{"done": True}` added is reported to explicitly terminate the trial. Components are notified with this duplicate, but it is not logged.
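
The guarantees above can be illustrated with a small, self-contained sketch. It is not the actual `FunctionRunner` code: `SyncFunctionRunner`, its `train()` method, and the `(config, reporter)` function signature are assumptions made for illustration. It uses only the standard library and leaves out the explicit thread-stop step.

```python
import queue
import threading


class SyncFunctionRunner:
    """Hypothetical, simplified stand-in for the runner described above."""

    def __init__(self, func, config):
        self._func = func
        self._config = config
        self._results = queue.Queue(maxsize=1)   # one result per train() call
        self._continue = threading.Semaphore(0)  # released once per train() call
        self._thread = None
        self._error = None
        self._last = {}
        self._finished = False

    def _report(self, **result):
        # Runs in the function thread: hand over the result, then block
        # until the runner asks for the next one.
        self._results.put(dict(result))
        self._continue.acquire()

    def _run(self):
        try:
            self._func(self._config, self._report)
        except Exception as exc:  # kept and re-raised later in train()
            self._error = exc
        finally:
            self._results.put(None)  # sentinel: the function has ended

    def train(self):
        if self._finished:
            raise RuntimeError("the wrapped function has already finished")
        if self._thread is None:
            # The function is not started until the first train() call.
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        else:
            # Let the function run until its next report.
            self._continue.release()
        result = self._results.get()
        if result is None:
            self._finished = True
            if self._error is not None:
                # Every earlier result has already been returned.
                raise self._error
            # No explicit done=True: duplicate the last result to end the trial.
            result = dict(self._last, done=True)
        self._last = result
        return result


def trainable(config, reporter):
    for step in range(config["steps"]):
        reporter(step=step)


runner = SyncFunctionRunner(trainable, {"steps": 2})
print(runner.train())
print(runner.train())
print(runner.train())
```

Because the function thread blocks inside the reporter until the next `train()` call, no result can be produced without being consumed, which is the property the list above is after.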

## Related issue number

Closes #3956.
#3949
#3834
2019-03-18 19:14:26 -07:00
Andrew Tan 57dcd3033e [tune] Trial reporter fix (#3951)
Fixes #3949.
2019-02-13 01:03:54 -08:00
Richard Liaw eab6dd72b5 [tune] logging fixes, better warnings, better cluster support (#3906) 2019-02-02 19:14:03 -08:00
Richard Liaw c3a2c7ebed [tune] Doc: Autofilled, StatusReporter (#3294)
* autofill and revise doc page for things

* lint

* comments
2018-11-13 13:15:56 -08:00
Richard Liaw f9b58d7b02 [tune] Tweaks to Trainable and Verbosity (#2889) 2018-10-11 23:42:13 -07:00
Richard Liaw f372f48bf3 [tune] Tune onto Logging Module (#2882)
Moves Tune onto logging in Python. Ignores examples and tests.
2018-09-16 12:09:36 -07:00
Richard Liaw 0347e6418b [tune] Add PyTorch MNIST Example + Misc. Tweaks (#2708) 2018-08-30 16:18:56 -07:00
Richard Liaw 62d0698097 [tune] Tune Facelift (#2472)
This PR introduces the following changes:

 * Ray Tune -> Tune 
 * [breaking] Creation of `schedulers/`, moving PBT, HyperBand into a submodule
 * [breaking] Search Algorithms now must take in experiment configurations via `add_configurations` rather than through initialization
 * Support `"run": (function | class | str)` with automatic registering of trainable
 * Documentation Changes
2018-08-19 11:00:55 -07:00
Eric Liang e56eb354eb [tune] Remove hack to serve pin requests off thread (#2680)
* nopin

* fix
2018-08-18 13:19:52 -07:00
Richard Liaw bb44456f6f [rllib, tune] TrainingResult -> Dict, Removes C408 from flake8 (#2565) 2018-08-07 12:17:44 -07:00
Eric Liang ed8c0f1a38 [tune] Allow fetching pinned objects from trainable functions (#1895)
* updates

* lint

* Update util.py

* Update function_runner.py

* updates
2018-04-16 15:54:38 -07:00
Philipp Moritz 74162d1492 Lint Python files with Yapf (#1872) 2018-04-11 10:11:35 -07:00
Eric Liang 173f1d629a [tune] Ray Tune API cleanup (#1454)
Remove rllib dep: trainable is now a standalone abstract class that can be easily subclassed.

Clean up hyperband: fix debug string and add an example.

Remove YAML api / ScriptRunner: this was never really used.

Move ray.init() out of run_experiments(): This provides greater flexibility and should be less confusing since there isn't an implicit init() done there. Note that this is a breaking API change for tune.
2018-01-24 16:55:17 -08:00