* Remove all __future__ imports from RLlib.
* Remove (object) again from tf_run_builder.py::TFRunBuilder.
* Fix 2xLINT warnings.
* Fix broken appo_policy import (must be appo_tf_policy)
* Remove future imports from all other ray files (not just RLlib).
* Remove future imports from all other ray files (not just RLlib).
* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).
* Add two empty lines before Schedule class.
* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).
* Fix LINT line-len errors.
* Fix LINT errors.
* Fix `tf_pg_policy` imports (formerly: `pg_policy`).
* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).
* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
then built into the Bazel/Travis test suite.
* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.
* Fix remaining import errors for agents/pg/...
* Fix circular dependency in pg imports.
* Add pg tests to Jenkins test suite.
* Rollout improvements
* Make info-saving optional, to avoid breaking change.
* Store generating ray version in checkpoint metadata
* Keep the linter happy
* Add small rollout test
* Terse.
* Update test_io.py
## What do these changes do?
This is a re-implementation of the `FunctionRunner` which enforces some synchronicity between the thread running the training function and the thread running the Trainable which logs results. The main purpose is to make logging consistent across APIs in anticipation of a new function API which will be generator based (through `yield` statements). Without these changes, it will be impossible for the (possibly soon to be) deprecated reporter based API to behave the same as the generator based API.
This new implementation provides additional guarantees to prevent results from being dropped. This makes the logging behavior more intuitive and consistent with how results are handled in custom subclasses of Trainable.
New guarantees for the tune function API:
- Every reported result, i.e., `reporter(**kwargs)` calls, is forwarded to the appropriate loggers instead of being dropped if not enough time has elapsed since the last results.
- The wrapped function only runs if the `FunctionRunner` expects a result, i.e., when `FunctionRunner._train()` has been called. This removes the possibility that a result will be generated by the function but never logged.
- The wrapped function is not called until the first `_train()` call. Currently, the wrapped function is started during the setup phase which could result in dropped results if the trial is cancelled between `_setup()` and the first `_train()` call.
- Exceptions raised by the wrapped function won't be propagated until all results are logged to prevent dropped results.
- The thread running the wrapped function is explicitly stopped when the `FunctionRunner` is stopped with `_stop()`.
- If the wrapped function terminates without reporting `done=True`, a duplicate result with `{"done": True}`, is reported to explicitly terminate the trial, and components will be notified with a duplicate of the last reported result, but this duplicate will not be logged.
## Related issue number
Closes#3956.
#3949#3834
In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced for users to export TensorFlow model and checkpoint.
For Ray Tune users, these APIs are not accessible through YAML configurations.
In this pull request, export_formats option is provided to enable users to choose the desired export format.
This PR introduces cluster-level fault tolerance for Tune by checkpointing global state. This occurs with relatively high frequency and allows users to easily resume experiments when the cluster crashes.
Note that this PR may affect automated workflows due to auto-prompting, but this is resolvable.
Adds the ability for trainables to reset their configurations during experiments. These changes in particular add the base functions to the trial_executor and trainable interfaces as well as giving the basic implementation on the PopulationBasedTraining scheduler.
Related issue number: #2741
This PR introduces the following changes:
* Ray Tune -> Tune
* [breaking] Creation of `schedulers/`, moving PBT, HyperBand into a submodule
* [breaking] Search Algorithms now must take in experiment configurations via `add_configurations` rather through initialization
* Support `"run": (function | class | str)` with automatic registering of trainable
* Documentation Changes