* prepare for head node
* move command runner interface outside _private
* remove space
* Eric
* flake
* min_workers in multi node type
* fixing edge cases
* eric not idle
* fix target_workers to consider min_workers of node types
* idle timeout
* minor
* minor fix
* test
* lint
* eric v2
* eric 3
* min_workers constraint before bin packing
* Update resource_demand_scheduler.py
* Revert "Update resource_demand_scheduler.py"
This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.
* reducing diff
* make get_nodes_to_launch return a dict
* merge
* weird merge fix
* auto fill instance types for AWS
* Alex/Eric
* Update doc/source/cluster/autoscaling.rst
* merge autofill and input from user
* logger.exception
* make the yaml use the default autofill
* docs Eric
* remove test_autoscaler_yaml from windows tests
* lets try changing the test a bit
* return test
* lets see
* edward
* Limit max launch concurrency
* commenting frac TODO
* move to resource demand scheduler
* use STATUS UP TO DATE
* Eric
* make logger of gc freed refs debug instead of info
* add cluster name to docker mount prefix directory
* grrR
* fix tests
* moving docker directory to sdk
* move the import to prevent circular dependency
* smallf fix
* ian
* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running
* small fix
* Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)"
This reverts commit 6f9d39fb3e.
* fake news
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
* In Progress.
* Done.
* Fix the issue.
* Add wait for condition because logs are not written right away now.
* debug string.
* lint.
* Fix flaky test.
* Fix issues.
* Fix test.
* lint.
* Remove worker Wait() call due to SIGCHLD being ignored
* Port _pid_alive to Windows
* Show PID as well as TID in glog
* Update TensorFlow version for Python 3.8 on Windows
* Handle missing Pillow on Windows
* Work around dm-tree PermissionError on Windows
* Fix some lint errors on Windows with Python 3.8
* Simplify torch requirements
* Quiet git clean
* Handle finalizer issues
* Exit with the signal number
* Get rid of wget
* Fix some Windows compatibility issues with tests
Co-authored-by: Mehrdad <noreply@github.com>
* Remove all __future__ imports from RLlib.
* Remove (object) again from tf_run_builder.py::TFRunBuilder.
* Fix 2xLINT warnings.
* Fix broken appo_policy import (must be appo_tf_policy)
* Remove future imports from all other ray files (not just RLlib).
* Remove future imports from all other ray files (not just RLlib).
* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).
* Add two empty lines before Schedule class.
* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.