3529 Commits

Author SHA1 Message Date
Edward Oakes d738610dc9 Disable atexit test on windows (#13207) 2021-01-05 14:33:51 -06:00
Kai Fricke 96c2d3d2b5 [tune] better signature check for tune.sample_from (#13171)
* [tune] better signature check for `tune.sample_from`

* Update python/ray/tune/sample.py

Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2021-01-05 08:04:18 -08:00
Edward Oakes e8162f1b1f [serve] Merge ActorReconciler and BackendState (#13139) 2021-01-05 09:56:22 -06:00
Hao Zhang 4150970226 [Collective][PR 2/6] Driver program declarative interfaces (#12874)
* scaffold of the code

* some scratch and options change

* NCCL mostly done, supporting API#1

* interface 2.1 2.2 scratch

* put code into ray and fix some importing issues

* add an additional Rendezvous class to safely meet at named actor

* fix some small bugs in nccl_util

* some small fix

* add a Backend class to make Backend string more robust

* add several useful APIs

* add some tests

* added allreduce test

* fix typos

* fix several bugs found via unittests

* fix and update torch test

* changed back actor

* rearrange a bit before importing distributed test

* add distributed test

* remove scratch code

* auto-linting

* linting 2

* linting 2

* linting 3

* linting 4

* linting 5

* linting 6

* 2.1 2.2

* fix small bugs

* minor updates

* linting again

* auto linting

* linting 2

* final linting

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/collective_utils.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* added actor test

* lint

* remove local sh

* address most of richard's comments

* minor update

* remove the actor.option() interface to avoid changes in ray core

* minor updates

Co-authored-by: YLJALDC <dal177@ucsd.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-04 20:57:37 -08:00
Tao Wang c617291b27 [build] Update description and add some keywords (#13163) 2021-01-05 11:34:03 +08:00
Barak Michener 9643e44af6 [ray_client]: Move from experimental to util (#13176)
Change-Id: I9f054881f0429092d265cd6944d89804cce9d946
2021-01-04 17:51:56 -08:00
Eric Liang dfb326d4b5 Surface object store spilling statistics in ray memory (#13124) 2021-01-04 17:35:39 -08:00
Amog Kamsetty e181515dff [SGD] Fix Docstring for as_trainable (#13173) 2021-01-04 17:21:24 -08:00
Amog Kamsetty 15e86581bd [XGboost] Update Documentation (#13017)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-04 17:21:04 -08:00
Raed Shabbir d632b0f0f7 [Serve] Bug in Serve node memory-related resources calculation #11198 (#13061) 2021-01-04 11:04:59 -08:00
Clark Zinzow c2bff64699 [Core] Locality-aware leasing: Milestone 1 - Owned refs, pinned location (#12817)
* Locality-aware leasing for owned refs (pinned locations).

* LessorPicker --> LeasePolicy.

* Consolidate GetBestNodeIdForTask and GetBestNodeIdForObjects.

* Update comments.

* Turn on locality-aware leasing feature flag by default.

* Move local fallback logic to LeasePolicy, move feature flag check to CoreWorker constructor, add local-only lease policy.

* Add lease policy consulting assertions to the direct task submitter tests.

* Add lease policy tests.

* LocalityLeasePolicy --> LocalityAwareLeasePolicy.

* Add missing const declarations.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Add RAY_CHECK for raylet address nullptr when creating lease client.

* Make the fact that LocalLeasePolicy always returns the local node more explicit.

* Flatten GetLocalityData conditionals to make it more readable.

* Add ReferenceCounter::GetLocalityData() unit test.

* Add data-intensive microbenchmarks for single-node perf testing.

* Add data-intensive microbenchmarks for simulated cluster perf testing.

* Remove redundant comment.

* Remove data-intensive benchmarks.

* Add locality-aware leasing Python test.

* Formatting changes in ray_perf.py.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-01-04 09:49:08 -08:00
Dmitri Gekhtman 31453621ef [kubernetes][docs][minor] Kubernetes version warning (#13161) 2021-01-04 10:29:17 -06:00
architkulkarni a95275bdd9 [Serve] [Doc] Add existing web server integration ServeHandle tutorial (#13127) 2021-01-04 10:28:34 -06:00
Simon Mo fece8db70d [Serve] Use a small object to track requests (#13125) 2020-12-31 11:43:03 -08:00
Ian Rodney acb082fc47 [serve] Async controller (#13111) 2020-12-31 10:51:33 -06:00
Amog Kamsetty 7120f3a6ab [Tune] Update URL to fix 403 not found error in PBT transformers test case (#13131) 2020-12-31 10:45:57 -05:00
chaokunyang 33089c44e2 Fix streaming ci failure (#12830) 2020-12-30 10:45:52 +08:00
architkulkarni 032a6546d5 Serve metrics docs (#13096) 2020-12-29 14:03:34 -06:00
Ameer Haj Ali 44483f465c [autoscaler] Make placement groups bypass max launch limit (#13089) 2020-12-29 10:06:11 -08:00
Ian Rodney 7ad56826db [docker] Fix restart behavior with Docker (#12898)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: ijrsvt <ilr@anyscale.com>
2020-12-28 18:56:28 -08:00
architkulkarni cc1c2c3dc9 [Serve] Use ServeHandle in HTTP proxy (#12523) 2020-12-28 18:33:42 -08:00
Simon Mo 30c22921d9 [Serve] Implement Graceful Shutdown (#13028) 2020-12-28 17:53:53 -08:00
Lavanya Shukla 350917958c [docs] fix wandb url (#13094) 2020-12-28 17:19:17 -08:00
Eric Liang 836c5d5a91 Deprecate experimental / dynamic resources (#13019) 2020-12-28 11:52:36 -08:00
architkulkarni 9a0218fb89 [Serve] [Doc] Front page update (#13032) 2020-12-28 10:19:36 -08:00
Hao Zhang 18f5743416 [Collective][PR 3.5/6] Send/Recv calls and some initial code for communicator caching (#12935)
* other collectives all work

* auto-linting

* manual linting #1

* manual linting 2

* bugfix

* add send/recv point-to-point calls

* add some initial code for communicator caching

* auto linting

* optimize imports

* minor fix

* fix unpassed tests

* support more dtypes

* rerun some distributed tests for send/recv

* linting
2020-12-28 09:48:07 -08:00
Sumanth Ratna b11bd22111 [docs] Fix args + kwargs instead of docstrings (#13068)
* functools wraps

* Fix typo (functoools -> functools)
2020-12-23 19:09:23 -08:00
Edward Oakes 3cc213ddf6 [serve] Centralize HTTP-related logic in HTTPState (#13020) 2020-12-23 18:00:02 -06:00
Alex Wu 8df94e33e0 [Autoscaler] New output log format (#12772) 2020-12-23 12:02:55 -08:00
Antoni Baum a4f2dd2138 [Tune] Add integer loguniform support (#12994)
* Add integer quantization and loguniform support

* Fix hyperopt qloguniform not being np.log'd first

* Add tests, __init__

* Try to fix tests, better exceptions

* Tweak docstrings

* Type checks in SearchSpaceTest

* Update docs

* Lint, tests

* Update doc/source/tune/api_docs/search_space.rst

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2020-12-23 09:27:16 -08:00
Ameer Haj Ali d37e2c3a20 [joblib] Fix flaky joblib test. (#13046) 2020-12-23 10:43:34 -06:00
Barak Michener c4e273920f [ray_client]: Insert decorators into the real ray module to allow for client mode (#13031) 2020-12-22 22:51:45 -08:00
Simon Mo bc68260144 [Serve] Handle Bug Fixes (#12971) 2020-12-22 19:13:16 -08:00
Edward Oakes b52cce6632 [serve] Refactor SystemState into EndpointState and BackendState (#13018) 2020-12-21 20:39:13 -06:00
Eric Liang 8068041006 Don't release resources during plasma fetch (#13025) 2020-12-21 18:32:40 -08:00
Edward Oakes 015a0f9935 [serve] Rename replica_tag -> replica in metrics for consistency (#13022) 2020-12-21 17:19:39 -06:00
Eric Liang 03a5b90ed6 Revert "Revert "Increase the number of unique bits for actors to avoi… (#12990) 2020-12-21 15:16:42 -08:00
architkulkarni 8b4b4bf0a2 [Serve] Migrate from Flask.Request to Starlette Request (#12852) 2020-12-21 15:34:15 -06:00
Hao Zhang 5b48480e29 [Collective][PR 3/6] Other collectives (#12864) 2020-12-21 12:48:00 -08:00
Barak Michener 43b9c7811e [ray_client] add client microbenchmarks (#13007) 2020-12-21 12:17:44 -08:00
Ameer Haj Ali 5e2b850836 [autoscaler] Fixes max_workers bug. (#13008) 2020-12-21 10:30:03 -08:00
Kai Yang 5a6801dde7 [Core] Remove delete_creating_tasks (#12962) 2020-12-22 00:01:27 +08:00
Barak Michener c576f0b073 [ray_client] Implement a gRPC streaming logs API for the client (#13001) 2020-12-20 19:35:34 -08:00
Barak Michener e715ade2d1 Support retrieval of named actor handles (#13000)
Change-Id: I05d31c9c67943d2a0230782cbdaa98341584cbc7
2020-12-20 16:34:50 -08:00
Barak Michener 80f6dd16b2 [ray_client] Implement optional arguments to ray.remote() and f.options() (#12985) 2020-12-20 15:43:48 -08:00
Ameer Haj Ali 11f34f72d8 [autoscaler] Do not count head node with min_workers constraint. (#12980) 2020-12-20 14:54:46 -08:00
Barak Michener 7ab9164f1b [ray_client] Integrate with test_basic, test_basic_2 and test_actor (#12964) 2020-12-20 14:54:18 -08:00
Philipp Moritz bf6577c8f4 Switch debugger to sockets and support unicode (#13004) 2020-12-20 12:10:28 -08:00
Ian Rodney d6e243ad46 [serve] Refactor to full control loop design (#12537) 2020-12-20 13:03:57 -06:00
Richard Liaw 038a50af52 [tune] skopt fix-extra-import (#12970)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-20 01:01:09 -08:00