6071 Commits

Author SHA1 Message Date
Alex Wu 175fc41fbc [Autoscaler] Account for resource backlog size (#11261) 2020-10-12 09:43:48 -07:00
Sven Mika d3bc20b727 [RLlib] ConvTranspose2D module (#11231) 2020-10-12 15:00:42 +02:00
fangfengbin d1579819e9 [GCS]Eviction of dead nodes cached in GCS (#11323) 2020-10-12 15:54:32 +08:00
fangfengbin 31117b5e96 [GCS]Add job id to log (#11331) 2020-10-12 13:53:08 +08:00
Simon Mo 0d09a17c64 Skip set_result if the future is done (#11256) 2020-10-11 22:33:58 -07:00
Alex V. Kotlar f9a29a6d26 [docs] Fix pip install commands (#11326) 2020-10-11 22:12:18 -07:00
Sven Mika 957877ad3f Tf version of VisionNet (ray/rllib/model/tf/vision_net.py) crashes iff len(conv-filters)=1. (#11330) 2020-10-11 12:49:47 +02:00
Richard Liaw 56f858ed1a [tune][docs/util] gputil check, docs (#11260)
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-10-10 00:54:31 -07:00
fyrestone defd41aad7 [Dashboard] http route handler cache (#10921)
* Add aiohttp_cache to dashboard

* Add comments; Refine code

* Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds

* Update merge

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-10-09 22:27:05 -07:00
SangBin Cho 9dd4561d1b [Placement Group] Fix stress tests to pass when actors are scheduled. (#11151)
* Fix stress tests to pass when actors are created.

* Addressed code review.
2020-10-09 21:52:26 -07:00
chaokunyang 0737e78445 [Java] upgrade common-collections version (#10613) 2020-10-10 11:16:12 +08:00
Gekho457 48db6f8858 [autoscaler/k8s] namespace permissions problem (#11270) 2020-10-09 19:22:20 -05:00
Gekho457 92b4059cad Replace read_namespaced_pod_status with read_namespaced_pod (#11278) 2020-10-09 19:21:39 -05:00
Ian Rodney 5ef1784024 [Autoscaler] Fix sdk (#11314)
* Use

* [Hotfix] Make Optional[str] default to None

* Fix TempFile

* context manager (with statement)

* use throughout

* drop try/finally
2020-10-09 12:34:29 -07:00
fangfengbin 3eb2b9e216 [GCS]Random eviction of destroyed actors cached in GCS (#11189)
* add part code

* fix lint error

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-09 11:54:47 -07:00
fangfengbin ca36105d77 [TEST]Fix TestActorSubscribeAll bug (#11297)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-09 11:54:27 -07:00
Sumanth Ratna 40071739dc [docs] fix version warning banner location (#11286) 2020-10-08 21:21:42 -07:00
Kai Fricke b450cb030a [tune] reuse actors for function API (#11230)
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-10-08 16:15:02 -07:00
Thomas Tumiel 587319debc [tune] move _SCHEDULERS to tune.schedulers and add all available schedulers (#11218)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-08 16:10:23 -07:00
SangBin Cho 6cb00208f7 [Placement Group] Export bundle reservation check method only once. (#11153)
* Export bundle reservation check method only once.

* Addressed code review.
2020-10-08 16:08:28 -07:00
Richard Liaw 74e9647ec3 [tune] deactivate flaky test for now (#11284) 2020-10-08 15:40:04 -07:00
Amog Kamsetty 1027bfd4b8 [Tune, Ray SGD] Update PTL integrations (#11271) 2020-10-08 13:43:07 -07:00
Alex Wu a6f91664c1 [New Scheduler] Multi tenancy edge case (#11164)
* .

* refactor

* .

* .

* done?

* .

* .

* .

* lint

* no light heartbeat, no tests, fields 2,3

* .

* manually clang format :(

* .

* .

* test

* .

* .

* task manager heartbeat

* lint

* .

* add reminder

* CR

* CR

* cleanup

* CR

* comment

* lint

* .

* .
2020-10-08 13:19:01 -07:00
Pierre TASSEL 8d6a2774ef 'Policy' argument doesn't exist (#11276)
policy argument doesn't exist anymore and has been renamed to policy_class
2020-10-08 11:42:21 -07:00
Lee moon soo bd9619e207 include staroid/example-full.yaml in whl package (#11194) 2020-10-08 11:29:07 -07:00
desktable 8af9ff6dc2 [RLlib] Add MultiAgentEnv wrapper for Kaggle's football environment (#11249)
* [RLlib] Add MultiAgentEnv wrapper for Kaggle's football environment

* Add unit tests to BUILD

* Add gfootball dependency

* Revert the last two commits
2020-10-08 10:57:58 -07:00
scottwedge 732cd9901b Fix spelling of occurred (#10792) 2020-10-08 10:55:52 -07:00
SangBin Cho 174bef56d4 [Core] Publish gcs server failure to drivers. (#11265)
* Done.

* Fixed.
2020-10-08 08:59:31 -07:00
SangBin Cho 37fa86f9a0 [Placement Group] Fix placement group bugs that happen when rescheduling. (#11263)
* Fix placement group bugs while autoscaling.

* Addressed code review.
2020-10-08 08:58:59 -07:00
desktable f9621ce23c [RLlib] Add recsim_wrapper unit test to BUILD (#11225) 2020-10-08 08:23:27 +02:00
Siyuan (Ryans) Zhuang 47b7b76a5a [Example] commands need to be quoted (#11224) 2020-10-07 19:12:42 -07:00
Sumanth Ratna 14d8826e43 Fix overriden typo (#11227) 2020-10-07 19:11:07 -07:00
Anes Benmerzoug ff3e411ea2 [rllib] Fix VectorEnv's check for the info object's type (#10982) 2020-10-07 15:00:37 -07:00
Edward Oakes cd6936e60b Deflake test_env_with_subprocess.py (#11257) 2020-10-07 16:19:40 -05:00
Sven Mika 199e5d0f75 [RLlib] Exploration class type annotations. (#11251) 2020-10-07 21:59:14 +02:00
Ian Rodney b6314dd15c [metrics] Cleanup unused item in Docstring (#11254)
* Use

* remove unused tag
2020-10-07 11:39:38 -07:00
architkulkarni 6676b30eef [Serve] Serve multi node tests (#10980) 2020-10-07 10:57:40 -07:00
Simon Mo 68106425db [Serve] Disable serialization for client and print helpful msg (#11181) 2020-10-07 10:50:30 -07:00
Akash Patel 91d0f41a2f [CI] remove unsupported experimental_ui_deduplicate bazel flag (#11238) 2020-10-06 16:09:44 -07:00
Amog Kamsetty 3b76def2d2 [Docs] [Tune] Add NeuroCard to open source projects using Tune (#11213) 2020-10-06 14:22:32 -07:00
Kai Fricke e58613b5e5 [tune/dashboard] Fix Tune dashboard to work with all trainables (#11232) 2020-10-06 14:03:31 -07:00
Sven Mika ce96b03b07 [RLlib] MB-MPO cleanup (comments, docstrings, type annotations). (#11033) 2020-10-06 20:28:16 +02:00
Alex Wu d2a0d23b0e [Core] Fix master build failure (#11217)
Co-authored-by: Alex Wu <alex@Alexs-MacBook-Pro.local>
2020-10-06 10:23:34 -07:00
Gekho457 66e265fdb9 Type annotations added to node_provider.py (#11221) 2020-10-05 22:03:04 -07:00
Alex Wu 8ec044f1f5 [Autoscaler] Remove extra print statement (#11222) 2020-10-05 22:02:23 -07:00
Philsik Chang 2b26d2ca1b [rllib] Fix for Torch checkpoint taken on GPU fails to deserialize on CPU (#11071) (#11208) 2020-10-05 22:01:55 -07:00
Alex Wu dc7c2a70b8 [Core] Report worker backlog in GCS heartbeat (#11039) 2020-10-05 22:00:44 -07:00
Robert Nishihara 0d02aa10b2 Add Slack badge to README (#11215) 2020-10-05 19:39:23 -07:00
Ameer Haj Ali 518876ef20 Remove test_autoscaler_yaml from windows tests (#11197)
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-10-05 17:07:00 -07:00
Ian Rodney ab30940e50 Use 'mkdir -p' in manylinux (#11219) 2020-10-05 16:18:18 -07:00