Commit Graph

6812 Commits

Author SHA1 Message Date
Lingxuan Zuo 2f3ec4ef75 add streaming data writer unit tests (#11387) 2020-10-16 14:42:44 +08:00
mattearllongshot 049985549b Restore uptime timeout to 5 seconds (#11300) 2020-10-15 17:21:11 -07:00
herve-alanaai 436202bcfd [docs] Fix typos in documentation (#11414) 2020-10-15 17:00:48 -07:00
Ian Rodney afd797b896 [docker] Check for GPUs before setting runtime-nvidia (#11418) 2020-10-15 15:43:09 -07:00
Amog Kamsetty 38eb61442b [SGD] Callback API for SGD+Tune (#11316)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-15 15:22:14 -07:00
Sven Mika 414041c6dd [RLlib] Do not create env on driver iff num_workers > 0. (#11307) 2020-10-15 18:21:30 +02:00
Sumanth Ratna 60a4be4a59 [tune] Remove metric and mode kwargs from create_searcher (#11335) 2020-10-14 21:44:36 -07:00
Sumanth Ratna 3fe757391b [tune] Add Basic Variant Generator to search algorithm shim function (#11334)
* Add Basic Variant Generator

* Add 'random' key to SEARCH_ALG_IMPORT

Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-10-14 21:41:47 -07:00
Vishnu Deva 00e0f14c6f [tune] restore trials when sync_on_checkpoint is False (#11355)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-14 19:09:34 -07:00
Kai Fricke f7120d2a18 [tune] Make metrics parameter optional in pytorch lightning integration (#11402) 2020-10-14 17:50:34 -07:00
SongGuyang 34191107a3 [cpp worker] fix crash in empty args task (#11363) 2020-10-14 16:48:34 -07:00
SangBin Cho 666fcde8ca [Placement group] Input validation (#11152)
* Add a basic input validation.

* Addressed code review.
2020-10-14 13:56:41 -07:00
Ameer Haj Ali a10e36ca04 Make the logging of gc.collect() freed refs appear in DEBUG not INFO (#11353) 2020-10-14 13:14:35 -07:00
Alex Wu 7466ce82df [Autoscaler] Placement group autoscaling (#11243) 2020-10-14 13:11:46 -07:00
Eric Liang aefcf901d3 [docs] Add sklearn integration link 2020-10-14 13:07:23 -07:00
SangBin Cho b1481c6acf Revert "[PlacementGroup]Add node manager test framework (#11174)" (#11398)
This reverts commit 241e765d3a.
2020-10-14 11:09:20 -07:00
Lingxuan Zuo 149ec5f6bf [Log] dump stacktrace from glog lib (#11360)
* dump stacktrace from glog lib

* fix windows compile

* add comments for getcallstack
2020-10-14 10:52:12 -07:00
Kai Yang abc6126814 [Java] Release actor instance reference when Ray.exitActor() is invoked (#11324) 2020-10-14 13:12:59 +08:00
fangfengbin c926838411 [GCS]Fix GcsActorManagerTest multithreading bug (#11361)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-13 21:36:40 -07:00
Simon Mo 5637093f44 Add Serve load testing tool to long running test yaml (#11386) 2020-10-13 20:24:57 -07:00
Simon Mo 866193b01c Fix cluster yaml for serve benchmarks (#11383)
- Separate out single node and multiple node yamls
- Remove cluster_synced_files, somehow it breaks for me
2020-10-13 19:30:18 -07:00
fangfengbin 241e765d3a [PlacementGroup]Add node manager test framework (#11174)
* add part code

* add part code

* add part code

* add part code

* add part code

* add part code

* fix ut bug

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-13 19:27:11 -07:00
Max Fitton cd9dcfca0d [Dashboard] CPU/GPU usage details in actor pane (#11269) 2020-10-13 20:23:23 -05:00
Amog Kamsetty 933cf6675c [Tune] Changes for Pytorch Lightning 1.0 (#11375) 2020-10-13 15:50:11 -07:00
Sven Mika a6a94d3206 [RLlib] Fix test_env_with_subprocess.py. (#11356) 2020-10-13 12:42:20 -07:00
J Seppänen 63fa0a53a3 [k8s] Fix kubernetes cloud cluster example configuration (#11364) 2020-10-13 12:28:55 -07:00
Ian Rodney 84617f6ff6 [docker] Script for quickly fixing all Latest images (#11351) 2020-10-13 09:36:40 -07:00
Simon Mo 39e809fa03 Update microbenchmark script to use Python 3.8 wheel (#11357) 2020-10-13 09:27:52 -07:00
fangfengbin 0c02427da2 [GCS]Eviction of destroyed actors cached in GCS (#11338) 2020-10-13 15:34:35 +08:00
Lingxuan Zuo c84a9b457c [Streaming] add barrier helper tests (#11107) 2020-10-13 09:55:55 +08:00
Ian Rodney 6426fb3fff [CI] Fix-Up Docker Build (Use Python) (#11139) 2020-10-12 14:22:51 -07:00
Sven Mika 1ebcdf236f [RLlib] Add support for custom MultiActionDistributions. (#11311) 2020-10-12 13:50:43 -07:00
Sven Mika 0c0f67c14d [RLlib] ARS/ES eval workers not working: Issue 9933. (#11308) 2020-10-12 13:49:48 -07:00
Sven Mika 8ea1bc5ff9 [RLlib] Allow for more than 2^31 policy timesteps. (#11301) 2020-10-12 13:49:11 -07:00
Sven Mika f5e2cda68a [RLlib] SAC: log_alpha not being learnt when on GPU. (#11298) 2020-10-12 13:48:44 -07:00
Julius Frost 7dcfd258cd [RLlib] Assert LongTensor in SAC Discrete PyTorch (#11245) 2020-10-12 13:47:21 -07:00
Sven Mika 580820a530 [RLlib] Create ci/rllib_tests and organize a little (#11342) 2020-10-12 12:05:09 -07:00
SangBin Cho c107eea551 [Core] Do not report stats when worker is already dead. (#11167)
* Fix.

* Addressed code review.

* Done.
2020-10-12 11:57:04 -07:00
SangBin Cho 56f69543d0 Try to deflake test_failure (#11293) 2020-10-12 12:03:36 -05:00
Ameer Haj Ali 06fe690682 [autoscaler] Limit max launch concurrency per node type (#11242) 2020-10-12 09:45:52 -07:00
Sumanth Ratna 92a58aabce [tune][docs] Fix learning rate bounds in FAQ (#11345) 2020-10-12 09:44:53 -07:00
Alex Wu 175fc41fbc [Autoscaler] Account for resource backlog size (#11261) 2020-10-12 09:43:48 -07:00
Sven Mika d3bc20b727 [RLlib] ConvTranspose2D module (#11231) 2020-10-12 15:00:42 +02:00
fangfengbin d1579819e9 [GCS]Eviction of dead nodes cached in GCS (#11323) 2020-10-12 15:54:32 +08:00
fangfengbin 31117b5e96 [GCS]Add job id to log (#11331) 2020-10-12 13:53:08 +08:00
Simon Mo 0d09a17c64 Skip set_result if the future is done (#11256) 2020-10-11 22:33:58 -07:00
Alex V. Kotlar f9a29a6d26 [docs] Fix pip install commands (#11326) 2020-10-11 22:12:18 -07:00
Sven Mika 957877ad3f Tf version of VisionNet (ray/rllib/model/tf/vision_net.py) crashes iff len(conv-filters)=1. (#11330) 2020-10-11 12:49:47 +02:00
Richard Liaw 56f858ed1a [tune][docs/util] gputil check, docs (#11260)
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-10-10 00:54:31 -07:00
fyrestone defd41aad7 [Dashboard] http route handler cache (#10921)
* Add aiohttp_cache to dashboard

* Add comments; Refine code

* Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds

* Update merge

Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-10-09 22:27:05 -07:00