6812 Commits

Author SHA1 Message Date
Sven Mika b6b54f1c81 [RLlib] Trajectory view API: enable by default for SAC, DDPG, DQN, SimpleQ (#11827) 2020-11-16 10:54:35 -08:00
Kai Fricke 8609e2dd90 [tune] refactor verbosity levels (#11767)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 10:32:53 -08:00
Keqiu Hu a50128079d [tune/placement group] dist. training placement group support (#11934)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 01:11:39 -08:00
fangfengbin 8fb926565c [Placement Group]Placement Group supports gcs failover (Part1) (#11933) 2020-11-16 14:42:56 +08:00
dHannasch d35de2272d [Core] Allow redis.ResponseError instead of redis.AuthenticationError (#12024)
* redis.ResponseError

* there really is no way to make this look good, is there
2020-11-15 15:04:56 -08:00
Simon Mo ac9610b19d [Autoscaler] Precisely match docker HOME (#12020)
* [Autoscaler] Precisely match docker HOME

The current grep will match any env variable keyed by HOME. This will
include some unwanted variables like PYTHONHOME, PROJECT_HOME, etc.
Depending on the order of the environment variable, the subsequent
docker setup command might fail.

* fstring
2020-11-15 11:49:50 -08:00
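Note on the entry above (#12020): the over-matching it describes can be illustrated with a minimal sketch. This is not the autoscaler's actual code; the sample environment lines and the helper names `loose_match` and `exact_match` are hypothetical, chosen only to show why a substring match on HOME also picks up PYTHONHOME and PROJECT_HOME while a match anchored on the key does not.

```python
import re

# Hypothetical output of printing a container's environment, one KEY=VALUE per line.
env_lines = [
    "PYTHONHOME=/opt/python",
    "PROJECT_HOME=/srv/project",
    "HOME=/root",
]


def loose_match(lines):
    """Substring match: keeps every line that contains 'HOME' anywhere."""
    return [line for line in lines if "HOME" in line]


def exact_match(lines):
    """Anchored match: keeps only lines whose key is exactly 'HOME'."""
    pattern = re.compile(r"^HOME=")
    return [line for line in lines if pattern.match(line)]


if __name__ == "__main__":
    print(loose_match(env_lines))
    print(exact_match(env_lines))
```

With the loose match, whichever HOME-like variable happens to come first could be picked up by a later step, which is consistent with the order-dependent failure the commit message mentions.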
Richard Liaw 8b3f79f307 [tune] refactor and add examples (#11931) 2020-11-14 20:43:28 -08:00
dHannasch 5891759a3e Clarify get_node_ip_address docstring (#11881) 2020-11-14 15:20:58 -08:00
dHannasch 9fbeefd604 Distinguish a bad --redis-password from any other Redis error (#11893) 2020-11-13 17:39:44 -06:00
Eric Liang 4f5d6274af [docs] Add links to Ray design patterns whitepaper (#12014)
* update

* update
2020-11-13 14:16:51 -08:00
Edward Oakes 8bcb0bddc9 [serve] Fix API calls in global README (#12015) 2020-11-13 16:05:00 -06:00
dHannasch effa553077 [Doc] Explain how to know whether RAY_BACKEND_LOG_LEVEL worked (#12010)
* Fix broken link to nonexistent Temporary Files page.

* How to know that RAY_BACKEND_LOG_LEVEL worked.

* Reference the definition of DEBUG in case it changes.
2020-11-13 14:02:57 -08:00
Simon Mo 277558895d [Serve] Introduce Long Polling (#11905) 2020-11-13 13:17:20 -08:00
Eric Liang 00ef1179c0 [object spilling] Autocreate dir if not exists (#11999) 2020-11-13 12:13:06 -08:00
Ian Rodney f936ea35fe [hotfix] Fix ResourceDemandScheduler (#11996)
* [hotfix] Fix ResourceDemandScheduler

* fix test_autoscaler
2020-11-13 00:42:16 -08:00
SangBin Cho f6f9b15299 . (#11998) 2020-11-12 21:33:00 -08:00
Ian Rodney 3b56a1a522 [docker] auto-populate shared memory size (#11953) 2020-11-12 17:22:42 -08:00
Michael Luo 59bc1e6c09 [RLLib] MAML extension for all models except RNNs (#11337) 2020-11-12 16:51:40 -08:00
Barak Michener 272edcca94 [ray_client]: Implement function calls (#11922) 2020-11-12 16:49:34 -08:00
Eric Liang a6a8e777f3 [autoscaler] Interpret autoscaling_speed as 1/x-1 of previous target util fraction (#11961)
* tweak

* update
2020-11-12 16:23:50 -08:00
Sven Mika 0bd69edd71 [RLlib] Trajectory view API: enable by default for ES and ARS (#11826) 2020-11-12 10:33:10 -08:00
Michael Luo 6e6c680f14 MBMPO Cartpole (#11832)
* MBMPO Cartpole Done

* Added doc
2020-11-12 10:30:41 -08:00
Ian Rodney 9254de0b02 [autoscaler] Fix custom node resources on head (#11896) 2020-11-12 10:30:04 -08:00
Gekho457 ad639f12d8 [autoscaler/k8s] Preliminary k8s operator (#11929) 2020-11-12 11:58:02 -06:00
Gabriele Oliaro 4744ed01f7 Queueing non-actor tasks at the workers (#11051)
* separated adding tasks to queue and executing them (worker side)

* linting

* first review

* second rev

* rev3, all tests passing locally

* linting

* rev4

* linting

* finished rev4, all tests passing locally (mac)

* rev4, all tests passing locally

* linting

* rev5

* bug fix

* hopefully fixed build

* nvm

* ptr cast

* linting

* no special treatment for actor creation tasks
2020-11-12 12:44:13 -05:00
Kai Fricke 02c02369ca [tune] Fix hpo randint limits (#11946)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2020-11-12 08:45:49 -08:00
Kristian Hartikainen 07f401d99d [tune] Fix unflatten dict (#11948) 2020-11-12 08:43:15 -08:00
Lee moon soo 9920933e31 [docker] Support non-root container (#11407) 2020-11-12 08:41:50 -08:00
Sven Mika 62c7ab5182 [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747) 2020-11-12 16:27:34 +01:00
Michael Luo 59ccbc0fc7 [RLlib] Model Annotations: Tensorflow (#11964) 2020-11-12 12:18:50 +01:00
Michael Luo b2984d1c34 [RLlib] Model Annotations to Torch Models (#9749) 2020-11-12 12:16:12 +01:00
Tao Wang 3fbd8be851 [Placement Group]Do not really subtract resources, just count (#11894)
* [Placement Group]Do not really subtract resources, just count

* add todo
2020-11-12 00:01:19 -08:00
SangBin Cho f80d812799 [Object Spilling] Introduce SpillWorker & RestoreWorker Pool to avoid IO worker deadlock. (#11885) 2020-11-11 18:20:14 -08:00
Barak Michener de6df51bd2 [redis, docs]: Bump redis and docs/Pillow dependencies (#11371) 2020-11-11 18:15:27 -08:00
Max Fitton f545418c3f [Dashboard] Fix dashboard regression caused by logCount and errCount being removed from worker payload (#11954) 2020-11-11 14:55:54 -08:00
Edward Oakes 73a1cb702b Split _get_node_provider_cls off from _get_node_provider (#11949) 2020-11-11 16:10:46 -06:00
Ameer Haj Ali 85197deece [autoscaler] Remove legacy autoscaler (#11802) 2020-11-11 13:36:48 -08:00
Sven Mika 72fc79740c [RLlib] Issue with pickle versions (breaks rollout test cases in RLlib). (#11939) 2020-11-11 21:52:21 +01:00
dHannasch 396ae0b7c2 Add docstring for find_redis_address (#11884) 2020-11-11 12:24:36 -06:00
Sven Mika 291c172d83 [RLlib] Support Simplex action spaces for SAC (torch and tf). (#11909) 2020-11-11 18:45:28 +01:00
Kai Yang 4735c032ed [Core] Fix C++ worker test (#11941) 2020-11-11 09:04:45 -08:00
Tao Wang 92286660e4 [Core] Lazy create node manager clients, and destroy then (#11928) 2020-11-11 08:51:40 -08:00
SangBin Cho 7b8bd15702 [Stalebot] Fix issues. (#11930) 2020-11-11 00:28:02 -08:00
Siyuan (Ryans) Zhuang b8dda0e3d0 [Serialization] Fix buffer alignment issues (#11888)
* fix buffer alignment issues

* remove unused fields

* aligned memory allocation

* windows compat

* license. fix compiler warnings

* fix compilation error

* reinterpret_cast
2020-11-10 23:44:16 -08:00
chaokunyang 1979ea9c0a fix disable javadoc lint (#11907) 2020-11-11 13:40:50 +08:00
dHannasch 29cb32539e [Core] If failed to connect to redis, try to say why. (#11916) 2020-11-10 18:22:10 -08:00
fangfengbin 433e4f32da [GCS]Reduce get operations of worker table (#11599)
* [GCS]Reduce get operations of worker table

* fix ut bug

* fix ut bug

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-10 18:11:25 -08:00
Alex Wu 8afd2acdc1 [Autoscaler] simulator placement groups (#11777) 2020-11-10 18:10:36 -08:00
Eric Liang 46f3652102 Remove repeat push timeout from object manager (#11874) 2020-11-10 16:26:53 -08:00
Keqiu Hu 0c1bdaef59 [tune] TensorFlow Distributed Trainable (#11876)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:59:08 -08:00