Commit Graph

3483 Commits

Author SHA1 Message Date
Edward Oakes dee696577f Fix passing object ids in local mode (#6170) 2019-11-15 15:46:39 -08:00
Edward Oakes 33040d734f Disable stopgap GC by default (#6165)
* disable stopgap gc by default

* fix gc tests
2019-11-15 15:42:59 -08:00
Hersh Godse 7aa06fb25c [tune] ExperimentalAnalysis in-memory cache (#5962) 2019-11-15 12:47:50 -08:00
Eric Liang 7d33e9949b Integrate ref count module into local memory store (#6122) 2019-11-15 10:52:19 -08:00
Richard Liaw 62cbc043b4 [tune] tbx logger (#6133)
* tbx

* add_hparams

* fix_hparams

* ok

* ok

* fix

* ok

* fix
2019-11-15 08:45:44 -08:00
Eric Liang 8ff393a7bd Handle exchange of direct call objects between tasks and actors (#6147) 2019-11-14 17:32:04 -08:00
Edward Oakes 385783fcec Ray on YARN + Skein Documentation (#6119) 2019-11-14 15:06:05 -08:00
Edward Oakes 2758cd0b34 Make log message debug (#6166) 2019-11-14 15:05:36 -08:00
Edward Oakes e3b95dafeb Fix sigterm_handler (#6141) 2019-11-14 13:41:50 -08:00
Eric Liang 243b1b7281 [rllib] Add microbatch optimizer with A2C example (#6161) 2019-11-14 12:14:00 -08:00
Eric Liang 0a3623ded6 Fix memory store wait (#6152) 2019-11-14 10:17:30 -08:00
Stephanie Wang bbadde57e0 Pass through caller address when submitting a task (#6143)
* Add RpcAddress, set in actor table data

* Pass through task caller address

* RpcAddress -> Address

* update

* fix

* lint

* fix cc tests
2019-11-14 09:14:08 -08:00
Ujval Misra e3e3ad4b25 Add timeout param to ray.get (#6107) 2019-11-14 00:50:04 -08:00
waldroje e4c0843f60 Allow EntropyCoeffSchedule to accept custom schedule (#6158)
* modify tf_policy to enable EntropyCoeffSchedule to handle list, and avoid negative values under current implementation

* Update custom_metrics_and_callbacks.py

* Update tf_policy.py
2019-11-14 00:45:43 -08:00
Eric Liang e4565c9cc6 Reduce RLlib log verbosity (#6154) 2019-11-13 18:50:45 -08:00
Edward Oakes 51e76151d6 Use shared_ptr for gcs client in profiler (#6150) 2019-11-13 15:24:01 -08:00
Philipp Moritz f24d96ec4f Revert "Try to enable dashboard (again) (#6069)" (#6159)
This reverts commit 4044af8520.
2019-11-13 12:32:12 -08:00
Eric Liang b924299833 Add large scale regression test for RLlib (#6093) 2019-11-13 12:22:55 -08:00
Eric Liang f3f86385d6 Minimal implementation of direct task calls (#6075) 2019-11-12 11:45:28 -08:00
Stephanie Wang 35d177f459 Use grpc for communication from worker to local raylet (task submission and direct actor args only) (#6118)
* Skeleton for SubmitTask proto

* Pass through node manager port, connect in raylet client

* Switch submit task to grpc

* Check port in use

* doc

* Remove default port, set port randomly from driver

* update

* Fix test

* Fix object manager test
2019-11-11 21:17:25 -08:00
Siyuan (Ryans) Zhuang f48293f96d Fix deprecated warning (#6142) 2019-11-11 17:49:15 -08:00
Simon Mo c75ada9e04 [Autoscaler][K8s] Enforce memory limit in k8s yaml (#6138)
* Enforce memory limit in k8s yaml

* Update python/ray/autoscaler/kubernetes/example-full.yaml

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Line wrap
2019-11-11 14:06:34 -08:00
Adi Zimmerman 776b071f3b [tune] Let Search Algorithms use early stopped trials (#5651) 2019-11-11 09:38:14 -08:00
Edward Oakes 5780ec1b62 Refresh ObjectIDs in raylet for stopgap GC (#6109) 2019-11-10 23:12:59 -08:00
Philipp Moritz decaa65cd6 Use pickle by default for serialization (#5978) 2019-11-10 18:12:18 -08:00
Adam Gleave 01aee8d970 [autoscaler] Retry creating EC2 instances in new AZ (#6129) 2019-11-09 19:44:27 -08:00
Miguel Morales d17ae5ad7a Update hyperband-cartpole.yaml (#6121)
Typo
2019-11-09 19:39:03 -08:00
Adam Gleave c157e93ba1 [tune] Retry failed tasks with checkpointing disabled (#6126)
* Allow recovery for failed tasks without checkpointing

* Update docs
2019-11-09 19:35:27 -08:00
Philipp Moritz ccbcc4bafa Use GRCP and Bazel 1.0 (#6002) 2019-11-08 15:58:28 -08:00
Eric Liang afca6d3d87 Object store full with cyclic python references (#6114) 2019-11-08 14:08:24 -08:00
Edward Oakes 83378a8610 Improve flaky test_warning_monitor_died (#6113) 2019-11-08 12:11:15 -08:00
Eric Liang 4044af8520 Try to enable dashboard (again) (#6069)
* Revert "Revert "Enable the Ray dashboard by default (#5976)" (#6068)"

This reverts commit 1a3e97cf23.

* fix tests that assume the dashboard isn't a job

* travis
2019-11-08 10:48:48 -08:00
Philipp Moritz 5a05eaaa54 Fix compilation on master (#6116) 2019-11-07 22:38:42 -08:00
Eric Liang 4a28306186 Allow large returns from direct actor calls (#6088) 2019-11-07 21:28:55 -08:00
Edward Oakes ca53af4d0f Add pending task dependencies to ObjectID ref counting (#6054) 2019-11-07 18:37:10 -08:00
Eric Liang 1f043daf69 [rllib] Fix and add test for LR annealing config 2019-11-07 12:17:27 -08:00
Simon Mo fcb6bdbc39 [Doc] Document Actor.options API (#6099)
* Document Actor.options API

* Undocument _remote
2019-11-06 23:12:23 -08:00
Edward Oakes 9820c10a09 Simplify gRPC service definition for the worker (#6095) 2019-11-06 13:00:39 -08:00
David Bignell 3f83b2daa9 [rllib] Rollout extensions (#6065)
* Rollout improvements

* Make info-saving optional, to avoid breaking change.

* Store generating ray version in checkpoint metadata

* Keep the linter happy

* Add small rollout test

* Terse.

* Update test_io.py
2019-11-05 20:34:18 -08:00
Eric Liang 2a0225dd25 [rllib] RLlib chooses wrong neural network model for Atari in 0.7.5 (#6087) 2019-11-05 11:36:29 -08:00
daiyaanarfeen 8f6d73a93a [sgd] Extend distributed pytorch functionality (#5675)
* raysgd

* apply fn

* double quotes

* removed duplicate TimerStat

* removed duplicate find_free_port

* imports in pytorch_trainer

* init doc

* ray.experimental

* remove resize example

* resnet example

* cifar

* Fix up after kwargs

* data_dir and dataloader_workers args

* formatting

* loss

* init

* update code

* lint

* smoketest

* better_configs

* fix

* fix

* fix

* train_loader

* fixdocs

* ok

* ok

* fix

* fix_update

* fix

* fix

* done

* fix

* fix

* fix

* small

* lint

* fix

* fix

* fix_test

* fix

* validate

* fix

* fi
2019-11-05 11:16:46 -08:00
Mitchell Stern 82be14f943 Move gRPC calls outside of Raylet stats lock (#6090) 2019-11-05 00:47:15 -08:00
mehrdadn e312f3d282 Compatibility issues (#6071)
* Pass -f - to tar to force stdin on Windows

* Quote paths that may contain spaces (causes issues on Windows)

* Copy over Windows code from Arrow for glog signal handle uninstall

* Add missing COPTS to build rules since we'll need them for Windows compatibility

* Begin adding COPTS for Windows compatibility

* Disable glog on Arrow until we change WIN32 to _WIN32 there

* Missing header files that cause problems on Windows

* WORD typedef conflicts with Windows; remove it

* uint -> unsigned int wherever we're dealing with milliseconds (signed version is already int)

* uint -> unsigned int for enums

* uint -> size_t, wherever we're dealing with sizes or indices into arrays

* Work around Boost 1.68 bug in detecting clang-cl (revert this after upgrading)

* Missing #include <unistd.h>

* Add check for signal handler uninstallation failure

* Linting issue
2019-11-05 00:08:14 -08:00
Philipp Moritz fefe050a58 Fix running out of file descriptors in the WebUI (#6086) 2019-11-04 21:17:36 -08:00
Edward Oakes 043d1f4094 Return RayObjects to core worker (#6052) 2019-11-04 20:27:57 -08:00
visatish 18241f4a2d [tune] Added resources_per_trial arg to validate_save_restore u… (#6032) 2019-11-04 13:24:46 -08:00
Simon Mo c23eae5998 [Serve] Fix router-worker communication (#5961)
* Half way there, needs the strict queuing fix

* Fix scale down, use callback

* Cleanup

* Address comments

* Comment, nit

* Fix docstring
2019-11-04 11:29:21 -08:00
Eric Liang 8485304e83 Support concurrent Actor calls in Ray (#6053) 2019-11-04 01:14:35 -08:00
Eric Liang fbad6f543b Try fixing actor handle destruction on py2 (#6076) 2019-11-03 22:46:40 -08:00
Philipp Moritz 1c5446851a Use Plasma with LRU refreshing integrated (#6050) 2019-11-03 16:19:05 -08:00