2234 Commits

Author SHA1 Message Date
Philipp Moritz b4656ca244 Fix dashboard profiling (#8013) 2020-04-14 08:30:16 -07:00
Robert Nishihara d985d7537e Replace all instances of ray.readthedocs.io with ray.io (#7994) 2020-04-13 16:17:05 -07:00
Richard Liaw e97adba6ac [autoscaler] Improve argument handling for submit (#7986)
* docs

* Apply suggestions from code review

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* ok

Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-04-13 15:53:42 -07:00
ZhuSenlin 4a81793ba5 GCS-Based actor management implementation (#6763)
* add gcs actor manager

* fix test_metrics.py

* fix TestTaskInfo

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix compile error

* fix merge error

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-13 09:48:48 -07:00
mehrdadn 1b0f6fd558 Check AF_UNIX path length (#7951) 2020-04-13 09:30:01 -07:00
Edward Oakes 2cb9cfb2b6 [serve] Make workers fault tolerant (#7970) 2020-04-12 11:48:08 -05:00
Qing Wang 98bfcd53bc [Java] Rename group id and package name. (#7864)
* Initial

* Change streaming's

* Fix

* Fix

* Fix org_ray

* Fix cpp file name

* Fix streaming

* Fix

* Fix

* Fix testlistening

* Fix missing sth in python

* Fix

* Fix

* Fix SPI

* Fix

* Fix compilation

* Fix

* Fix CI

* Fix checkstyle

Fix checkstyle

* Fix streaming tests

* Fix streaming CI

* Fix streaming checkstyle.

* Fix build

* Fix bazel dep

* Fix

* Fix ray checkstyle

* Fix streaming checkstyle

* Fix bazel checkstyle
2020-04-12 17:59:34 +08:00
Stephanie Wang d7eef808b8 [core] Reconstruction for lost plasma objects (#7733)
* Add a lineage_ref_count to References

* Refactor TaskManager to store TaskEntry as a struct

* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs

* Pin TaskEntries and References in the lineage of any ObjectIDs in scope

* Fix deadlock, convert num_plasma_returns to a set of object IDs

* fix unit tests

* Feature flag

* Do not release lineage for objects that were promoted to plasma

* fix build

* fix build

* Remove num executions

* Remove num executions

* Add pinned locations to ReferenceCounter, empty handler for node death

* Fix num returns for actor tasks, fix Put return value

* Add regression test

* Clear pinned locations and callbacks on node removal

* Clear pinned locations and callbacks on node removal

* Simplify num return values

* Remove unused

* doc

* tmp

* Set num returns

* Move lineage pinning flag to ReferenceCounter

* comments

* Recover from plasma failures by pinning a new copy

* Basic object reconstruction, no concurrent reqs yet

* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs

* Handle concurrent attempts to recover the same object

* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit

* Split out logic into ObjectRecoveryManager

* Fix python tests

* Refactor to remove dependency on gcs client

* Unit tests

* Move pinned at node ID to direct memory store

* Unit test fixes and lint

* simplify and more tests

* Add ResubmitTask test for TaskManager

* Doc

* fix build

* comments

* Fix

* debug

* Update

* fix

* Fix

* Fix bad status handling, unit test

* Fix build
2020-04-11 16:52:57 -07:00
Stephanie Wang 18e9a076e5 [core] Cancel worker lease requests that are no longer needed (#7929)
* regression test

* Cancel lease requests

* unit tests

* update

* fix build

* Move unit test

* Set success

* Ref to shared_ptr

* debug

* Revert "debug"

This reverts commit 6b2c25805a8223b41ffcc2d88d903e16ea415089.

* Bad move

* Fix bad status handling
2020-04-11 16:51:32 -07:00
Richard Liaw 87e3c39b48 [tune] Ensure Cleanup (#7967) 2020-04-11 16:28:03 -07:00
Richard Liaw dd63178e91 [sgd] Semantic Segmentation Example (#7825)
* better_example

* test

* improve some usability things

* submit

* fix

* making a segmentation example

* segmentation_example

* segmentation

* device

* flake

* Update python/ray/util/sgd/torch/training_operator.py

* uti

* finished_example

* block

* format

* locationg

* fix

* ok

* revert

* segmentation

* lint_and_test

* address_comments
2020-04-10 20:35:45 -07:00
mehrdadn 0b4e09da76 Log to terminal if glog is also doing so (#7868) 2020-04-10 18:41:21 -05:00
aannadi 9e31ee991a [Dashboard] Configure Subset of Parameters/Metrics and show Err… (#7726)
* Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors
2020-04-10 13:27:52 -07:00
Edward Oakes 7be7af11ab [serve] Push requests to workers instead of polling via dequeue_request (#7965) 2020-04-10 14:47:03 -05:00
Edward Oakes d8f5b52265 [serve] Don't use mixin class for class-based backends (#7957) 2020-04-10 12:01:14 -05:00
marload e3ffb8ac28 [tune] Refactoring: Deduplicate (#7918)
* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* lint fix: Variable naming case

* fix: Remove White Space

* fix_lint

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-04-09 20:19:04 -07:00
Edward Oakes 305eb74a86 [serve] Make HTTP proxy fault tolerant (#7936) 2020-04-09 17:07:22 -05:00
Simon Mo 870271d51f [Serve] Call serve.init in function handler (#7947) 2020-04-09 11:46:15 -07:00
Simon Mo 59867dad75 Move Jenkins test to Github action (#7342) 2020-04-09 10:27:19 -07:00
David Chan 6521e92a95 [RaySGD] Honor the use_gpu flag (#7942) 2020-04-08 20:20:09 -07:00
ijrsvt 44825d81e9 Change Proctitle to IDLE after an Error (#7863) 2020-04-08 11:33:43 -07:00
fyrestone fc6259a656 Cross language serialization for primitive types (#7711)
* Cross language serialization for Java and Python

* Use strict types when Python serializing

* Handle recursive objects in Python; Pin msgpack >= 0.6.0, < 1.0.0

* Disable gc for optimizing msgpack loads

* Fix merge bug

* Java call Python use returnType; Fix ClassLoaderTest

* Fix RayMethodsTest

* Fix checkstyle

* Fix lint

* prepare_args raises exception if try to transfer a non-deserializable object to another language

* Fix CrossLanguageInvocationTest.java, Python msgpack treat float as double

* Minor fixes

* Fix compile error on linux

* Fix lint in java/BUILD.bazel

* Fix test_failure

* Fix lint

* Class<?> to Class<T>; Refine metadata bytes.

* Rename FST to Fst; sort java dependencies

* Change Class<?>[] to Optional<Class<?>>; sort requirements in setup.py

* Improve CrossLanguageInvocationTest

* Refactor MessagePackSerializer.java

* Refactor MessagePackSerializer.java; Refine CrossLanguageInvocationTest.java

* Remove unnecessary dependencies for Java; Add getReturnType() for RayFunction in Java

* Fix bug

* Remove custom cross language type support

* Replace Serializer.Meta with MutableBoolean

* Remove @SuppressWarnings support from checkstyle.xml; Add null test in CrossLanguageInvocationTest.java

* Refine MessagePackSerializer.pack

* Ray.get support RayObject as input

* Improve comments and error info

* Remove classLoader argument from serializer

* Separate msgpack from pickle5 in Python

* Pair<byte[], MutableBoolean> to Pair<byte[], Boolean>

* Remove public static <T> T get(RayObject<T> object), use RayObject.get() instead

* Refine test

* small fixes

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-04-08 21:10:57 +08:00
Edward Oakes 85481d635d [serve] Call serve.init() before initializing backends (#7922) 2020-04-07 17:22:52 -05:00
Edward Oakes 1be87c7fbb [serve] Remove global state, instead access the master actor directly (#7914)
* Move _scale() to master actor

* move create_backend

* Move set_backend_config

* Move get_backend_config

* Remove backend_table from global_state

* Remove global_state, just access master directly

* Remove accidental addition
2020-04-07 15:21:40 -05:00
Edward Oakes d3c310f408 [serve] Only access backend_table in master actor (#7913) 2020-04-07 10:12:39 -05:00
Kai Yang 48b48cc8c2 Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
Richard Liaw f63b4c1110 [sgd] make ddp optional (#7875)
* loosen

* devices

* tryitout

* fix

* fix

* fix

* easy

* test

* fix

* fix

* better visibility

* fix
2020-04-06 11:41:36 -07:00
SangBin Cho 73fd78316d [Dashboard] Authentication (#7888)
* Change authentication schema.

Authentication implementation.

* Formatting.

* Fix a minor style.

* Fix tests.

* Removed url validation.
2020-04-04 19:40:54 -07:00
Allen 3c91ff1f63 [autoscaler] Allowing users to provide extra configs for AWS (#7844)
* Allowing users to provide custom key names & security group inbound rules

* linting

* getting aws credentials passed in

* one more thing

* one more thing part 2

* formatting

* addressing comments

* update

* update

* update

* update

* update

* update

* remove tests

* rerun tests

Co-authored-by: Allen Yin <allenyin@anyscale.io>
2020-04-04 18:36:51 -07:00
acxz 7827d2c2de Add wheel build dependency (#7877) 2020-04-03 18:10:34 -07:00
ijrsvt e03f687b84 Cleaning up remaining Local Mode Code (#7865) 2020-04-03 19:54:15 -05:00
Markus Cozowicz b853df7a3b [autoscaler] Switch to ARM for Azure deployment (#7717)
* switch to ARM templates for config and VMs

* switch to ARM templates for config and VMs

* auto-formatting

* addressed Scott's comment

* added missing imports

* fixed gpu templates
fixed wheel reference

* added missing reference

* cleanup wording and yamls

* Update doc/source/autoscaling.rst

Co-Authored-By: Scott Graham <5720537+gramhagen@users.noreply.github.com>

Co-authored-by: Ubuntu <marcozo@marcozodev2.zqvgrdyupqrudayw1il1agipig.jx.internal.cloudapp.net>
Co-authored-by: Scott Graham <5720537+gramhagen@users.noreply.github.com>
2020-04-03 15:51:56 -07:00
SangBin Cho 1d532d1cb8 [Dashboard] Action Implementation. (#7826) 2020-04-02 18:02:37 -07:00
Edward Oakes 7f9ddfcfd8 Only access route_table and policy_table in master actor (#7835) 2020-04-02 14:44:53 -07:00
Edward Oakes cbe494ab13 [flaky test] Fix flaky test_heartbeats_single (#7857) 2020-04-02 16:23:28 -05:00
ijrsvt 9bfc2c4b54 Moving Local Mode to C++ (#7670) 2020-04-01 15:50:57 -05:00
mehrdadn 65054a2c7c Python 3.8 compatibility (#7754) 2020-04-01 10:03:23 -07:00
Richard Liaw 24bf6ad607 [raysgd] Improve raysgd examples (#7818)
* better_example

* test

* improve some usability things

* submit

* fix

* flake

* Update python/ray/util/sgd/torch/training_operator.py

* trythis

* fix

* fix

* smoke

* fail

* fix

* fix
2020-04-01 08:58:39 -07:00
Edward Oakes f4239d27fa [serve] Create all other actors in master actor (#7791) 2020-04-01 10:15:04 -05:00
Robert Nishihara b011c604d7 Remove ray.tasks() from API. (#7807) 2020-04-01 10:10:40 -05:00
SangBin Cho c23e56ce9a Metrics Export Service (#7809) 2020-03-30 23:28:32 -07:00
mehrdadn 8958728139 Windows bug fixes (#7740) 2020-03-30 20:39:23 -05:00
Simon Mo dc9b62e007 Deserialize Args in Event Loop Thread (#7806) 2020-03-30 18:28:13 -07:00
Richard Liaw fbf02fa7f7 [Hotfix] Lint for Documentation (#7817) 2020-03-30 11:49:05 -07:00
Richard Liaw 18327254b6 [docs] Fix readthedocs rendering (#7810) 2020-03-30 11:40:08 -07:00
Richard Liaw 86cff17e7e [tune/raysgd] Tune API for TorchTrainer + Fix State Restoration (#7547) 2020-03-30 12:58:49 -05:00
Edward Oakes 3a53ea60d9 [Serve] Push route table updates to HTTP proxy (#7774) 2020-03-30 09:53:05 -07:00
Philipp Moritz eb61036ba2 Revert "Pyarrow Segfault Regression Test (#7568)" (#7805)
This reverts commit 57599f075c.
2020-03-29 20:59:05 -07:00
ijrsvt 57599f075c Pyarrow Segfault Regression Test (#7568) 2020-03-29 16:15:24 -07:00
Simon Mo 353d7e107f [Serve] Improve Serialization (#7688) 2020-03-29 14:57:19 -07:00