Commit Graph

69 Commits

Author SHA1 Message Date
Stephanie Wang 3321555975 Increase timeout for ray.wait test (#5273)
* Increase test timeout for ray.wait

* make sure the actor is scheduled
2019-07-25 14:23:46 -07:00
Joey Jiang 40395acadf [gRPC] Migrate raylet client implementation to grpc (#5120) 2019-07-25 14:48:56 +08:00
Eric Liang 5b76238bce Fix two types of eviction hangs (#5225) 2019-07-23 21:20:17 -07:00
Stephanie Wang 9c651f47bb Add regression test for actor load balancing (#5224)
* Add regression test for actor load balancing

* Increase timeout

* Reduce number of nodes?
2019-07-23 15:11:55 -07:00
Stephanie Wang 15959b0f0d Leave ray.wait calls open until the task or actor exits (#5234)
* Regression test

* Split TaskDependencyManager::SubscribeDependencies into ray.get and ray.wait dependencies
- Some initial implementation

* unit test

* Improve unit tests for TaskDependencyManager

* Implement SubscribeWaitDependencies and UnsubscribeWaitDependencies, unit tests passing

* Add ray.wait python test for drivers that exit early

* Add WorkerID to Worker

* Update test to use two nodes

* Regression test for ray.wait passes

* Extend regression test to include ray.wait from an actor

* Fix ClientID and WorkerIDs

* lint

* lint

* Remove unnecessary ray_get argument

* fix build
2019-07-23 11:55:28 -07:00
Richard Liaw 3e0ad11ae0 Add heartbeat test + Fix monitor.py (#5191) 2019-07-16 21:59:48 -07:00
Hao Chen ea6aa6409a Reconstruct failed actors without sending tasks. (#5161)
* fast reconstruct dead actors

* add test

* fix typos

* remove debug print

* small fix

* fix typos

* Update test_actor.py
2019-07-15 10:25:09 -07:00
Qing Wang f2293243cc [ID Refactor] Shorten the length of JobID to 4 bytes (#5110)
* WIP

* Fix

* Add jobid test

* Fix

* Add python part

* Fix

* Fix tes

* Remove TODOs

* Fix C++ tests

* Lint

* Fix

* Fix exporting functions in multiple ray.init

* Fix java test

* Fix lint

* Fix linting

* Address comments.

* Fix

* Address and fix linting

* Refine and fix

* Fix

* address

* Address comments.

* Fix linting

* Fix

* Address

* Address comments.

* Address

* Address

* Fix

* Fix

* Fix

* Fix lint

* Fix

* Fix linting

* Address comments.

* Fix linting

* Address comments.

* Fix linting

* address comments.

* Fix
2019-07-11 14:25:16 +08:00
Kai Yang 43b6513d19 [GCS] Move node resource info from client table to resource table (#5050) 2019-07-11 13:17:19 +08:00
Eric Liang 5aec750107 Add warning/error if object store memory exceeds available memory (#4893)
* exclude

* format

* add warning

* hatch

* reduce mem usage

* reduce object store mem

* set obj mem
2019-07-08 21:37:08 -07:00
Edward Oakes 8f53364097 Improve local_mode (#5060) 2019-07-07 17:10:50 -07:00
Philipp Moritz c5253cc300 Add job table to state API (#5076) 2019-07-06 00:05:48 -07:00
Joey Jiang d6bbbdef35 Use gRPC to handle communication and data transmission between object manager (#4996) 2019-06-28 10:56:34 +08:00
Qing Wang 62e4b591e3 [ID Refactor] Rename DriverID to JobID (#5004)
* WIP

WIP

WIP

Rename Driver -> Job

Fix compilation

Fix

Rename in Java

In py

WIP

Fix

WIP

Fix

Fix test

Fix

Fix C++ linting

Fix

* Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/core_worker/core_worker.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Address comments

* Fix

* Fix CI

* Fix cpp linting

* Fix py lint

* Fix

* Address comments and fix

* Address comments

* Address

* Fix import_threading
2019-06-28 00:44:51 +08:00
Hao Chen a1156754e9 Fix test_task_forward (#5040) 2019-06-27 14:37:00 +08:00
Robert Nishihara a17c08faa4 Lengthen buffer in resource test. (#4961) 2019-06-26 09:54:04 -07:00
Hao Chen 0131353d42 [gRPC] Migrate gcs data structures to protobuf (#5024) 2019-06-25 14:31:19 -07:00
Joey Jiang a7f84b536f Fix no cpus test (#5009) 2019-06-21 17:08:25 +08:00
Andrew Berger e59e8074dd fix handling of non-integral timeout values in signal.receive (#5002) 2019-06-20 15:33:40 -07:00
Hao Chen 2bf92e02e2 [gRPC] Use gRPC for inter-node-manager communication (#4968) 2019-06-17 19:00:50 +08:00
Robert Nishihara a82e8118a0 Fix resource bookkeeping bug with acquiring unknown resource. (#4945) 2019-06-07 21:07:27 -07:00
Robert Nishihara 6703519144 Move global state API out of global_state object. (#4857) 2019-05-26 11:27:53 -07:00
Robert Nishihara 49fe894e22 Export remote functions when first used and also fix bug in which rem… (#4844)
* Export remote functions when first used and also fix bug in which remote functions and actor classes are not exported from workers during subsequent ray sessions.

* Documentation update

* Fix tests.

* Fix grammar
2019-05-24 13:44:39 -07:00
Robert Nishihara 2015085192 Fix bug in which actor classes are not exported multiple times. (#4838) 2019-05-23 09:22:46 -07:00
Yuhong Guo 1a39fee9c6 Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID (#4776)
* Enable BaseId.

* Change TaskID and make python test pass

* Remove unnecessary functions and fix test failure and change TaskID to 16 bytes.

* Java code change draft

* Refine

* Lint

* Update java/api/src/main/java/org/ray/api/id/TaskId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/ObjectId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comment

* Lint

* Fix SINGLE_PROCESS

* Fix comments

* Refine code

* Refine test

* Resolve conflict
2019-05-22 14:46:30 +08:00
Romil Bhardwaj 004440f526 Dynamic Custom Resources - create and delete resources (#3742) 2019-05-11 20:06:04 +08:00
Devin Petersohn edb8465910 [ray-core] Initial addition of performance integration testing files (#4325) 2019-05-08 13:40:54 -07:00
Si-Yuan bd00735fe8 Fix tempfile issues (#4605) 2019-05-05 16:06:15 -07:00
Robert Nishihara d81e71e297 Enable actor methods to be decorated on the caller side also and get postprocessors. (#4732)
* Allow decorating ray actor methods.

* Add test.

* Add get postprocessors.

* Improve documentation.

* Make it work for remote functions.

* Temporary fix.
2019-05-04 11:53:47 -07:00
Robert Nishihara e9b351e749 Reduce memory usage of test_simple in test_stress.py. (#4709) 2019-04-28 07:50:23 -07:00
Si-Yuan 9ce3039390 Fix webui api (#4686)
* fix webui

* Apply suggestions from code review

lint

Co-Authored-By: suquark <suquark@gmail.com>

* add dependencies for this unittest

* move dependencies to the script file
2019-04-27 15:23:56 +08:00
William Ma c99e3caaca Change resource bookkeeping to account for machine precision. (#4533) 2019-04-23 11:59:53 -07:00
Robert Nishihara 967e8aad9d Make def test_submitting_many_actors_to_one less stressful. (#4622) 2019-04-14 12:19:57 -07:00
Daniel Edgecumbe 3e1adafbce [autoscaler] Add an aggressive_autoscaling flag (#4285) 2019-04-13 18:44:32 -07:00
Romil Bhardwaj 0f42f87ebc Updating zero capacity resource semantics (#4555) 2019-04-12 16:53:57 -07:00
Wang Qing fe07a5b4b1 Add delete_creating_tasks option for internal.free() (#4588)
* add delete creating task objects.

* format code style

* Fix lint

* add tests and address comments.

* Refine test

* Refine java test

* Fix CI

* Refine

* Fix lint

* Fix CI
2019-04-12 13:38:31 +08:00
justinwyang e88e706fcc Enforce quoting style in Travis. (#4589) 2019-04-11 14:24:26 -07:00
Kristian Hartikainen ed02bf11f7 [autoscaler] Lint code that we forgot to lint in #4537 (#4584)
* Lint code that we forgot to lint in previous PR

* Revert setup command merge

* Lint

* Revert "Revert setup command merge"

This reverts commit 55e1cdb1f256ea51ef66a38730d8f7865f1f5ad1.

* Fix testReportsConfigFailures test

* Minor syntax tweaks

* Lint
2019-04-10 17:01:36 +08:00
Si-Yuan dab99d26af Improve code related to node (#4383)
* Make full use of node

implement local node

fix bugs mentioned in comments

* Add more tests

* Use more specific exception handling

* fix, lint

* fix for py2.x
2019-04-09 17:27:54 +08:00
Yuhong Guo c2349cf12d Remove local/global_scheduler from code and doc. (#4549) 2019-04-03 17:05:09 -07:00
Hao Chen 23404f7bcf Fix some flaky tests (#4535) 2019-04-02 17:57:11 -07:00
Yuhong Guo c2c548bdfd Fix broken pipe callback (#4513) 2019-04-02 17:42:18 +08:00
William Ma 11580fb7dc Changes where actor resources are assigned (#4323) 2019-03-24 15:49:36 -07:00
Ion 59079a799c Signal actor failure (#4196) 2019-03-21 15:17:42 -07:00
Kai Yang c36d03874b Redis returns OK when removing a non-existent set entry (#4434) 2019-03-21 11:59:15 -07:00
Stephanie Wang 4ac9c1ed6e Fix bug in cluster mode where driver exits when there are tasks in the waiting queue (#4251) 2019-03-20 10:18:27 -07:00
Yuhong Guo 8ce7565530 Refactor pytest fixtures for ray core (#4390) 2019-03-20 11:48:32 +08:00
Peter Schafhalter c93eb126ec Allow manually writing to return ObjectIDs from tasks/actor methods (#3805) 2019-03-18 19:24:57 -07:00
Wang Qing 3b141b26cd Fix global_state not disconnected after ray.shutdown (#4354) 2019-03-18 16:44:49 -07:00
Yuhong Guo becffc6cef Fix checkpoint crash for actor creation task. (#4327)
* Fix checkpoint crash for actor creation task.

* Lint

* Move test to test_actor.py

* Revert unused code in test_failure.py

* Refine test according to Raul's suggestion.
2019-03-14 23:42:57 +08:00