Commit Graph

102 Commits

Author SHA1 Message Date
Edward Oakes d190e73727 Use our own implementation of parallel_memcopy (#7254) 2020-02-21 11:03:50 -08:00
Siyuan (Ryans) Zhuang 0d210a99c3 Ensure deserialized numpy arrays are immutable (#7181)
* ensure numpy arrays are immutable when deserialized from the memory buffer
2020-02-19 23:30:10 -08:00
Simon Mo b804d40c04 Stop vendoring pyarrow (#7233) 2020-02-19 19:01:26 -08:00
Simon Mo 7bef7031c2 Revert "Revert "Revert "Removing Pyarrow dependency (#7146)" (#7209) (#7214)" (#7232) 2020-02-19 13:35:29 -08:00
Simon Mo e8941b1b79 Revert "Revert "Removing Pyarrow dependency (#7146)" (#7209) (#7214) 2020-02-19 10:08:52 -08:00
Stephanie Wang f76ce836b2 Distributed ref counting for serialized ObjectIDs (#6945)
* Skeleton plus a unit test for simple borrower case

* First unit test passes - forward an ID and task returns with 1 submitted task pending on the inner ID

* Invariant for contained_in

* Unit test passes for testing task return without creating a borrower

* Wrap ref count functionality in test case

* Fix bad delete

* Unit test and fix for borrowers creating more borrowers

* Unit test and fix for simple borrowing, but owner sends call after borrower's ref count goes to 0

* Refactor:
- keep a sentinel ref count for task argument IDs
- keep contained_in_borrowed in addition to contained_in_owned

* Unit test for nested IDs passes

* Refactor so that an object ID can only be contained in 1 borrowed ID at a time

* Add check

* Fix

* Unit test (passes) to test nesting object IDs but no borrowers created

* Unit test for nested objects from different owners passes, refactor to unset contained_in when popping refs

* Unit tests for borrowers receiving an ObjectID from multiple sources,
skip adding ownership info if we already have it to handle duplicate
refs

* Unit test for returning object ID passes

* More unit tests for returning object IDs pass

* Add serialized ID tests

* fix serialization issue

* remove swap

* It builds!

* debugging and some fixes:
- register handler for WaitForRefRemoved
- don't create a python reference for arg IDs
- pass in client factory into ReferenceCounter
- fix bad decrement in PopBorrowerRefs

* Fix accounting for serialized IDs:
- don't decrement for IDs on dependency resolution, wait until task finished
- add object IDs that were inlined when building the arguments to the task spec, pin these on the task executor until task finishes

* mu_ -> mutex_

* lint

* fix build

* clear outer_object_id

* add direct call type check

* Fix test for direct call IDs and return IDs for actor calls

* Fix CoreWorkerClient.Addr()

* Remove unneeded lock

* Remove unnecessary ObjectID refs

* Fix worker holding serialized refs test

* Fix hex IDs

* fix

* fix tests

* fix tests

* refactor and cleanups

* lint

* Put inlined Ids in task args and some cleanup

* Add back gc.collect() line for test case

* Refactor and fixes:
- store inlined IDs in RayObject
- allow storing objects with inlined IDs in memory store
- pin objects that were promoted to plasma

* oops

* make sure worker ID is set in address, pass in rpc::Address to CoreWorkerClient

* todos

* cleanups and test builds

* Fix tests

* Add feature flag

* cleanups

* address comments and some cleanups

* cleanup

* fix recursive test

* Comments for tests

* Turn off ref counting by default

* Skip tests

* Fix some bugs for test_array.py, java build

* Don't include nested objects in the ref count when the feature flag is off

* C++ feature flag does not work...

* Remove

* Turn on python tests and add a warning when plasma objects are evicted before being pinned

* Fix build and remove irrelevant test

* Fix for java

* Revert "Fix build and remove irrelevant test"

This reverts commit 056cca9b263ed05b0f9ab2250907338edcbca2d5.

* Fix ray.internal.free

* Fixes and skip some flaky tests

* fix java build

* fix windows build

* Add IDs contained in owned objects

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* update

* Try to fix ::test_direct_call_serialized_id_eviction

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-18 18:21:34 -08:00
Eric Liang 0aa9373d62 Revert "Removing Pyarrow dependency (#7146)" (#7209)
This reverts commit 2116fd3bca.
2020-02-18 14:12:06 -08:00
ijrsvt 2116fd3bca Removing Pyarrow dependency (#7146) 2020-02-17 18:00:13 -08:00
fyrestone a6b8bd47b0 [xlang] Cross language serialize ActorHandle (#7134) 2020-02-17 20:44:56 +08:00
Qing Wang f3703bafa3 [Java] Support concurrent actor calls API. (#7022)
* WIP

Temp change

Attach native thread to jvm

* Fix run mode

* Address comments.
2020-02-14 13:02:39 +08:00
fyrestone 0648bd28ef [xlang] Cross language Python support (#6709) 2020-02-08 13:01:28 +08:00
Edward Oakes 844f607c93 Collect contained ObjectIDs during deserialization (#7029) 2020-02-03 22:49:14 -08:00
Edward Oakes 984490d2be Collect object IDs during serialization (#6946) 2020-02-03 18:38:11 -08:00
Edward Oakes 92525f35d1 Remove raylet client from Python worker (#6018) 2020-01-31 18:23:01 -08:00
Edward Oakes 341a921d81 Remove vanilla pickle serialization for task arguments (#6948) 2020-01-31 16:52:43 -08:00
Simon Mo 396d7fafc8 UI improvement for asyncio (#6905) 2020-01-27 12:45:51 -08:00
Yunzhi Zhang 0834bda8c1 [Dashboard] Display actor task execution info (#6705)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-01-22 22:33:55 -08:00
Simon Mo 5f527816fe Fix async actor high cpu utilization when idle (#6877) 2020-01-22 16:07:08 -08:00
Edward Oakes 2a4d2c6e9e Basic reference counting & pinning (#6554) 2020-01-06 17:30:26 -06:00
Eric Liang 46acb02aa4 Fix verbose shutdown error and test_env_with_subprocesses (#6614) 2019-12-26 22:43:39 -08:00
Edward Oakes 6b1a57542e Add actor.__ray_kill__() to terminate actors immediately (#6523) 2019-12-23 23:12:57 -06:00
Yunzhi Zhang bac6f3b61e [Dashboard] Collecting worker stats in node manager and implement webui display in the backend (#6574) 2019-12-22 17:50:23 -08:00
Edward Oakes e50aa99be1 Reference counting for direct call submitted tasks (#6514)
Co-authored-by: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
2019-12-20 17:06:33 -08:00
Eric Liang e556b729c2 [direct call] Fix max_calls interaction with background tasks. (#6536) 2019-12-19 13:48:32 -08:00
Edward Oakes 41fa2e9604 Remove object id translation (#6531) 2019-12-19 12:47:49 -08:00
Simon Mo 26ec500ef9 Implement async get for direct actor call (#6339) 2019-12-18 11:50:21 -08:00
Edward Oakes e2b7459bfc Fix worker exit cleanup (#6450)
* working but ugly

* comments

* proper but hanging in grpc server destructor

* grpc server shutdown deadline

* fix disconnect

* lint

* shutdown_only in test

* replace shutdown
2019-12-13 16:52:50 -08:00
Edward Oakes 82f7dbc7a7 Increase TaskID size by 2 bytes, taken from JobID (#6425)
* Increase TaskID size by 2 bytes, taken from JobID

* comments

* check max job id

* fix doc

* fix local mode
2019-12-11 10:45:14 -08:00
Edward Oakes 044527adb8 Remove ref counting dependencies on ray.get() (#6412)
* Remove ref counting dependencies on Get()

* comment

* don't send IDs when disabled

* pass through internal config

* fix

* allow reinit

* remove flag
2019-12-10 18:11:34 -08:00
Chaokun Yang 6272907a57 [Streaming] Streaming data transfer and python integration (#6185) 2019-12-10 20:33:24 +08:00
Rong Rong c1d4ab8bb4 Move top level RayletClient to ray::raylet::RayletClient (#6404) 2019-12-09 21:08:59 -08:00
micafan 668ce47360 [GCS]Add abstract interface of actor to GCS Client (#6269) 2019-12-05 13:38:29 +08:00
Stephanie Wang da41180dc0 [direct task] Retry tasks on failure and turn on RAY_FORCE_DIRECT for test_multinode_failures.py (#6306)
* multinode failures direct

* Add number of retries allowed for tasks

* Retry tasks

* Add failing test for object reconstruction

* Handle return status and debug

* update

* Retry task unit test

* update

* update

* todo

* Fix max_retries decorator, fix test

* Fix test that flaked

* lint

* comments
2019-12-02 10:20:57 -08:00
Eric Liang b7b655c851 Also use NotifyDirectCallTaskBlock/Unblocked for plasma store accesses (#6249)
* wip

* fix it

* lint

* wip

* fix

* unblock

* flaky

* use fetch only flag

* Revert "use fetch only flag"

This reverts commit 56e938a0ee2024f5c99c9ab2d55fd35558fb15e1.

* restore error resolution

* use worker task id

* proto comments

* fix if
2019-11-27 22:46:15 -08:00
Stephanie Wang 2797c11b69 [direct task] For serialized object IDs, check with owner before declaring object unreconstructable (#6286)
* Track borrowed vs owned objects

* Serialize owner address with object ID

* serialize owner task id

* Deserialize object IDs

* Pass direct task ID instead of plasma ID

* it works

* Fix ref count test

* Add unit test

* update warning

* we own ray.put objects

* missing file

* doc

* Fix unit test

* comments

* Fix py2

* lint

* update
2019-11-27 15:31:44 -08:00
Simon Mo aa8d5d2f6c Rate limit asyncio actor (#6242) 2019-11-24 11:39:28 -08:00
Simon Mo 29ba6bfc64 Basic Async Actor Call (#6183)
* Start trying to figure out where to put fibers

* Pass is_async flag from python to context

* Just running things in fiber works

* Yield implemented, need some debugging to make it work

* It worked!

* Remove debug prints

* Lint

* Revert the clang-format

* Remove unnecessary log

* Remove unncessary import

* Add attribution

* Address comment

* Add test

* Missed a merge conflict

* Make test pass and compile

* Address comment

* Rename async -> asyncio

* Move async test to py3 only

* Fix ignore path
2019-11-21 11:56:46 -08:00
Eric Liang 7d33e9949b Integrate ref count module into local memory store (#6122) 2019-11-15 10:52:19 -08:00
Eric Liang 8ff393a7bd Handle exchange of direct call objects between tasks and actors (#6147) 2019-11-14 17:32:04 -08:00
Ujval Misra e3e3ad4b25 Add timeout param to ray.get (#6107) 2019-11-14 00:50:04 -08:00
Eric Liang f3f86385d6 Minimal implementation of direct task calls (#6075) 2019-11-12 11:45:28 -08:00
Stephanie Wang 35d177f459 Use grpc for communication from worker to local raylet (task submission and direct actor args only) (#6118)
* Skeleton for SubmitTask proto

* Pass through node manager port, connect in raylet client

* Switch submit task to grpc

* Check port in use

* doc

* Remove default port, set port randomly from driver

* update

* Fix test

* Fix object manager test
2019-11-11 21:17:25 -08:00
Eric Liang 4a28306186 Allow large returns from direct actor calls (#6088) 2019-11-07 21:28:55 -08:00
Edward Oakes ca53af4d0f Add pending task dependencies to ObjectID ref counting (#6054) 2019-11-07 18:37:10 -08:00
Edward Oakes 043d1f4094 Return RayObjects to core worker (#6052) 2019-11-04 20:27:57 -08:00
Eric Liang 8485304e83 Support concurrent Actor calls in Ray (#6053) 2019-11-04 01:14:35 -08:00
Simon Mo 7f5b3502da Implement Detached Actor (#6036)
* Arg propagation works

* Implement persistent actor

* Add doc

* Initialize is_persistent_

* Rename persistent->detached

* Address comment

* Make test passes

* Address comment

* Python2 compatiblity

* Fix naming, py2

* Lint
2019-11-01 10:28:23 -07:00
Eric Liang b89cac976a Basic direct actor call support in Python (#5991) 2019-10-28 22:09:04 -07:00
Edward Oakes c1418b04df Remove CoreWorkerObjectInterface (#6023) 2019-10-28 10:48:41 -07:00
Philipp Moritz 80c01617a3 Optimize python task execution (#6024) 2019-10-27 00:43:34 -07:00