Commit Graph

1145 Commits

Author SHA1 Message Date
SangBin Cho c7c9db3840 Formatting. 2020-04-28 01:09:43 -07:00
SangBin Cho c6217e53e3 Updated Version to 0.8.5. 2020-04-28 00:00:08 -07:00
mehrdadn b9de9dadd7 Fix Windows build (#8186)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-26 13:07:25 -07:00
fangfengbin 5bff707d20 [GCS]Add in-memory store client (#8144) 2020-04-26 19:09:26 +08:00
ZhuSenlin 9255fcd516 [GCS] Add node failure detector (#8119) 2020-04-26 19:08:27 +08:00
fangfengbin c5d181e3d9 gcs adapts to worker table pub sub (#8182)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-26 17:58:55 +08:00
fangfengbin f17bea2de5 Fix get gcs server address block bug (#8126) 2020-04-26 10:01:06 +08:00
ijrsvt 69ff7e3e35 TaskCancellation (#7669)
* Smol comment

* WIP, not passing ray.init

* Fixed small problem

* wip

* Pseudo interrupt things

* Basic prototype operational

* correct proc title

* Mostly done

* Cleanup

* cleaner raylet error

* Cleaning up a few loose ends

* Fixing Race Conds

* Prelim testing

* Fixing comments and adding second_check for kill

* Working_new_impl

* demo_ready

* Fixing my English

* Fixing a few problems

* Small problems

* Cleaning up

* Response to changes

* Fixing error passing

* Merged to master

* fixing lock

* Cleaning up print statements

* Format

* Fixing Unit test build failure

* mock_worker fix

* java_fix

* Canel

* Switching to Cancel

* Responding to Review

* FixFormatting

* Lease cancellation

* Final comments?

* Moving exist check to CoreWorker

* Fix Actor Transport Test

* Fixing task manager test

* changing clock repr

* Fix build

* fix white space

* lint fix

* Updating to medium size

* Fixing Java test compilation issue

* lengthen bad timeouts
2020-04-25 16:04:52 -07:00
fangfengbin 38dfe5db86 remove store client template (#8160)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-24 21:19:12 +08:00
fangfengbin 713e375d50 [GCS]GCS adapts to job table pub sub (#8145) 2020-04-24 16:33:25 +08:00
Qing Wang d66d12661b Improve the perf of constructing actor task specs. (#8093) 2020-04-21 11:54:09 +08:00
Stephanie Wang eefea4e29c [core] Post task submission to IO loop (#8090)
* Post to IO loop

* Unused

* Fix build
2020-04-20 19:13:50 -07:00
Stephanie Wang 1323e1753d [core] When reconstruction is enabled, pin objects created by ray.put() (#8021)
* Unit test and pin ray.put objects until they have no more lineage references

* c++ tests

* lint

* Mark ray.put objects as pinned
2020-04-20 13:09:54 -07:00
ZhuSenlin 3f28a8a229 [GCS] reply to the owner only after the actor has been successfully created. (#8079)
* reply to the owner only after the actor is successfully created.

* reply immediately if the actor is already created

* fix comment

* add test_actor_creation_task provided by @Stephanie Wang

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-19 09:53:02 -07:00
Edward Oakes 90ef585fd5 Revert "Add ability to specify worker and driver ports (#7833)" (#8069)
This reverts commit 9f751ff8c4.
2020-04-17 12:32:22 -05:00
Eric Liang 55ce2bba10 Record num plasma errs in map (#8034) 2020-04-16 13:16:40 -07:00
Edward Oakes 9f751ff8c4 Add ability to specify worker and driver ports (#7833) 2020-04-16 13:49:25 -05:00
Clark Zinzow d4cae5f632 [Core] Added ability to specify different IP addresses for a core worker and its raylet. (#7985) 2020-04-16 10:32:24 -05:00
fangfengbin 5a7882bb44 Fix gcs_server get invalid local address (#7842) 2020-04-16 14:58:19 +08:00
mehrdadn ba00c29b67 Factor out Travis 'install' sections for use with GitHub Actions (#7988) 2020-04-15 08:10:22 -07:00
fangfengbin efbaf155b2 [GCS]Add publish and subscribe function of gcs table (#7909) 2020-04-15 04:24:52 -07:00
fangfengbin c17404918c [GCS]Add gcs table storage interface (#7949) 2020-04-15 10:48:12 +08:00
fangfengbin 026abb119c fix GrpcServer out-of-bounds bug (#7995)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-14 10:34:29 +08:00
ZhuSenlin 4a81793ba5 GCS-Based actor management implementation (#6763)
* add gcs actor manager

* fix test_metrics.py

* fix TestTaskInfo

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix compile error

* fix merge error

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-13 09:48:48 -07:00
mehrdadn 1b0f6fd558 Check AF_UNIX path length (#7951) 2020-04-13 09:30:01 -07:00
micafan c222d64ca1 [GCS] Add MessagePublisher to GCS (#7771) 2020-04-13 19:32:28 +08:00
mehrdadn 7c52359b00 Fix Windows build (#7987)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-12 13:29:48 -07:00
Qing Wang 98bfcd53bc [Java] Rename group id and package name. (#7864)
* Initial

* Change streaming's

* Fix

* Fix

* Fix org_ray

* Fix cpp file name

* Fix streaming

* Fix

* Fix

* Fix testlistening

* Fix missing sth in python

* Fix

* Fix

* Fix SPI

* Fix

* Fix compilation

* Fix

* Fix CI

* Fix checkstyle

Fix checkstyle

* Fix streaming tests

* Fix streaming CI

* Fix streaming checkstyle.

* Fix build

* Fix bazel dep

* Fix

* Fix ray checkstyle

* Fix streaming checkstyle

* Fix bazel checkstyle
2020-04-12 17:59:34 +08:00
mehrdadn 07002825aa Proper command-line parsing (#7603)
* Command-line parsing functions

* Work around bug in MSVCRT for passing command-lines to programs

* Polishing

* Fix std::regex_replace() overload compatibility issue with GCC 4.8.x

* Try to work around linker error

* Implement ScanToken()

* Parse command-lines via ScanToken

* Merge src/ray/util.cc and src/ray/url.cc

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-11 23:07:07 -07:00
Stephanie Wang d7eef808b8 [core] Reconstruction for lost plasma objects (#7733)
* Add a lineage_ref_count to References

* Refactor TaskManager to store TaskEntry as a struct

* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs

* Pin TaskEntries and References in the lineage of any ObjectIDs in scope

* Fix deadlock, convert num_plasma_returns to a set of object IDs

* fix unit tests

* Feature flag

* Do not release lineage for objects that were promoted to plasma

* fix build

* fix build

* Remove num executions

* Remove num executions

* Add pinned locations to ReferenceCounter, empty handler for node death

* Fix num returns for actor tasks, fix Put return value

* Add regression test

* Clear pinned locations and callbacks on node removal

* Clear pinned locations and callbacks on node removal

* Simplify num return values

* Remove unused

* doc

* tmp

* Set num returns

* Move lineage pinning flag to ReferenceCounter

* comments

* Recover from plasma failures by pinning a new copy

* Basic object reconstruction, no concurrent reqs yet

* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs

* Handle concurrent attempts to recover the same object

* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit

* Split out logic into ObjectRecoveryManager

* Fix python tests

* Refactor to remove dependency on gcs client

* Unit tests

* Move pinned at node ID to direct memory store

* Unit test fixes and lint

* simplify and more tests

* Add ResubmitTask test for TaskManager

* Doc

* fix build

* comments

* Fix

* debug

* Update

* fix

* Fix

* Fix bad status handling, unit test

* Fix build
2020-04-11 16:52:57 -07:00
Stephanie Wang 18e9a076e5 [core] Cancel worker lease requests that are no longer needed (#7929)
* regression test

* Cancel lease requests

* unit tests

* update

* fix build

* Move unit test

* Set success

* Ref to shared_ptr

* debug

* Revert "debug"

This reverts commit 6b2c25805a8223b41ffcc2d88d903e16ea415089.

* Bad move

* Fix bad status handling
2020-04-11 16:51:32 -07:00
fangfengbin 061043229f [GCS]Optimize gcs client testcases (#7895) 2020-04-09 12:30:58 +08:00
Kai Yang 48b48cc8c2 Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
micafan e91595f955 [GCS] Add ObjectLocator to gcs server (#7557) 2020-04-07 10:37:24 +08:00
Ion 9f6cbf168e New scheduler local node (#7899) 2020-04-06 14:43:42 -05:00
mehrdadn 203c077895 Switch to Boost generic sockets (#7656)
* Use generic Boost sockets

* Un-templatize server/client connections

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-05 22:26:46 -07:00
micafan 185d591108 No need to send actor died signal from RedisActorInfoAccessor (#7883) 2020-04-03 17:45:39 -07:00
ijrsvt 9bfc2c4b54 Moving Local Mode to C++ (#7670) 2020-04-01 15:50:57 -05:00
micafan 780c1c3b08 [GCS] impl RedisStoreClient for GCS Service (#7675) 2020-04-01 21:18:19 +08:00
fangfengbin bfb9248532 fix gcs server resolver error (#7822)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-03-30 22:57:55 -07:00
mehrdadn 8958728139 Windows bug fixes (#7740) 2020-03-30 20:39:23 -05:00
Simon Mo dc9b62e007 Deserialize Args in Event Loop Thread (#7806) 2020-03-30 18:28:13 -07:00
mehrdadn f86e623095 Fix & improve GitHub Actions CI builds (#7784) 2020-03-30 16:29:54 -07:00
mehrdadn fc23f79f82 Windows process issues (#7739) 2020-03-29 12:48:32 -07:00
fangfengbin 6ce8b63bb6 fix TestTaskLeaseRenewal test failure (#7765) 2020-03-29 11:18:47 +08:00
Kai Yang 6a3503c494 Fix reusing the cached hash of nil ID (#7753) 2020-03-27 23:40:03 +08:00
SongGuyang c195dc8f88 Basic C++ worker implementation (#6125) 2020-03-27 23:01:08 +08:00
fangfengbin e196fcdbaf Add gcs_service_enabled function to avoid getting environment variable directly (#7742) 2020-03-26 22:02:53 +08:00
Eric Liang 23b6fdcda1 ray memory should collect statistics from all nodes (#7721) 2020-03-25 16:31:31 -07:00
Stephanie Wang 46404d8a0b [core] Pin lineage of plasma objects that are still in scope (#7690)
* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit
2020-03-25 09:29:32 -07:00