352 Commits

Author SHA1 Message Date
fyrestone a6b8bd47b0 [xlang] Cross language serialize ActorHandle (#7134) 2020-02-17 20:44:56 +08:00
Richard Liaw 52d9189d5d [autoscaler] port-forward for attach + redis_port (#7145)
* port-forward

* fixport

* force redis port in init mode

* test

* Update python/ray/tests/test_ray_init.py
2020-02-14 15:17:00 -08:00
fyrestone 0648bd28ef [xlang] Cross language Python support (#6709) 2020-02-08 13:01:28 +08:00
ijrsvt 0826f95e1c Including psutil & setproctitle (#7031) 2020-02-05 14:16:58 -08:00
Edward Oakes 984490d2be Collect object IDs during serialization (#6946) 2020-02-03 18:38:11 -08:00
Siyuan (Ryans) Zhuang 42cbf801e1 workaround for python3.5 fast numpy serialization (#6675) 2020-02-03 13:08:18 -08:00
Eric Liang 8b4b49662b Force OMP_NUM_THREADS=1 if unset (#6998)
* force omp

* update

* set

* workers

* link
2020-02-01 11:46:11 -08:00
Edward Oakes 92525f35d1 Remove raylet client from Python worker (#6018) 2020-01-31 18:23:01 -08:00
Edward Oakes 341a921d81 Remove vanilla pickle serialization for task arguments (#6948) 2020-01-31 16:52:43 -08:00
SangBin Cho df518849ed Remove ray.wait timeout warning for milliseconds (#6980) 2020-01-30 19:07:52 -08:00
Simon Mo 1e3a34b223 Rewrite the async api documentation (#6936)
* Rewrite the async api documentation

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* clarify comment

* Add quickstart

* Add reference for async in ray.get ray.wait docstring

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-01-30 09:34:09 -08:00
Simon Mo 396d7fafc8 UI improvement for asyncio (#6905) 2020-01-27 12:45:51 -08:00
Simon Mo 4dd41844d0 Ignore blocking ray.wait if timeout is zero (#6891) 2020-01-22 16:05:34 -08:00
Sven Mika 4ee566129f Ignore io.UnsupportedOperation error when "Enabling nice stack traces on SIGSEGV etc." in worker.py::connect(). (#6771)
- Fixes RLlib tf-eager test cases for all agents when run locally on Ubuntu and Mac.
2020-01-13 14:31:13 -08:00
Sven 60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Eric Liang 69c5a2bc3c Warn if OMP_NUM_THREADS is set (#6729) 2020-01-08 14:59:07 -08:00
Robert Nishihara 5e43b25e8c Document fault tolerance behavior. (#6698) 2020-01-06 22:34:06 -08:00
Edward Oakes 2a4d2c6e9e Basic reference counting & pinning (#6554) 2020-01-06 17:30:26 -06:00
Robert Nishihara 92e44a5dc8 Deprecate redis_address argument in favor of address. (#6654) 2020-01-02 20:18:34 -08:00
Robert Nishihara 39a3459886 Remove (object) from class declarations. (#6658) 2020-01-02 17:42:13 -08:00
Robert Nishihara 480206eef8 Remove some Python 2 compatibility code. (#6624) 2019-12-31 17:14:58 -08:00
Eric Liang e2bc489a18 Port webui nits from original pr that enables it (#6628)
* backport changes

* Update test_webui.py
2019-12-29 19:19:43 -08:00
Robert Nishihara 8724e5ffd5 Start WebUI by default. (#6493) 2019-12-27 13:49:07 -08:00
Edward Oakes 6b1a57542e Add actor.__ray_kill__() to terminate actors immediately (#6523) 2019-12-23 23:12:57 -06:00
Yunzhi Zhang bac6f3b61e [Dashboard] Collecting worker stats in node manager and implement webui display in the backend (#6574) 2019-12-22 17:50:23 -08:00
Simon Mo 26ec500ef9 Implement async get for direct actor call (#6339) 2019-12-18 11:50:21 -08:00
Simon Mo e530c37b0e Use localhost and set redis password by default (#6481) 2019-12-17 19:41:19 -08:00
Edward Oakes e2b7459bfc Fix worker exit cleanup (#6450)
* working but ugly

* comments

* proper but hanging in grpc server destructor

* grpc server shutdown deadline

* fix disconnect

* lint

* shutdown_only in test

* replace shutdown
2019-12-13 16:52:50 -08:00
Edward Oakes 82f7dbc7a7 Increase TaskID size by 2 bytes, taken from JobID (#6425)
* Increase TaskID size by 2 bytes, taken from JobID

* comments

* check max job id

* fix doc

* fix local mode
2019-12-11 10:45:14 -08:00
Edward Oakes 044527adb8 Remove ref counting dependencies on ray.get() (#6412)
* Remove ref counting dependencies on Get()

* comment

* don't send IDs when disabled

* pass through internal config

* fix

* allow reinit

* remove flag
2019-12-10 18:11:34 -08:00
Stephanie Wang da41180dc0 [direct task] Retry tasks on failure and turn on RAY_FORCE_DIRECT for test_multinode_failures.py (#6306)
* multinode failures direct

* Add number of retries allowed for tasks

* Retry tasks

* Add failing test for object reconstruction

* Handle return status and debug

* update

* Retry task unit test

* update

* update

* todo

* Fix max_retries decorator, fix test

* Fix test that flaked

* lint

* comments
2019-12-02 10:20:57 -08:00
Edward Oakes e4f9b3b7d9 Use process reaper for cleanup (#6253) 2019-11-26 22:00:08 -06:00
Simon Mo 1ca8c427e3 Consistent Name for Process Title (#6276)
* Consistent naming for setproctitle

* Address comments

* Add debug/verbose mode

* Fix test
2019-11-26 11:56:28 -08:00
Philipp Moritz 33c768ebe4 Fix worker signal.SIGTERM handler being installed from outside the main thread (#6176) 2019-11-20 11:14:28 -08:00
Ujval Misra 2965dc1b72 [tune] Fault tolerance improvements (#5877)
* Precede ray.get with ray.wait.

* Trigger checkpoint deletes locally in Trainable

* Clean-up code.

* Minor changes.

* Track best checkpoint so far again

* Pulled checkpoint GC out of Trainable.

* Added comments, error logging.

* Immediate pull after checkpoint taken; rsync source delete on pull

* Minor doc fixes

* Fix checkpoint manager bug

* Fix bugs, tests, formatting

* Fix bugs, feature flag for force sync.

* Fix test.

* Fix minor bugs: clear proc and less verbose sync_on_checkpoint warnings.

* Fix bug: update IP of last_result.

* Fixed message.

* Added a lot of logging.

* Changes to ray trial executor.

* More bug fixes (logging after failure), better logging.

* Fix richards bug and logging

* Add comments.

* try-except

* Fix heapq bug.

* .

* Move handling of no available trials to ray_trial_executor (#1)

* Fix formatting bug, lint.

* Addressed Richard's comments

* Revert tests.

* fix rebase

* Fix trial location reporting.

* Fix test

* Fix lint

* Rebase, use ray.get w/ timeout, lint.

* lint

* fix rebase

* Address richard's comments
2019-11-18 01:14:41 -08:00
Ujval Misra e3e3ad4b25 Add timeout param to ray.get (#6107) 2019-11-14 00:50:04 -08:00
Philipp Moritz f24d96ec4f Revert "Try to enable dashboard (again) (#6069)" (#6159)
This reverts commit 4044af8520.
2019-11-13 12:32:12 -08:00
Stephanie Wang 35d177f459 Use grpc for communication from worker to local raylet (task submission and direct actor args only) (#6118)
* Skeleton for SubmitTask proto

* Pass through node manager port, connect in raylet client

* Switch submit task to grpc

* Check port in use

* doc

* Remove default port, set port randomly from driver

* update

* Fix test

* Fix object manager test
2019-11-11 21:17:25 -08:00
Philipp Moritz decaa65cd6 Use pickle by default for serialization (#5978) 2019-11-10 18:12:18 -08:00
Eric Liang 4044af8520 Try to enable dashboard (again) (#6069)
* Revert "Revert "Enable the Ray dashboard by default (#5976)" (#6068)"

This reverts commit 1a3e97cf23.

* fix tests that assume the dashboard isn't a job

* travis
2019-11-08 10:48:48 -08:00
Eric Liang 4a28306186 Allow large returns from direct actor calls (#6088) 2019-11-07 21:28:55 -08:00
Edward Oakes 043d1f4094 Return RayObjects to core worker (#6052) 2019-11-04 20:27:57 -08:00
Eric Liang 1a3e97cf23 Revert "Enable the Ray dashboard by default (#5976)" (#6068)
This reverts commit 6166ef3e09.
2019-11-01 17:08:37 -07:00
Eric Liang fb34928a2a [minor] Perf optimizations for direct actor task submission (#6044)
* merge optimizations

* fix

* fix memory err

* optimize

* fix tests

* fix serialization of method handles

* document weakref

* fix check

* bazel format

* disable on 2
2019-11-01 14:41:14 -07:00
Eric Liang 6166ef3e09 Enable the Ray dashboard by default (#5976) 2019-11-01 12:19:01 -07:00
Edward Oakes e9e78871b9 Remove unused function definition caching (#6042) 2019-10-30 16:41:18 -07:00
Eric Liang b89cac976a Basic direct actor call support in Python (#5991) 2019-10-28 22:09:04 -07:00
Eric Liang a5523466a2 Enable memstore by default (#6003) 2019-10-25 21:59:12 -07:00
Edward Oakes 1ce521a7f3 Remove task context from python worker (#5987)
Removes duplicated state between the python and C++ workers. Also cleans up the serialization codepaths a bit.
2019-10-25 07:38:33 -07:00
Edward Oakes 6f27d881bd Fix core worker shutdown errors (#6004) 2019-10-24 22:29:05 -07:00