387 Commits

Author SHA1 Message Date
Max Fitton ad09aa985c Make Dashboard Port Configurable (#8999) 2020-06-19 16:26:22 -05:00
Siyuan (Ryans) Zhuang 7fa64f2b24 Clean up unused Python code (#8755) 2020-06-03 12:09:19 -07:00
Edward Oakes 860eb6f13a Update named actor API (#8559) 2020-05-24 20:08:03 -05:00
Hao Chen d27e6da1b2 Fix a lint issue (#8530) 2020-05-21 16:12:44 +08:00
fangfengbin e261b4778e Adjust the state initialization sequence and put it after core worker google logging initialization (#8511) 2020-05-21 11:30:28 +08:00
mehrdadn ebf060d484 Make more tests run on Windows (#8446)
* Remove worker Wait() call due to SIGCHLD being ignored

* Port _pid_alive to Windows

* Show PID as well as TID in glog

* Update TensorFlow version for Python 3.8 on Windows

* Handle missing Pillow on Windows

* Work around dm-tree PermissionError on Windows

* Fix some lint errors on Windows with Python 3.8

* Simplify torch requirements

* Quiet git clean

* Handle finalizer issues

* Exit with the signal number

* Get rid of wget

* Fix some Windows compatibility issues with tests

Co-authored-by: Mehrdad <noreply@github.com>
2020-05-20 12:25:04 -07:00
Stephanie Wang bd169749e0 Option to retry failed actor tasks (#8330)
* Python

* Consolidate state in the direct actor transport, set the caller starts at

* todo

* Remove unused

* Update and unit tests

* Doc

* Remove unused

* doc

* Remove debug

* Update src/ray/core_worker/transport/direct_actor_transport.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/core_worker/transport/direct_actor_transport.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* lint and fix build

* Update

* Fix build

* Fix tests

* Unit test for max_task_retries=0

* Fix java?

* Fix bad test

* Cross language fix

* fix java

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-05-15 20:15:15 -07:00
Max Fitton 00325eb2b2 Rename max_reconstructions to max_restarts and use -1 for infinite (#8274)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-14 10:30:29 -05:00
ijrsvt cc7bd6650a [core] Enabling Remote Task Cancelation (#8225) 2020-05-04 15:24:22 -07:00
ijrsvt c393b6d165 Remove logging (#8211) 2020-04-29 09:15:43 -07:00
chaokunyang 91f630f709 [Streaming] Streaming Cross-Lang API (#7464) 2020-04-29 13:42:08 +08:00
Edward Oakes ebdccde030 Fetch internal config from raylet (#8195) 2020-04-28 13:12:11 -05:00
ijrsvt a77e5a8cbf [Doc] Fix Docstring for Task Cancellation (#8198) 2020-04-27 17:06:08 -07:00
Robert Nishihara 48250217ac Fix API documentation formatting. (#8197) 2020-04-27 10:48:42 -07:00
Philipp Moritz d7da25eee1 Use RAY_ADDRESS to connect to an existing Ray cluster if present (#7977) 2020-04-27 09:59:37 -07:00
ijrsvt 69ff7e3e35 TaskCancellation (#7669)
* Smol comment

* WIP, not passing ray.init

* Fixed small problem

* wip

* Pseudo interrupt things

* Basic prototype operational

* correct proc title

* Mostly done

* Cleanup

* cleaner raylet error

* Cleaning up a few loose ends

* Fixing Race Conds

* Prelim testing

* Fixing comments and adding second_check for kill

* Working_new_impl

* demo_ready

* Fixing my english

* Fixing a few problems

* Small problems

* Cleaning up

* Response to changes

* Fixing error passing

* Merged to master

* fixing lock

* Cleaning up print statements

* Format

* Fixing Unit test build failure

* mock_worker fix

* java_fix

* Canel

* Switching to Cancel

* Responding to Review

* FixFormatting

* Lease cancellation

* Final comments?

* Moving exist check to CoreWorker

* Fix Actor Transport Test

* Fixing task manager test

* changing clock repr

* Fix build

* fix white space

* lint fix

* Updating to medium size

* Fixing Java test compilation issue

* lengthen bad timeouts
2020-04-25 16:04:52 -07:00
Dean Wampler 5d2885c609 Minor Ray API doc refinements (#8060)
* Added small section on installation when using Anaconda. Also fixed an obsolete link to Anaconda.

* Delete more temporary directories when running the doc "make clean".

* Fine-tuning the core Ray API documentation

* Fix doc lines that were too long

Co-authored-by: Dean Wampler <dean@concurrentthought.com>
2020-04-18 15:19:35 -07:00
Clark Zinzow d4cae5f632 [Core] Added ability to specify different IP addresses for a core worker and its raylet. (#7985) 2020-04-16 10:32:24 -05:00
ijrsvt 44825d81e9 Change Proctitle to IDLE after an Error (#7863) 2020-04-08 11:33:43 -07:00
Kai Yang 48b48cc8c2 Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
ijrsvt 9bfc2c4b54 Moving Local Mode to C++ (#7670) 2020-04-01 15:50:57 -05:00
mehrdadn fc23f79f82 Windows process issues (#7739) 2020-03-29 12:48:32 -07:00
Cloud Han c1b05b720d calling register_custom_serializer require ray to be initialized (#7752) 2020-03-26 10:24:06 -07:00
Simon Mo a519b4f2a9 [Serve] Enhancement in HTTP Methods and Multi-route support (#7709) 2020-03-24 20:25:05 -07:00
Robert Nishihara 8b4c2b7e88 Remove unnecessary handling of setproctitle and psutil. (#7702) 2020-03-22 22:06:42 -07:00
Edward Oakes 58dc70f90e [minor] Remove get_global_worker(), RuntimeContext (#7638) 2020-03-20 15:45:29 -05:00
Eric Liang 745b9d643d First pass at ray memory command for memory debugging (#7589) 2020-03-17 20:45:07 -07:00
Simon Mo 3f1fcaa024 Blocking ray.get/wait inside async context will warn instead of error (#7262) 2020-03-14 22:02:30 -07:00
Stephanie Wang 53549314c5 [core] Option to fallback to LRU on OutOfMemory (#7410)
* Add a test for LRU fallback

* Update error message

* Upgrade arrow to master

* Integrate with arrow

* Revert "Bazel mirrors (#7385)"

This reverts commit 44aded5272.

* Don't LRU evict

* Revert "Revert "Bazel mirrors (#7385)""

This reverts commit b6359fea78d1bd3925452ca88ac71e0c9e5c7dd3.

* Add lru_evict flag

* fix internal config

* Fix

* upgrade arrow

* debug

* Set free period in config for lru_evict, override max retries to fix test

* Fix test?

* fix test

* Revert "debug"

This reverts commit 98f01c63a267f38218f5047b1866e4c1c8280017.

* fix exception str

* Fix ref count test

* Shorten travis test?
2020-03-14 11:28:43 -07:00
Kai Yang d6e8f47065 Add a flag to disable reconstruction for a killed actor (#7346) 2020-03-13 19:10:21 +08:00
Edward Oakes 7b609ca211 Remove instances of 'raise Exception' (#7523) 2020-03-10 17:51:22 -07:00
Edward Oakes 4ab80eafb9 Deprecate use_pickle flag (#7474) 2020-03-09 16:03:56 -07:00
Edward Oakes 0abcca258f Add entries to in-memory store on Put() (#7085) 2020-03-04 10:17:27 -08:00
Edward Oakes 93fe4b0b58 Change actor.__ray_kill__() to ray.kill(actor) (#7360) 2020-02-28 11:55:13 -06:00
Edward Oakes d190e73727 Use our own implementation of parallel_memcopy (#7254) 2020-02-21 11:03:50 -08:00
fyrestone a6b8bd47b0 [xlang] Cross language serialize ActorHandle (#7134) 2020-02-17 20:44:56 +08:00
Richard Liaw 52d9189d5d [autoscaler] port-forward for attach + redis_port (#7145)
* port-forward

* fixport

* force redis port in init mode

* test

* Update python/ray/tests/test_ray_init.py
2020-02-14 15:17:00 -08:00
fyrestone 0648bd28ef [xlang] Cross language Python support (#6709) 2020-02-08 13:01:28 +08:00
ijrsvt 0826f95e1c Including psutil & setproctitle (#7031) 2020-02-05 14:16:58 -08:00
Edward Oakes 984490d2be Collect object IDs during serialization (#6946) 2020-02-03 18:38:11 -08:00
Siyuan (Ryans) Zhuang 42cbf801e1 workaround for python3.5 fast numpy serialization (#6675) 2020-02-03 13:08:18 -08:00
Eric Liang 8b4b49662b Force OMP_NUM_THREADS=1 if unset (#6998)
* force omp

* update

* set

* workers

* link
2020-02-01 11:46:11 -08:00
Edward Oakes 92525f35d1 Remove raylet client from Python worker (#6018) 2020-01-31 18:23:01 -08:00
Edward Oakes 341a921d81 Remove vanilla pickle serialization for task arguments (#6948) 2020-01-31 16:52:43 -08:00
SangBin Cho df518849ed Remove ray.wait timeout warning for milliseconds (#6980) 2020-01-30 19:07:52 -08:00
Simon Mo 1e3a34b223 Rewrite the async api documentation (#6936)
* Rewrite the async api documentation

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* clarify comment

* Add quickstart

* Add reference for async in ray.get ray.wait docstring

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-01-30 09:34:09 -08:00
Simon Mo 396d7fafc8 UI improvement for asyncio (#6905) 2020-01-27 12:45:51 -08:00
Simon Mo 4dd41844d0 Ignore blocking ray.wait if timeout is zero (#6891) 2020-01-22 16:05:34 -08:00
Sven Mika 4ee566129f Ignore io.UnsupportedOperation error when "Enabling nice stack traces on SIGSEGV etc." in worker.py::connect(). (#6771)
- Fixes RLlib tf-eager test cases for all agents when run locally on Ubuntu and Mac.
2020-01-13 14:31:13 -08:00
Sven 60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00