2042 Commits

Author SHA1 Message Date
Sven Mika d537e9f0d8 [RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). (#7155) 2020-02-19 12:18:45 -08:00
Simon Mo e8941b1b79 Revert "Revert "Removing Pyarrow dependency (#7146)" (#7209)" (#7214) 2020-02-19 10:08:52 -08:00
Stephanie Wang f76ce836b2 Distributed ref counting for serialized ObjectIDs (#6945)
* Skeleton plus a unit test for simple borrower case

* First unit test passes - forward an ID and task returns with 1 submitted task pending on the inner ID

* Invariant for contained_in

* Unit test passes for testing task return without creating a borrower

* Wrap ref count functionality in test case

* Fix bad delete

* Unit test and fix for borrowers creating more borrowers

* Unit test and fix for simple borrowing, but owner sends call after borrower's ref count goes to 0

* Refactor:
- keep a sentinel ref count for task argument IDs
- keep contained_in_borrowed in addition to contained_in_owned

* Unit test for nested IDs passes

* Refactor so that an object ID can only be contained in 1 borrowed ID at a time

* Add check

* Fix

* Unit test (passes) to test nesting object IDs but no borrowers created

* Unit test for nested objects from different owners passes, refactor to unset contained_in when popping refs

* Unit tests for borrowers receiving an ObjectID from multiple sources,
skip adding ownership info if we already have it to handle duplicate
refs

* Unit test for returning object ID passes

* More unit tests for returning object IDs pass

* Add serialized ID tests

* fix serialization issue

* remove swap

* It builds!

* debugging and some fixes:
- register handler for WaitForRefRemoved
- don't create a python reference for arg IDs
- pass in client factory into ReferenceCounter
- fix bad decrement in PopBorrowerRefs

* Fix accounting for serialized IDs:
- don't decrement for IDs on dependency resolution, wait until task finished
- add object IDs that were inlined when building the arguments to the task spec, pin these on the task executor until task finishes

* mu_ -> mutex_

* lint

* fix build

* clear outer_object_id

* add direct call type check

* Fix test for direct call IDs and return IDs for actor calls

* Fix CoreWorkerClient.Addr()

* Remove unneeded lock

* Remove unnecessary ObjectID refs

* Fix worker holding serialized refs test

* Fix hex IDs

* fix

* fix tests

* fix tests

* refactor and cleanups

* lint

* Put inlined Ids in task args and some cleanup

* Add back gc.collect() line for test case

* Refactor and fixes:
- store inlined IDs in RayObject
- allow storing objects with inlined IDs in memory store
- pin objects that were promoted to plasma

* oops

* make sure worker ID is set in address, pass in rpc::Address to CoreWorkerClient

* todos

* cleanups and test builds

* Fix tests

* Add feature flag

* cleanups

* address comments and some cleanups

* cleanup

* fix recursive test

* Comments for tests

* Turn off ref counting by default

* Skip tests

* Fix some bugs for test_array.py, java build

* Don't include nested objects in the ref count when the feature flag is off

* C++ feature flag does not work...

* Remove

* Turn on python tests and add a warning when plasma objects are evicted before being pinned

* Fix build and remove irrelevant test

* Fix for java

* Revert "Fix build and remove irrelevant test"

This reverts commit 056cca9b263ed05b0f9ab2250907338edcbca2d5.

* Fix ray.internal.free

* Fixes and skip some flaky tests

* fix java build

* fix windows build

* Add IDs contained in owned objects

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* update

* Try to fix ::test_direct_call_serialized_id_eviction

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-18 18:21:34 -08:00
mehrdadn 4a12243336 Use Process instead of pid_t (round 2) (#6882)
* Revert "Revert "Use Boost.Process instead of pid_t (#6510)" (#6909)"

This reverts commit bde575b8dd.

* Process wrapper, using Boost.Process on Windows

- Reverts bde575b8dd.
- Re-applies fb8e3615d5 after some refactoring.

* Remove Boost.Process dependency

* Don't open /proc file on Linux

* Change FATAL to ERROR and modify error message when process doesn't exist
2020-02-18 17:44:46 -08:00
Eric Liang 0aa9373d62 Revert "Removing Pyarrow dependency (#7146)" (#7209)
This reverts commit 2116fd3bca.
2020-02-18 14:12:06 -08:00
Eric Liang 5df801605e Add ray.util package and move libraries from experimental (#7100) 2020-02-18 13:43:19 -08:00
ijrsvt 2116fd3bca Removing Pyarrow dependency (#7146) 2020-02-17 18:00:13 -08:00
mehrdadn 3bd82d0bcd Fix various issues/warnings that come up on Jenkins (#7147)
* Avoid warning about swap being unlimited

Currently we get the following message on Jenkins:
"Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."

Since we're not limiting swap anyway, we might as well avoid trying to.
https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details

* Fix escaping in re.search()

* Fix escaping in _noisy_layer()

* Raise a more descriptive error when dashboard data isn't found

* Don't error on dashboard files not being found when webui isn't required

* Change dashboard error to a warning instead
2020-02-17 16:08:55 -08:00
Alex Wu 734629b4ea Ssh command format (#7176) 2020-02-17 14:15:42 -08:00
Alind Khare c6d768be14 [Serve] Added support for no http route services (#7010) 2020-02-17 11:31:30 -08:00
fyrestone a6b8bd47b0 [xlang] Cross language serialize ActorHandle (#7134) 2020-02-17 20:44:56 +08:00
Edward Oakes b079787c59 Fix flaky test_get_with_timeout (#7175) 2020-02-16 21:10:16 -08:00
Richard Liaw 94e2fcea2e [sgd] fp16 (apex) and scheduler support + move examples page (#7061)
* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* more docs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* fix tests

* testmode

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-16 19:04:08 -08:00
Siyuan (Ryans) Zhuang 6745459f96 Apply cpython patch bpo-39492 for the reference counting issue in pickle5 (#7177)
* apply cpython patch bpo-39492 for the reference count issue
2020-02-15 21:16:13 -08:00
Edward Oakes dc5a27dac0 Move ray.experimental.multiprocessing to ray.util.multiprocessing (#7149) 2020-02-14 16:17:05 -08:00
Richard Liaw 52d9189d5d [autoscaler] port-forward for attach + redis_port (#7145)
* port-forward

* fixport

* force redis port in init mode

* test

* Update python/ray/tests/test_ray_init.py
2020-02-14 15:17:00 -08:00
Qing Wang f3703bafa3 [Java] Support concurrent actor calls API. (#7022)
* WIP

Temp change

Attach native thread to jvm

* Fix run mode

* Address comments.
2020-02-14 13:02:39 +08:00
Alex Wu 0d3687a10d No warning for docker memory > system memory (#7151) 2020-02-13 15:21:44 -08:00
Qing Wang 94a286ef1d [Java] Add session_dir as temp_dir for logs, socket files like Python (#7044)
* Support

* Add gcs_server support

* Fix ut

* Fix

* Remove unused py code

* Fix linting

* Fix cross language ci

* Fix CI

* Add docstring

* Fix

* Fix linting

* Add a singleton for config

* Refine

* fix

* Fix

* linting

* Remove FileUnit

* Fix

* Fix

* Fix

* Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Fix streaming singleprocess CI

* Fix checkstyle

Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-02-13 17:49:52 +08:00
Edward Oakes e904711e74 Add python tests for serialized object ID reference counting (#7038) 2020-02-12 16:52:07 -08:00
Edward Oakes d91d3ea936 Split half of test_actor into test_actor_advanced (#7143) 2020-02-12 15:17:25 -08:00
Simon Mo 0e94e1dc2a [Asyncio] Increase recursion limit manually (#7142) 2020-02-12 14:15:36 -08:00
Mitchell Stern 5dda0b66bf [Dashboard] Refactor dialogs to use parent component state instead of routes (#7129) 2020-02-12 10:59:47 -08:00
aannadi d941ac6c89 Updating package-lock.json with latest npm (#7128) 2020-02-12 09:54:20 -08:00
Eric Liang 305eaaabe9 Fix hang if actor object id is returned from a task that exits (#6885) 2020-02-11 20:28:13 -08:00
Simon Mo 039d2cde88 Change log level for OMP warning (#7114) 2020-02-11 14:15:38 -08:00
aannadi d7ff55852a [tune][Dashboard] Added Tune Dashboard (#6911) 2020-02-11 11:56:49 -08:00
Simon Mo 0ddc389830 Fix documentation building with psutil issue (#7077) 2020-02-11 10:00:29 -08:00
Eric Liang 58c94f6381 [core] Delete() should never remove objects from in-memory store (#7117) 2020-02-10 22:40:09 -08:00
Maksim Smolin 4139e02f01 [autoscaler] Add `--all-nodes` option to rsync-up (#7065)
* Add option to sync workers to rsync-up

* Format

* Rename --sync-workers to --all-nodes
2020-02-10 16:27:59 -08:00
Sven Mika 6e1c3ea824 [RLlib] Exploration API (+EpsilonGreedy sub-class). (#6974) 2020-02-10 15:22:07 -08:00
SangBin Cho 1e690673d8 Render tasks that are not schedulable on the dashboard. (#7034) 2020-02-10 14:23:06 -08:00
Alex Wu 3f99be8dad Add 'ray dashboard' command (#6959) 2020-02-10 12:55:21 -08:00
Alex Wu 72c31e3e19 Ray nodes should respect docker limits (#7039) 2020-02-10 11:08:38 -08:00
chaokunyang 247a4d022a Fix passing empty bytes in python tasks (#7045)
* ensure data_ won't be null_ptr when size == 0

* when data_sizes[i] == 0, we should Allocate an empty buffer

* work around for pyarrow.py_buffer

* fix comments

* add null ptr check

* add test for bytes

* lint
2020-02-10 12:07:29 +08:00
fangfengbin 694c0f2867 [Java] Enable GCS server when running java unit tests (#7041)
* enable gcs service when running java testcases

* fix ci bug

* fix windows compile bug

* fix ci bug

* restart ci job

* enable java testcase

* restart ci job

* restart ci job

* add debug log

* add debug log

* restart ci job

* add debug log

* restart ci

* add debug log

* fix java testcase bug

* restart ci job

* restart ci job

* restart ci job

* restart ci job

* restart ci job

* restart ci job

* restart ci job

* restart ci job
2020-02-10 09:39:14 +08:00
Eric Liang 48e2adbc21 [tune] Remove unused TF loggers (#7090) 2020-02-09 13:58:24 -08:00
Ujval Misra 98a07fe37e [tune] Asynchronous saves (#6912)
* Support asynchronous saves

* Fix merge issues

* Add test, fix existing tests

* More informative warning

* Lint, remove print statements

* Address comments, add checkpoint.is_resolved fn

* Add more detailed comments
2020-02-09 12:17:45 -08:00
fyrestone 0648bd28ef [xlang] Cross language Python support (#6709) 2020-02-08 13:01:28 +08:00
Alind Khare f146d05b36 [Serve] Added support for composing arbitrary DAGs (#7015) 2020-02-07 17:55:26 -08:00
Stephanie Wang 3333ee84a5 Fix ref counting (#7075) 2020-02-06 14:35:08 -08:00
Simon Mo a0ba4499ac [Serve] Fix batching bug 2020-02-05 14:18:19 -08:00
ijrsvt 0826f95e1c Including psutil & setproctitle (#7031) 2020-02-05 14:16:58 -08:00
Sven Mika 93ed86f175 [Tune] logger.py: Relax TBX Summary ValueErrors with e.g. empty lists in lists (and all… (#6987) 2020-02-05 12:02:39 -08:00
fangfengbin ade7ebfc0c Add service based gcs client (#6686) 2020-02-05 12:06:25 +08:00
Eric Liang 37053443b4 Restore set omp (#7051) 2020-02-04 15:02:23 -08:00
Simon Mo dd095c476a Move serve and asyncio tests to bazel (#6979) 2020-02-04 08:29:16 -08:00
Edward Oakes 844f607c93 Collect contained ObjectIDs during deserialization (#7029) 2020-02-03 22:49:14 -08:00
Simon Mo 5e8ded344a [Serve] Fix flaky test with nursery double init (#6982) 2020-02-03 21:32:12 -08:00
Edward Oakes 984490d2be Collect object IDs during serialization (#6946) 2020-02-03 18:38:11 -08:00