5298 Commits

Author SHA1 Message Date
Sven Mika e540e425e4 [RLlib] rllib rollout test and bug fixes. (#9779) 2020-07-30 16:17:03 +02:00
Sven Mika f6bd12eb18 [RLlib] Add tensor-based tests for Schedules and fix some bugs related to using Schedules with tensor time input. (#9782) 2020-07-30 12:49:32 +02:00
Miguel Morales 372114b4ed Update sampler.py (#9805)
Minor fix for warning string
2020-07-29 22:58:35 -07:00
bermaker ccd6b90a42 Fix ray java worker metric registry indentation (#9780) 2020-07-30 13:20:24 +08:00
chaokunyang 6464bf55c6 [dist] Mvn deploy (#9777) 2020-07-30 11:48:31 +08:00
Kai Yang 9be5a2f0fc Fix GCS related tests (#9783) 2020-07-30 11:46:36 +08:00
Hao Chen 260bc52254 Java doc: "Ray Core Walkthrough" page (#8595) 2020-07-30 11:13:38 +08:00
chaokunyang 5aba53e9b2 [dist] Fix travis deploy for java dist (#9768) 2020-07-30 10:59:11 +08:00
SangBin Cho 826f14c824 [Stats] Fix harvestor threads + Fix flaky stats shutdown. (#9745) 2020-07-29 18:57:59 -05:00
mehrdadn 07022f3f11 Fix src/ray/core_worker/common.h deleted constructor (#9785)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-29 15:49:02 -07:00
Alex Wu 6e294dd90f [Core] Custom socket name (#9766)
* fix issues

* hot fixes

* test

* test

* socket name change only
2020-07-29 13:19:41 -07:00
Alex Wu e6696b2533 Fixed stderr logging (9765) 2020-07-29 13:19:04 -07:00
Alex Wu 72297dc46f [Core] Socket creation race condition bug fixes (#9764)
* fix issues

* hot fixes

* test

* test

* Always info log
2020-07-29 13:17:46 -07:00
Sven Mika b0b0463161 [RLlib] Trajectory View API (preparatory cleanup and enhancements). (#9678) 2020-07-29 21:15:09 +02:00
Bill Chambers 067c2752f8 [TUNE] Tune Docs re-organization (#9600)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-07-29 11:22:44 -07:00
SangBin Cho d1b37ca7e4 [GCS Actor Management] Fix flaky test_dead_actors. (#9715)
* Fix.

* Add logs.

* Add a unit test.
2020-07-29 10:54:18 -07:00
Tao Wang 2babad9906 [GCS]Use a separate thread in node failure detector to handle heartbeat (#9416)
* use a sole thread to handle heartbeat

* separate signal thread

* use work to avoid exiting when task is underway

* protect shared data structure to avoid deadlock

* add comments

* decrease io service num

* minor changes

* fix test

* per stephanie's comments

* use single io service instead of 1-size io service pool

* typo
2020-07-29 09:58:58 -07:00
Lingxuan Zuo 156067b423 [Stats] enable core worker stats (#9355) 2020-07-29 17:28:33 +08:00
fangfengbin a484947742 Fix leased worker leak bug if lease worker requests that are still waiting to be scheduled when GCS restarts (#9719) 2020-07-29 14:16:03 +08:00
Kai Yang 2cafc7cebe [Java] Fix MetricTest.java due to incomplete changes from #9703 (#9770) 2020-07-29 12:18:17 +08:00
Kai Yang bdc005a4d4 [Java] Use test groups to filter tests of different run modes (#9703) 2020-07-29 11:18:45 +08:00
Simon Mo 9fbfee2424 Pin pytest version (#9767) 2020-07-28 19:54:48 -07:00
mehrdadn fb5280f21b Fix some Windows CI issues (#9708)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-28 18:10:23 -07:00
SangBin Cho 423dc96cc4 Revert "[dist] swap mac/linux wheel build order (#9746)" and "Fix package and upload ray jar (#9742)" (#9758)
* Revert "[dist] swap mac/linux wheel build order (#9746)"

This reverts commit a9340565ff.

* Revert "Fix package and upload ray jar (#9742)"

This reverts commit c290c308fe.
2020-07-28 15:34:29 -07:00
Alex Wu 21af0ceb0c Register function race (#9346) 2020-07-28 13:51:34 -07:00
SangBin Cho c00742f103 [Release] Fix release tests (#9733) 2020-07-28 10:44:06 -07:00
SangBin Cho 7e3ba289dc [Stats] Basic Metrics Infrastructure (Metrics Agent + Prometheus Exporter) (#9607) 2020-07-28 10:28:01 -07:00
Alex Wu feb3751824 [New scheduler] First unit test for task manager (#9696)
* .

* .

* refactor WorkerInterface

* .

* Basic unit test structure complete?

* .

* bad git >:-(

* small clean up

* CR

* .

* .

* One more fixture

* One more fixture

* .

* .

* bazel-format

* .
2020-07-28 09:44:58 -07:00
Ian Rodney b1c2983c97 Run _with_interactive in Docker (#9747) 2020-07-28 08:57:04 -07:00
fangfengbin bd18e975c0 fix windows compile bug (#9741)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-07-28 21:30:31 +08:00
bermaker 6e23aff723 [Metrics]Ray java worker metric registry (#9636)
* ray worker metrics gauge init

* ray java metric mapping

* add jni source files for gauge and tagkey

* mapping all metric classes to stats object

* check non-null for tags and name

* lint

* add symbol for native metric JNI

* extern c for symbol

* add tests for all metrics

* Update Metric.java

use metricNativePointer instead.

* unify metric native stuff to one class

* fix jni file

* add comments for metric transform function in jni utils

* move metric function to native metric file

* remove unused disconnect jni

* Add a metric registry for java metrics

* Restore install-bazel.sh

* Add some comments for metric registry

* Fix thread safe problem of metrics

* Fix metric tests and remove sleep code from tests

* Fix comments of metrics

Co-authored-by: lingxuan.zlx <skyzlxuan@gmail.com>
2020-07-28 21:29:33 +08:00
Sven Mika ff9c1dac88 [RLlib] Issue 9667 DDPG Torch bugs and enhancements. (#9680) 2020-07-28 14:15:03 +02:00
Sven Mika e6ea33a03c [RLlib] Enhance reward clipping test; add action_clipping tests. (#9684) 2020-07-28 10:44:54 +02:00
chaokunyang a9340565ff [dist] swap mac/linux wheel build order (#9746) 2020-07-28 16:44:19 +08:00
Alan Guo 5831737287 Introduce file_mounts_sync_continuously cluster option (#9544)
* Separate out file_mounts contents hashing into its own separate hash

Add an option to continuously sync file_mounts from head node to worker nodes:
monitor.py will re-sync file mounts whenever contents change but will only run setup_commands if the config also changes

* add test and default value for file_mounts_sync_continuously

* format code

* Update comments

* Add param to skip setup commands when only file_mounts content changed during monitor.py's update tick

Fixed so setup commands run when ray up is run and file_mounts content changes

* Refactor so that runtime_hash retains previous behavior

runtime_hash is almost identical as before this PR. It is used to determine if setup_commands need to run
file_mounts_contents_hash is an additional hash of the file_mounts content that is used to detect when only file syncing has to occur.

Note: runtime_hash value will have changed from before the PR because we hash the hash of the contents of the file_mounts as a performance optimization

* fix issue with hashing a hash

* fix bug where trying to set contents hash when it wasn't generated

* Fix lint error

Fix bug in command_runner where check_output was no longer returning the output of the command

* clear out provider between tests to get rid of flakiness

* reduce chance of race condition from node_launcher launching a node in the middle of an autoscaler.update call
2020-07-28 00:02:08 -07:00
chaokunyang c290c308fe Fix package and upload ray jar (#9742) 2020-07-28 11:53:25 +08:00
Lingxuan Zuo 1049c9e53b [Stats] fix stats shutdown crash if opencensus exporter not initialized (#9727) 2020-07-28 11:21:10 +08:00
SangBin Cho 914cc96c91 Fix broken actor failure tests. (#9737) 2020-07-27 18:59:44 -07:00
Ian Rodney ebcfef012f [docker] Uses Latest Conda & Py 3.7 (#9732) 2020-07-27 16:18:12 -07:00
mehrdadn 2949c09ee8 Fix remote-watch.py (#9625)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-27 15:54:23 -07:00
Ian Rodney 78c34ae35e Include open-ssh-client for transparency (#9693) 2020-07-27 15:31:35 -07:00
Alisa 51e12ee97c Python api of placement group (#9243) 2020-07-27 14:57:05 -07:00
Michael Luo b51ab2af66 [RLlib] Offline Type Annotations (#9676)
* Offline Annotations

* Modifications

* Fixed circular dependencies

* Linter fix
2020-07-27 14:01:17 -07:00
Bill Chambers 2e9d748100 [Cluster Launcher] Re Org the cluster launcher pages. (#9687) 2020-07-27 13:47:06 -07:00
Ian Rodney d35605079e [core] Removes Error when Internal Config is not set (#9700) 2020-07-27 11:47:54 -07:00
Robert Nishihara 5d89aedd40 Keep build-autoscaler-images.sh alive in CI (#9720) 2020-07-27 10:43:14 -07:00
Simon Mo 9213a81734 Only build docker wheels in LINUX_WHEELS env (#9729) 2020-07-27 10:42:40 -07:00
Simon Mo 7740136b93 Revert "Package and upload ray cross-platform jar (#9540)" (#9730)
This reverts commit 881032593d.
2020-07-27 10:40:21 -07:00
chaokunyang 881032593d Package and upload ray cross-platform jar (#9540) 2020-07-27 17:20:20 +08:00
fangfengbin 2790818c53 [GCS]GCS client support multi-thread subscribe&resubscribe&unsubscribe (#9718) 2020-07-27 13:58:39 +08:00