120 Commits

Author SHA1 Message Date
SangBin Cho d2963f4ee1 [Object Spilling] Clean up FS storage upon sigint for ray.init(). (#13649)
* Initial iteration done.

* Remove unnecessary messages.

* Addressed code review.

* Addressed code review.

* fix issues.

* addressed code review.

* Addressed the last code review.
2021-01-26 23:10:29 -08:00
SangBin Cho 8baafacb1e [Logging] Log rotation config (#13375)
* In Progress.

* formatting.

* in progress.

* linting.

* Done.

* Fix typo.

* Fixed the issue.
2021-01-26 20:15:55 -08:00
Ameer Haj Ali 4dabf017ee Close #12031 (Autoscaler is overriding your resource for same quantity) (#13671) 2021-01-24 16:31:53 -08:00
architkulkarni da5928304a [Metrics] Cache metrics ports in a file at each node (#13501)
* cache metric ports in a file at each node

* remove old assignment of export port

* lint

* lint

* move e2e test to top of file to avoid shutdown bug
2021-01-22 09:59:20 -08:00
Ameer Haj Ali 1fbc3ddfac Add ability to not start Monitor when calling ray start (#13505) 2021-01-18 18:31:53 -08:00
Philipp Moritz 9872fc1801 Start ray client server with 'ray start' (#13217) 2021-01-06 21:04:14 -08:00
Richard Liaw 87cf1a97e5 [core] recover startup logs (#12876) 2020-12-15 13:49:45 -08:00
Kai Yang e3b5deb741 [Multi-tenancy] Delete flag enable_multi_tenancy and remove old code path (#10573) 2020-12-10 19:01:40 +08:00
SangBin Cho 8223a33bff [Logging] Log rotation on all components (#12101)
* In Progress.

* Done.

* Fix the issue.

* Add wait for condition because logs are not written right away now.

* debug string.

* lint.

* Fix flaky test.

* Fix issues.

* Fix test.

* lint.
2020-11-30 19:03:55 -08:00
SangBin Cho 753cda2f28 [Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Richard Liaw 0d388c4d31 [autoscaler] remove unnecessary print output (#12131) 2020-11-18 18:33:48 -08:00
SangBin Cho f56d7c1a76 [Logging] Remove per worker job log file / support worker log rotation (#11927)
* In progress.

* MVP done.

* In Progress.

* Remove unnecessary code.

* Fix some issues.

* Fix test failures.

* Addressed code review + fix object spilling test failure.
2020-11-16 11:29:43 -08:00
Richard Liaw a09e49ee94 [core] Add retry for reading session name (#11844) 2020-11-09 11:22:50 -08:00
Akash Patel b7531fb4f5 [redis-py] change redis-py deprecated hmset usage to hset (#11776) 2020-11-03 22:23:02 -08:00
Stephanie Wang 0ba777af99 [Object spilling] Add policy to automatically spill objects on OutOfMemory (#11673) 2020-11-02 12:42:02 -08:00
Gekho457 9e63f7ccc3 [autoscaler/k8s] ray up 409 error fix (#11660) 2020-10-28 14:19:57 -05:00
Max Fitton caf3b04b27 [Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Edward Oakes 5d7f271e7d Add --worker-port-list option to ray start (#11481) 2020-10-21 14:46:45 -05:00
Max Fitton cdca5af53b Revert "[Dashboard] Turn on New Dashboard by Default (#11321)" (#11502)
This reverts commit f500292d41.
2020-10-20 10:53:10 -05:00
Max Fitton f500292d41 [Dashboard] Turn on New Dashboard by Default (#11321) 2020-10-19 12:31:11 -05:00
Eric Liang 609c1b8acd Start moving ray internal files to _private module (#10994) 2020-09-24 22:46:35 -07:00
SangBin Cho 8c241d5f1d [Core] Use node ip address properly in ray.init (#10829)
* Fix.

* Addressed code review.

* Addressed code review.
2020-09-24 11:44:52 -07:00
SongGuyang f9b040db52 add log-dir to new dashboard (#10885) 2020-09-24 13:40:37 +08:00
SangBin Cho 390107b6cb [Core] Allow to pass node ip address to gcs server. (#10946)
* Allow to pass node ip address to gcs server.

* Fix.

* Addressed code review.

* Fixed an error.

* Addressed code review.
2020-09-23 01:52:26 -07:00
Kai Yang afa0216280 Remove the '--include-java' option (#10594) 2020-09-09 17:01:17 +08:00
Alex Wu d9c68fca5c [Core] Logging improvements (#10625)
* other stuff

* lint

* .

* .

* lint

* comment

* lint

* .
2020-09-08 20:58:05 -07:00
chaokunyang bbfbc98a41 [Core] Allow users to specify the classpath and import path (#10560)
* move job resource path to job config

* job resource path support list

* job resource path support for python

* fix job_resource_path support

* fix worker command

* fix job config

* use jar file instead of parent path

* fix job resource path

* add test to test.sh

* lint

* Update java/runtime/src/main/resources/ray.default.conf

Co-authored-by: Kai Yang <kfstorm@outlook.com>

* fix testGetFunctionFromLocalResource

* lint

* fix rebase

* add jars in resource path to classloader

* add job_resource_path to worker

* add ray stop

* rename job_resource_path to resource_path

* fix resource_path

* refine resource_path comments

* rename job resource path to code search path

* Add instruction about starting a cross-language cluster

* fix ClassLoaderTest.java

* add code-search-path to RunManager

* refine comments for code-search-path

* rename resourcePath to codeSearchPath

* Update doc

* fix

* rename resourcePath to codeSearchPath

* update doc

* filter out empty path

* fix comments

* fix comments

* fix tests

* revert pom

* lint

* fix doc

* update doc

* Apply suggestions from code review

* lint

Co-authored-by: Kai Yang <kfstorm@outlook.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-09-09 00:46:32 +08:00
Alex Wu b1f3c9e10e [Autoscaler] Fix resource passing bug fix (#10397) 2020-08-28 15:43:18 -07:00
Stephanie Wang f75dfd60a3 [api] API deprecations and cleanups for 1.0 (internal_config and Checkpointable actor) (#10333)
* remove

* internal config updates, remove Checkpointable

* Lower object timeout default

* remove json

* Fix flaky test

* Fix unit test
2020-08-27 10:19:53 -07:00
fyrestone 05c103af94 [Dashboard] Start the new dashboard (#10131)
* Use new dashboard if environment var RAY_USE_NEW_DASHBOARD exists; new dashboard startup

* Make fake client/build/static directory for dashboard

* Add test_dashboard.py for new dashboard

* Travis CI enable new dashboard test

* Update new dashboard

* Agent manager service

* Add agent manager

* Register agent to agent manager

* Add a new line to the end of agent_manager.cc

* Fix merge; Fix lint

* Update dashboard/agent.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Update dashboard/head.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Fix bug

* Add tests for dashboard

* Fix

* Remove const from Process::Kill() & Fix bugs

* Revert error check of execute_after

* Raise exception from DashboardAgent.run

* Add more tests.

* Fix compile on Linux

* Use dict comprehension instead of dict(generator)

* Fix lint

* Fix windows compile

* Fix lint

* Test Windows CI

* Revert "Test Windows CI"

This reverts commit 945e01051ec95cff5fcc1c0bc37045b46e7ad9a6.

* Fix ParseWindowsCommandLine bug

* Update src/ray/util/util.cc

Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-08-24 13:24:23 -07:00
SangBin Cho 92664249e8 Partially Use f string (#10218)
* flynt. trial 1.

* Trial 1.

* Addressed code review.
2020-08-20 18:21:16 -07:00
Lixin Wei d188becec2 [Python Worker] Add pid to log file name (#10149)
Co-authored-by: Alex Wu <alex@anyscale.io>
2020-08-18 11:48:48 -07:00
Alex Wu 0b5d5ec17d [Autoscaler] Pass custom resources to "ray start" multi instance autoscaling (#9986) 2020-08-17 22:34:07 -07:00
Siyuan (Ryans) Zhuang 17ca1d8ff4 [Core] Object spilling prototype (#9818) 2020-08-14 15:39:10 -07:00
Robert Nishihara 36e626e95d Revert "[Dashboard] Start the new dashboard (#9860)" (#10116)
This reverts commit 739933e5b8.
2020-08-14 14:06:57 -07:00
Simon Mo 01f38bc5d1 CoreWorker correctly push metrics to agent (#10031) 2020-08-13 16:44:53 -07:00
fyrestone 739933e5b8 [Dashboard] Start the new dashboard (#9860) 2020-08-13 11:01:46 +08:00
Alex Wu 84b7240c4b [Core] Read resources from an environment variable (#9831) 2020-08-06 18:32:01 -07:00
SangBin Cho ec2f1a225e [Stats] Metrics Export User Interface Part 1 (#9913)
* Metrics export port expose done.

* Support exposing metrics port + metrics agent service discovery through ray.nodes()

* Formatting.

* Added a doc.

* Linting.

* Change the location of metrics agent port.

* Addressed code review.

* Addressed code review.
2020-08-06 16:16:29 -07:00
Kai Yang 27cd323ce1 [Core] Multi-tenancy: Job isolation & implement per job config (except for env variables) (#9500) 2020-08-04 15:51:29 +08:00
Alex Wu 6e294dd90f [Core] Custom socket name (#9766)
* fix issues

* hot fixes

* test

* test

* socket name change only
2020-07-29 13:19:41 -07:00
SangBin Cho d49b19c24c [Stats] Improve Stats::Init & Add it to GCS server (#9563) 2020-07-25 10:42:08 -07:00
Clark Zinzow 9f969260e8 [core] Fix Ray service startup when logging redirection is disabled. (#9547) 2020-07-23 11:26:24 -05:00
Clark Zinzow 9b1772253f Ensure unique log file names across same-node raylets. (#9561) 2020-07-20 16:03:11 -05:00
SangBin Cho 539c51a003 [Core] Support GCS server port assignment. (#8962) 2020-07-14 11:49:56 -05:00
SangBin Cho f6eb47fc1f [Stats] metrics agent exporter (#9361) 2020-07-14 11:49:16 -05:00
Ian Rodney 0085cf75d0 Allow --lru-evict to be passed into ray start (#8959) 2020-07-13 14:09:39 -07:00
Hao Chen d49dadf891 Change Python's ObjectID to ObjectRef (#9353) 2020-07-10 17:49:04 +08:00
Ian Rodney 9172f8c3a6 [core] Store Internal Config in GCS (#8921) 2020-07-08 11:22:08 -05:00
Xianyang Liu 0bfcc2e5ba [core] Better support multi-nic environments by respecting user-provided IP (#8512) 2020-06-25 14:03:12 -05:00