Commit Graph

86 Commits

Author SHA1 Message Date
fyrestone 269e1f0b98 Fix push_error_to_driver_through_redis (#10848)
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-09-17 10:50:44 -07:00
Alex Wu 6f479d4697 [hotfix] CPU Detection (#10821) 2020-09-16 21:02:52 -07:00
Richard Liaw e3fd5eceec [minor] fix warning about docker cpus (#10768) 2020-09-14 09:08:34 -07:00
Ian Rodney 5bc2ba38fd [docker] Detect CPUs in container correctly (#10507)
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
2020-09-13 23:40:48 -07:00
Alex Wu d6a9f0e2e4 [Core] Accelerator type API (#10561) 2020-09-06 20:58:40 -07:00
Alex Wu a699f6a4d8 [Core] Fix override memory and object_store_memory in decorator (#10563) 2020-09-06 20:56:48 -07:00
fyrestone e9b046306a [Dashboard] Dashboard basic modules (#10303)
* Improve reporter module

* Add test_node_physical_stats to test_reporter.py

* Add test_class_method_route_table to test_dashboard.py

* Add stats_collector module for dashboard

* Subscribe actor table data

* Add log module for dashboard

* Only enable test module in some test cases

* CI run all dashboard tests

* Reduce test timeout to 10s

* Use fstring

* Remove unused code

* Remove blank line

* Fix dashboard tests

* Fix asyncio.create_task not available in py36; Fix lint

* Add format_web_url to ray.test_utils

* Update dashboard/modules/reporter/reporter_head.py

Co-authored-by: Max Fitton <mfitton@berkeley.edu>

* Add DictChangeItem type for Dict change

* Refine logger.exception

* Refine GET /api/launch_profiling

* Remove disable_test_module fixture

* Fix test_basic may fail

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <mfitton@berkeley.edu>
2020-08-29 23:09:34 -07:00
Ian Rodney d6f2b0d933 [docker] Run profiling without sudo (#10388)
* fix profiling for docker

* small fixes

* use name

* do not import pwd on windows
2020-08-28 21:25:10 -07:00
SangBin Cho 92664249e8 Partially Use f string (#10218)
* flynt. trial 1.

* Trial 1.

* Addressed code review.
2020-08-20 18:21:16 -07:00
yncxcw 32cd94b750 [Core] Do not convert gpu id to int (#9744)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-11 12:09:46 -07:00
Alex Wu 12d75784a4 [Core] test_advanced_3.py::test_logging_to_driver (round 2) (#9916) 2020-08-05 15:04:36 -05:00
kisuke95 28b1f7710c [Core] Error info pubsub (Remove ray.errors API) (#9665) 2020-08-04 14:04:29 +08:00
Clark Zinzow 9f969260e8 [core] Fix Ray service startup when logging redirection is disabled. (#9547) 2020-07-23 11:26:24 -05:00
Hao Chen d49dadf891 Change Python's ObjectID to ObjectRef (#9353) 2020-07-10 17:49:04 +08:00
Alex Wu 46962f5db1 [Core] Log monitor multidriver (#8953) 2020-06-25 11:05:53 -07:00
Alex Wu c152730e4a [Core] Log output from different jobs to different drivers. (#8885)
* .

* .

* Correct now

* No interactivity errors

* format

* Filtering

* lint

* .

* No more filtering

* Removed interactivity

* .

* .

* .

* .

* .

* .

* Redirection works

* formatting

* something broken?

* .

* Works

* formatting

* redirect output

* formatting

* formatting

* Fix file descriptor leakage

* format

* .

* .

* .

* .

* .

* Refactor

* .

* Only run on job switch

* .

* cleanup

* .

* ...

* Review

* .

* .

* .

* .

* whoops

* .

* Should fix bug

* .

* .

* addressed comments

* formatting

* formatting

* Fix typo

* .

* .

* .

* .

Co-authored-by: Ubuntu <ubuntu@ip-172-31-14-33.us-west-2.compute.internal>
2020-06-23 18:45:32 -07:00
mehrdadn 8958728139 Windows bug fixes (#7740) 2020-03-30 20:39:23 -05:00
mehrdadn fc23f79f82 Windows process issues (#7739) 2020-03-29 12:48:32 -07:00
Robert Nishihara ee8c9ff732 Remove six and cloudpickle from setup.py. (#7694) 2020-03-23 11:42:05 -07:00
mehrdadn a0700e2f86 Change /tmp to platform-specific temporary directory (#7529) 2020-03-16 18:10:14 -07:00
mehrdadn 4d42664b2a Use prctl(PR_SET_PDEATHSIG) on Linux instead of reaper (#7150) 2020-03-03 11:45:42 -06:00
Alex Wu 0d3687a10d No warning for docker memory > system memory (#7151) 2020-02-13 15:21:44 -08:00
Alex Wu 72c31e3e19 Ray nodes should respect docker limits (#7039) 2020-02-10 11:08:38 -08:00
ijrsvt 0826f95e1c Including psutil & setproctitle (#7031) 2020-02-05 14:16:58 -08:00
Richard Liaw 52c33b53f7 [minor][core] fix gpu ids for SLURM (#7014)
* fix gpu ids

* fix
2020-02-02 16:09:22 -08:00
Edward Oakes 92525f35d1 Remove raylet client from Python worker (#6018) 2020-01-31 18:23:01 -08:00
Ziyad Edher c480d1d1e4 Treat static methods as class methods instead of instance methods in actors (#6756)
* Treat static methods as class methods rather than instance methods

* Add tests for static methods in actors

* Revert formatting changes

* Readd future imports

* Restructure static method check

* Documentation enhancements

* Fix linting issues
2020-01-15 19:38:41 -06:00
Sven 60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2xLINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Lixin Wei 859dbad155 Fix estimate_available_memory() in utils.py (#6302) 2020-01-08 15:22:47 +08:00
Ujval Misra 5b40408678 [tune] Remove py2.7-specific code (#6665)
* Remove backwards compatability py2.7 code.

* Use exists_ok=True in ray

* nit

* nit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-03 01:03:13 -08:00
Yuhao Yang 3db8faab0d [tune] fix log dir race condition (#6420) 2019-12-10 21:00:19 -08:00
Eric Liang 4edae7ea2b Speed up task submissions a bit (#5992) 2019-10-25 00:10:37 -07:00
Edward Oakes 07c4c6367a [core worker] Python core worker object interface (#5272) 2019-09-12 23:07:46 -07:00
Edward Oakes dfd2a45f69 Simplify symlinking and don't print warnings (#5615) 2019-09-02 13:31:10 -07:00
Edward Oakes 0cc0abf857 Create session_latest symlink for Ray sessions (#5580) 2019-08-31 22:14:54 -07:00
Eric Liang e2e30ca507 Ray, Tune, and RLlib support for memory, object_store_memory options (#5226) 2019-08-21 23:01:10 -07:00
Qing Wang f2293243cc [ID Refactor] Shorten the length of JobID to 4 bytes (#5110)
* WIP

* Fix

* Add jobid test

* Fix

* Add python part

* Fix

* Fix tes

* Remove TODOs

* Fix C++ tests

* Lint

* Fix

* Fix exporting functions in multiple ray.init

* Fix java test

* Fix lint

* Fix linting

* Address comments.

* FIx

* Address and fix linting

* Refine and fix

* Fix

* address

* Address comments.

* Fix linting

* Fix

* Address

* Address comments.

* Address

* Address

* Fix

* Fix

* Fix

* Fix lint

* Fix

* Fix linting

* Address comments.

* Fix linting

* Address comments.

* Fix linting

* address comments.

* Fix
2019-07-11 14:25:16 +08:00
Eric Liang 5aec750107 Add warning/error if object store memory exceeds available memory (#4893)
* exclude

* format

* add warning

* hatch

* reduce mem usage

* reduce object store mem

* set obj mem
2019-07-08 21:37:08 -07:00
Qing Wang 62e4b591e3 [ID Refactor] Rename DriverID to JobID (#5004)
* WIP

WIP

WIP

Rename Driver -> Job

Fix complition

Fix

Rename in Java

In py

WIP

Fix

WIP

Fix

Fix test

Fix

Fix C++ linting

Fix

* Update java/runtime/src/main/java/org/ray/runtime/config/RayConfig.java

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/core_worker/core_worker.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Address comments

* Fix

* Fix CI

* Fix cpp linting

* Fix py lint

* FIx

* Address comments and fix

* Address comments

* Address

* Fix import_threading
2019-06-28 00:44:51 +08:00
Hao Chen 0131353d42 [gRPC] Migrate gcs data structures to protobuf (#5024) 2019-06-25 14:31:19 -07:00
Yuhong Guo 1a39fee9c6 Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID (#4776)
* Enable BaseId.

* Change TaskID and make python test pass

* Remove unnecessary functions and fix test failure and change TaskID to
16 bytes.

* Java code change draft

* Refine

* Lint

* Update java/api/src/main/java/org/ray/api/id/TaskId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/ObjectId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comment

* Lint

* Fix SINGLE_PROCESS

* Fix comments

* Refine code

* Refine test

* Resolve conflict
2019-05-22 14:46:30 +08:00
Si-Yuan bd00735fe8 Fix tempfile issues (#4605) 2019-05-05 16:06:15 -07:00
Yuhong Guo c2349cf12d Remove local/global_scheduler from code and doc. (#4549) 2019-04-03 17:05:09 -07:00
Yuhong Guo 41b81af11b Downgrade six to 1.0.0 (#4180) 2019-02-27 13:05:25 -08:00
Si-Yuan 21472b890a Integrate "tempfile_service" into "ray.node.Node" (#3953) 2019-02-12 17:34:04 -08:00
Robert Nishihara ef527f84ab Stream logs to driver by default. (#3892)
* Stream logs to driver by default.

* Fix from rebase

* Redirect raylet output independently of worker output.

* Fix.

* Create redis client with services.create_redis_client.

* Suppress Redis connection error at exit.

* Remove thread_safe_client from redis.

* Shutdown driver threads in ray.shutdown().

* Add warning for too many log messages.

* Only stop threads if worker is connected.

* Only stop threads if they exist.

* Remove unnecessary try/excepts.

* Fix

* Only add new logging handler once.

* Increase timeout.

* Fix tempfile test.

* Fix logging in cluster_utils.

* Revert "Increase timeout."

This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.

* Retry longer when connecting to plasma store from node manager and object manager.

* Close pubsub channels to avoid leaking file descriptors.

* Limit log monitor open files to 200.

* Increase plasma connect retries.

* Add comment.
2019-02-07 19:53:50 -08:00
Si-Yuan 9295ab8f60 Various Python code cleanups. (#3837) 2019-02-03 10:16:24 -08:00
Richard Liaw d128636bab Ray Logging Configuration (#3691)
* fix logging for autoscaler

* module logging

* try this for logging

* yapf

* fix

* Initial logging setup

* momery

* ok

* remove basicconfig

* catch

* remove package logging

* print

* fix

* try_fix

* fix 1

* revert rllib

* logging level

* flake8

* fix

* fix

* Remove vestigal TODO
2019-01-30 21:01:12 -08:00
Si-Yuan 48139cf861 Migrate Python C extension to Cython (#3541) 2019-01-24 09:17:14 -08:00
Yuhong Guo d2cf8561f2 Refactor code about ray.ObjectID. (#3674)
* Refactor code about ray.ObjectID.

* remove from_random and use nil_id instead of constructor

* remove id() in hash

* Lint and fix

* Change driver id to ObjectID

* Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()
2019-01-13 01:47:29 -08:00