88 Commits

Author SHA1 Message Date
Eric Liang e2e30ca507 Ray, Tune, and RLlib support for memory, object_store_memory options (#5226) 2019-08-21 23:01:10 -07:00
Edward Oakes c7ae4e5e1f Check for dead processes in blocked ray start (#5458) 2019-08-17 20:44:08 -07:00
Robert Nishihara 61b23a9a70 Don't stop Jupyter notebook in ray stop. (#5387) 2019-08-11 15:18:01 -07:00
Eric Liang df47bdf6c9 Allow address instead of redis_address (#5412)
* addr

* wip

* fix typo

* add to start

* switch to ray address for train

* say address

* disambiguate help

* comments 2
2019-08-10 00:18:41 -07:00
Simon Mo d9b45cceec [Project] Implementing Project CLI (#5397) 2019-08-08 21:28:25 -07:00
Philipp Moritz e8d9cfc1f1 Ray projects schema and validation (#5329) 2019-08-06 14:36:04 -07:00
Simon Mo 25b5bd1530 ray stop sends SIGKILL instead of SIGTERM (#5354) 2019-08-02 14:46:03 -07:00
Richard Liaw 1798d4f077 [autoscaler] Add hard kill and monitor commands (#5082)
* Add hard kill and monitor commands

* better_commands

* Update python/ray/scripts/scripts.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-07-06 21:52:55 -07:00
Robert Nishihara 6703519144 Move global state API out of global_state object. (#4857) 2019-05-26 11:27:53 -07:00
Richard Liaw 88b45a53d6 [autoscaler] rsync cluster (#4785) 2019-05-16 23:11:06 -07:00
Richard Liaw ffe61fcc70 [tune] Support non-arg submit (#4803) 2019-05-16 23:10:07 -07:00
Richard Liaw 3bbafc7105 [autoscaler] Fix submit (#4782) 2019-05-14 19:52:28 -07:00
Qing Wang 62c949bbd5 Fix ray stop by killing raylet before plasma (#4778) 2019-05-13 14:53:10 +08:00
Daniel Edgecumbe 3e1adafbce [autoscaler] Add an aggressive_autoscaling flag (#4285) 2019-04-13 18:44:32 -07:00
Robert Nishihara 9c158c6a87 Start dashboard on all nodes and other small fixes. (#4428)
* Start reporter on all nodes.

* More fixes
2019-03-20 13:04:06 -07:00
Eric Liang 78ad9c4cbb Add "ray timeline" command to auto-dump Chrome trace for the current Ray instance (#4239) 2019-03-05 16:28:00 -08:00
Eric Liang 6e3384a719 [rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215) 2019-03-04 14:05:42 -08:00
Hao Chen 14ff402d70 Make ray stop command also kill Java workers (#4179) 2019-03-01 11:05:19 +08:00
Robert Nishihara d9bcaa20b5 Turn UI off by default. (#4188) 2019-02-28 17:29:52 -08:00
Richard Liaw 5bfcfa8ec8 [autoscaler] Fix Submit (#4174) 2019-02-27 00:02:50 -08:00
Eric Liang 60dbc771a2 Revert "[autoscaler] Fix redirects, fix submit (#4085)" (#4158)
This reverts commit acf4d53b55.
2019-02-25 17:00:59 -08:00
Robert Nishihara 688a0d17e6 Kill dashboard and reporter in ray stop. (#4116) 2019-02-23 12:08:39 -08:00
William Ma fedad488d8 Kills gdb processes with ray stop (#4046) 2019-02-21 11:28:26 -08:00
Richard Liaw acf4d53b55 [autoscaler] Fix redirects, fix submit (#4085) 2019-02-20 21:35:33 -08:00
Yuhong Guo 1f864a02bc Add option of load_code_from_local which is required in cross-language ray call. (#3675) 2019-02-21 12:37:17 +08:00
Eric Liang 6e46d75554 [tune] Remove slow gzip of checkpoints; ignore jupyter stop errors (#4076)
* fix gzip

* ignore jupyter
2019-02-18 01:30:13 -08:00
Robert Nishihara 5f71751891 API cleanups. Remove worker argument. Remove some deprecated arguments. (#4025)
* Remove worker argument from API methods.

* Remove deprecated arguments and deprecate redirect_output and redirect_worker_output.

* Fix
2019-02-15 10:49:16 -08:00
Kristian Hartikainen 729d0b2825 [autoscaler] docker run options (#3921)
Adds support for docker options, allowing for use of nvidia-docker.

Closes #2657.
2019-02-13 12:26:28 -08:00
Wang Qing e1c68a0881 Enable including Java worker for ray start command (#3838) 2019-02-04 16:23:43 +08:00
Kristian Hartikainen b9eed2e86c [autoscaler] Move attach helper text under exec_cluster (#3920)
## What do these changes do?
Moves the attach command helper from cli commands to the actual `exec_cluster` function.
2019-01-31 17:01:24 -08:00
Richard Liaw d128636bab Ray Logging Configuration (#3691)
* fix logging for autoscaler

* module logging

* try this for logging

* yapf

* fix

* Initial logging setup

* memory

* ok

* remove basicconfig

* catch

* remove package logging

* print

* fix

* try_fix

* fix 1

* revert rllib

* logging level

* flake8

* fix

* fix

* Remove vestigial TODO
2019-01-30 21:01:12 -08:00
Robert Nishihara 8723d6b061 Define a Node class to manage Ray processes. (#3733)
* Implement Node class and move most of services.py into it.

* Wait for nodes as they are added to the cluster.

* Fix Redis authentication bug.

* Fix bug in client table ordering.

* Address comments.

* Kill raylet before plasma store in test.

* Minor
2019-01-11 22:30:38 -08:00
Stephanie Wang cc5ecd71c5 [autoscaler] Add kill and get IP commands to CLI for testing (#3731)
## What do these changes do?

Adds 2 commands to the CLI that take in an autoscaler config:
1. Kill a random ray node in the cluster.
2. Get all the worker node IP addresses.

These commands are both for testing and are not recommended for normal use.

## Related issue number
Closes #3685.
2019-01-10 22:06:57 -08:00
Robert Nishihara c9d70f0dda Remove num_local_schedulers argument from ray.worker._init. (#3704)
* Remove num_local_schedulers argument from ray.worker._init.

* Fix

* Fix tests.
2019-01-07 12:44:49 -08:00
Robert Nishihara 586a5c9ffa Limit default redis max memory to 10GB. (#3630)
* Limit Redis max memory to 10GB/shard by default.

* Update stress tests.

* Reorganize

* Update

* Add minimum cap size for object store and redis.

* Small test update.
2019-01-03 13:23:54 -08:00
Robert Nishihara b6bcd18d65 Split profile table among many keys in the GCS. (#3676)
* Divide profile table among many keys in GCS.

* Fix, and remove --collect-profiling-data arg.

* Remove reference in doc.
2019-01-02 21:33:01 -08:00
Yuhong Guo c9b8ecca51 Add RayParams to refactor the parameters used by ray python. (#3558) 2018-12-29 22:04:27 +08:00
Eric Liang cffe8f9806 Add option to evict keys LRU from the sharded redis tables (#3499)
* wip

* wip

* format

* wip

* note

* lint

* fix

* flag

* typo

* raise timeout

* fix

* optional get

* fix flag

* increase timeout in test

* update docs

* format
2018-12-09 05:48:52 -08:00
Kristian Hartikainen be6567e6fd Tweak/exec attach info (#3447)
* Add custom cluster name to exec info

* Update submit info to match exec info
2018-12-03 21:39:43 -08:00
Eric Liang c46ea2ff4b Click 0.7 changes the naming convention for commands; fix this 2018-11-28 14:59:58 -08:00
Eric Liang 0d56fc10cc Move setproctitle to ray[debug] package (#3415) 2018-11-27 09:50:59 -08:00
Richard Liaw c24d87b4d1 [autoscaler] Submit command (#3312) 2018-11-20 14:03:34 -08:00
Philipp Moritz 8894883153 Force kill web UI in ray stop (#3257) 2018-11-08 00:05:32 -08:00
Richard Liaw 0bab8ed95c Expose internal config parameters for starting Ray (#3246)
## What do these changes do?

This PR exposes the CL option for using a config parameter. This is important for certain tests (i.e., FT tests that remove nodes) to run quickly.

Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.

#3239 depends on this.

TODO:
 - [x] Add documentation to method arguments before merging.
 - [x] Add test to verify this works?

## Related issue number
2018-11-07 21:46:02 -08:00
Eric Liang 725df3a485 Set the process title in workers and actors (#3219) 2018-11-06 14:59:22 -08:00
Eric Liang 9a0f0db070 Add ray stack tool for debugging (#3213) 2018-11-03 13:13:02 -07:00
Robert Nishihara e495ab5e7c Fix some paths /tmp/raylogs -> /tmp/ray. (#3189) 2018-11-02 12:10:53 -07:00
Robert Nishihara fd854ff090 Allow the node manager port and object manager port to be set through… (#3130)
* Allow the node manager port and object manager port to be set through ray start.

* Linting

* Fix Java test

* Address comments.
2018-10-28 17:28:41 -07:00
Robert Nishihara 658c14282c Remove legacy Ray code. (#3121)
* Remove legacy Ray code.

* Fix cmake and simplify monitor.

* Fix linting

* Updates

* Fix

* Implement some methods.

* Remove more plasma manager references.

* Fix

* Linting

* Fix

* Fix

* Make sure class IDs are strings.

* Some path fixes

* Fix

* Path fixes and update arrow

* Fixes.

* linting

* Fixes

* Java fixes

* Some java fixes

* TaskLanguage -> Language

* Minor

* Fix python test and remove unused method signature.

* Fix java tests

* Fix jenkins tests

* Remove commented out code.
2018-10-26 13:36:58 -07:00
Robert Nishihara 5aa29613db Fix linting errors. (#3127) 2018-10-24 16:30:00 -07:00