Eric Liang
e2e30ca507
Ray, Tune, and RLlib support for memory, object_store_memory options ( #5226 )
2019-08-21 23:01:10 -07:00
Edward Oakes
c7ae4e5e1f
Check for dead processes in blocked ray start ( #5458 )
2019-08-17 20:44:08 -07:00
Robert Nishihara
61b23a9a70
Don't stop Jupyter notebook in ray stop. ( #5387 )
2019-08-11 15:18:01 -07:00
Eric Liang
df47bdf6c9
Allow address instead of redis_address ( #5412 )
...
* addr
* wip
* fix typo
* add to start
* switch to ray address for train
* say address
* disambiguate help
* comments 2
2019-08-10 00:18:41 -07:00
Simon Mo
d9b45cceec
[Project] Implementing Project CLI ( #5397 )
2019-08-08 21:28:25 -07:00
Philipp Moritz
e8d9cfc1f1
Ray projects schema and validation ( #5329 )
2019-08-06 14:36:04 -07:00
Simon Mo
25b5bd1530
ray stop sends SIGKILL instead of SIGTERM ( #5354 )
2019-08-02 14:46:03 -07:00
Richard Liaw
1798d4f077
[autoscaler] Add hard kill and monitor commands ( #5082 )
...
* Add hard kill and monitor commands
* better_commands
* Update python/ray/scripts/scripts.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-07-06 21:52:55 -07:00
Robert Nishihara
6703519144
Move global state API out of global_state object. ( #4857 )
2019-05-26 11:27:53 -07:00
Richard Liaw
88b45a53d6
[autoscaler] rsync cluster ( #4785 )
2019-05-16 23:11:06 -07:00
Richard Liaw
ffe61fcc70
[tune] Support non-arg submit ( #4803 )
2019-05-16 23:10:07 -07:00
Richard Liaw
3bbafc7105
[autoscaler] Fix submit ( #4782 )
2019-05-14 19:52:28 -07:00
Qing Wang
62c949bbd5
Fix ray stop by killing raylet before plasma ( #4778 )
2019-05-13 14:53:10 +08:00
Daniel Edgecumbe
3e1adafbce
[autoscaler] Add an aggressive_autoscaling flag ( #4285 )
2019-04-13 18:44:32 -07:00
Robert Nishihara
9c158c6a87
Start dashboard on all nodes and other small fixes. ( #4428 )
...
* Start reporter on all nodes.
* More fixes
2019-03-20 13:04:06 -07:00
Eric Liang
78ad9c4cbb
Add "ray timeline" command to auto-dump Chrome trace for the current Ray instance ( #4239 )
2019-03-05 16:28:00 -08:00
Eric Liang
6e3384a719
[rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} ( #4215 )
2019-03-04 14:05:42 -08:00
Hao Chen
14ff402d70
Make ray stop command also kill Java workers ( #4179 )
2019-03-01 11:05:19 +08:00
Robert Nishihara
d9bcaa20b5
Turn UI off by default. ( #4188 )
2019-02-28 17:29:52 -08:00
Richard Liaw
5bfcfa8ec8
[autoscaler] Fix Submit ( #4174 )
2019-02-27 00:02:50 -08:00
Eric Liang
60dbc771a2
Revert "[autoscaler] Fix redirects, fix submit ( #4085 )" ( #4158 )
...
This reverts commit acf4d53b55.
2019-02-25 17:00:59 -08:00
Robert Nishihara
688a0d17e6
Kill dashboard and reporter in ray stop. ( #4116 )
2019-02-23 12:08:39 -08:00
William Ma
fedad488d8
Kills gdb processes with ray stop ( #4046 )
2019-02-21 11:28:26 -08:00
Richard Liaw
acf4d53b55
[autoscaler] Fix redirects, fix submit ( #4085 )
2019-02-20 21:35:33 -08:00
Yuhong Guo
1f864a02bc
Add option of load_code_from_local which is required in cross-language ray call. ( #3675 )
2019-02-21 12:37:17 +08:00
Eric Liang
6e46d75554
[tune] Remove slow gzip of checkpoints; ignore jupyter stop errors ( #4076 )
...
* fix gzip
* ignore jupyter
2019-02-18 01:30:13 -08:00
Robert Nishihara
5f71751891
API cleanups. Remove worker argument. Remove some deprecated arguments. ( #4025 )
...
* Remove worker argument from API methods.
* Remove deprecated arguments and deprecate redirect_output and redirect_worker_output.
* Fix
2019-02-15 10:49:16 -08:00
Kristian Hartikainen
729d0b2825
[autoscaler] docker run options ( #3921 )
...
Adds support for docker options, allowing for use of nvidia-docker.
Closes #2657.
2019-02-13 12:26:28 -08:00
Wang Qing
e1c68a0881
Enable including Java worker for ray start command ( #3838 )
2019-02-04 16:23:43 +08:00
Kristian Hartikainen
b9eed2e86c
[autoscaler] Move attach helper text under exec_cluster ( #3920 )
...
## What do these changes do?
Moves the attach command helper from cli commands to the actual `exec_cluster` function.
2019-01-31 17:01:24 -08:00
Richard Liaw
d128636bab
Ray Logging Configuration ( #3691 )
...
* fix logging for autoscaler
* module logging
* try this for logging
* yapf
* fix
* Initial logging setup
* memory
* ok
* remove basicconfig
* catch
* remove package logging
* print
* fix
* try_fix
* fix 1
* revert rllib
* logging level
* flake8
* fix
* fix
* Remove vestigial TODO
2019-01-30 21:01:12 -08:00
Robert Nishihara
8723d6b061
Define a Node class to manage Ray processes. ( #3733 )
...
* Implement Node class and move most of services.py into it.
* Wait for nodes as they are added to the cluster.
* Fix Redis authentication bug.
* Fix bug in client table ordering.
* Address comments.
* Kill raylet before plasma store in test.
* Minor
2019-01-11 22:30:38 -08:00
Stephanie Wang
cc5ecd71c5
[autoscaler] Add kill and get IP commands to CLI for testing ( #3731 )
...
## What do these changes do?
Adds 2 commands to the CLI that take in an autoscaler config:
1. Kill a random ray node in the cluster.
2. Get all the worker node IP addresses.
These commands are both for testing and are not recommended for normal use.
## Related issue number
Closes #3685.
2019-01-10 22:06:57 -08:00
Robert Nishihara
c9d70f0dda
Remove num_local_schedulers argument from ray.worker._init. ( #3704 )
...
* Remove num_local_schedulers argument from ray.worker._init.
* Fix
* Fix tests.
2019-01-07 12:44:49 -08:00
Robert Nishihara
586a5c9ffa
Limit default redis max memory to 10GB. ( #3630 )
...
* Limit Redis max memory to 10GB/shard by default.
* Update stress tests.
* Reorganize
* Update
* Add minimum cap size for object store and redis.
* Small test update.
2019-01-03 13:23:54 -08:00
Robert Nishihara
b6bcd18d65
Split profile table among many keys in the GCS. ( #3676 )
...
* Divide profile table among many keys in GCS.
* Fix, and remove --collect-profiling-data arg.
* Remove reference in doc.
2019-01-02 21:33:01 -08:00
Yuhong Guo
c9b8ecca51
Add RayParams to refactor the parameters used by ray python. ( #3558 )
2018-12-29 22:04:27 +08:00
Eric Liang
cffe8f9806
Add option to evict keys LRU from the sharded redis tables ( #3499 )
...
* wip
* wip
* format
* wip
* note
* lint
* fix
* flag
* typo
* raise timeout
* fix
* optional get
* fix flag
* increase timeout in test
* update docs
* format
2018-12-09 05:48:52 -08:00
Kristian Hartikainen
be6567e6fd
Tweak/exec attach info ( #3447 )
...
* Add custom cluster name to exec info
* Update submit info to match exec info
2018-12-03 21:39:43 -08:00
Eric Liang
c46ea2ff4b
Click 7.0 changes the naming convention for commands; fix this
2018-11-28 14:59:58 -08:00
Eric Liang
0d56fc10cc
Move setproctitle to ray[debug] package ( #3415 )
2018-11-27 09:50:59 -08:00
Richard Liaw
c24d87b4d1
[autoscaler] Submit command ( #3312 )
2018-11-20 14:03:34 -08:00
Philipp Moritz
8894883153
Force kill web UI in ray stop ( #3257 )
2018-11-08 00:05:32 -08:00
Richard Liaw
0bab8ed95c
Expose internal config parameters for starting Ray ( #3246 )
...
## What do these changes do?
This PR exposes a command-line option for passing internal config parameters. This lets certain tests (e.g., fault-tolerance tests that remove nodes) run quickly.
Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.
#3239 depends on this.
TODO:
- [x] Add documentation to method arguments before merging.
- [x] Add test to verify this works?
## Related issue number
2018-11-07 21:46:02 -08:00
Eric Liang
725df3a485
Set the process title in workers and actors ( #3219 )
2018-11-06 14:59:22 -08:00
Eric Liang
9a0f0db070
Add ray stack tool for debugging ( #3213 )
2018-11-03 13:13:02 -07:00
Robert Nishihara
e495ab5e7c
Fix some paths /tmp/raylogs -> /tmp/ray. ( #3189 )
2018-11-02 12:10:53 -07:00
Robert Nishihara
fd854ff090
Allow the node manager port and object manager port to be set through… ( #3130 )
...
* Allow the node manager port and object manager port to be set through ray start.
* Linting
* Fix Java test
* Address comments.
2018-10-28 17:28:41 -07:00
Robert Nishihara
658c14282c
Remove legacy Ray code. ( #3121 )
...
* Remove legacy Ray code.
* Fix cmake and simplify monitor.
* Fix linting
* Updates
* Fix
* Implement some methods.
* Remove more plasma manager references.
* Fix
* Linting
* Fix
* Fix
* Make sure class IDs are strings.
* Some path fixes
* Fix
* Path fixes and update arrow
* Fixes.
* linting
* Fixes
* Java fixes
* Some java fixes
* TaskLanguage -> Language
* Minor
* Fix python test and remove unused method signature.
* Fix java tests
* Fix jenkins tests
* Remove commented out code.
2018-10-26 13:36:58 -07:00
Robert Nishihara
5aa29613db
Fix linting errors. ( #3127 )
2018-10-24 16:30:00 -07:00