101 Commits

Author SHA1 Message Date
Eric Liang 4044af8520 Try to enable dashboard (again) (#6069)
* Revert "Revert "Enable the Ray dashboard by default (#5976)" (#6068)"

This reverts commit 1a3e97cf23.

* fix tests that assume the dashboard isn't a job

* travis
2019-11-08 10:48:48 -08:00
Eric Liang 1a3e97cf23 Revert "Enable the Ray dashboard by default (#5976)" (#6068)
This reverts commit 6166ef3e09.
2019-11-01 17:08:37 -07:00
Eric Liang 6166ef3e09 Enable the Ray dashboard by default (#5976) 2019-11-01 12:19:01 -07:00
Simon Mo e08b5d0cae [Serve] Add a minimal cli (#5854)
* Add a minimal cli

* Integrate serve_cli with ray scripts
2019-10-28 09:51:31 -07:00
Mitchell Stern 235dec8aa3 [Dashboard] Remove token authentication from dashboard (#5888) 2019-10-21 12:48:48 -07:00
Philipp Moritz d23696de17 Introduce flag to use pickle for serialization (#5805) 2019-10-18 22:29:36 -07:00
Philipp Moritz 29eee7f970 Forward multiple ports for autoscaler (#5893) 2019-10-18 16:50:46 -07:00
Eric Liang 5ecb02fb80 Release 0.7.5 updates (#5727) 2019-09-26 10:30:37 -07:00
Mitchell Stern b03147e7bf Update call to py-spy to conform to new API (#5758) 2019-09-23 14:52:23 -07:00
Edward Oakes 62bc30c1cf Validate redis address parameters (#5746)
* Validate redis address params

* Fix comment

* Add check
2019-09-23 10:52:34 -05:00
Eric Liang 56ab9a00bb [autoscaler] cache stopped nodes, no screen on attach (#5741) 2019-09-22 17:30:35 -07:00
Edward Oakes a8888c5ff4 [flaky test] Fix test_calling_start_ray_head (#5644) 2019-09-14 22:27:45 -07:00
Kai Yang 8a352a8e70 ray stop kills processes more carefully (#5508) 2019-09-06 17:49:12 +08:00
Eric Liang e2e30ca507 Ray, Tune, and RLlib support for memory, object_store_memory options (#5226) 2019-08-21 23:01:10 -07:00
Edward Oakes c7ae4e5e1f Check for dead processes in blocked ray start (#5458) 2019-08-17 20:44:08 -07:00
Robert Nishihara 61b23a9a70 Don't stop Jupyter notebook in ray stop. (#5387) 2019-08-11 15:18:01 -07:00
Eric Liang df47bdf6c9 Allow address instead of redis_address (#5412)
* addr

* wip

* fix typo

* add to start

* switch to ray address for train

* say address

* disambiguate help

* comments 2
2019-08-10 00:18:41 -07:00
Simon Mo d9b45cceec [Project] Implementing Project CLI (#5397) 2019-08-08 21:28:25 -07:00
Philipp Moritz e8d9cfc1f1 Ray projects schema and validation (#5329) 2019-08-06 14:36:04 -07:00
Simon Mo 25b5bd1530 ray stop sends SIGKILL instead of SIGTERM (#5354) 2019-08-02 14:46:03 -07:00
Richard Liaw 1798d4f077 [autoscaler] Add hard kill and monitor commands (#5082)
* Add hard kill and monitor commands

* better_commands

* Update python/ray/scripts/scripts.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-07-06 21:52:55 -07:00
Robert Nishihara 6703519144 Move global state API out of global_state object. (#4857) 2019-05-26 11:27:53 -07:00
Richard Liaw 88b45a53d6 [autoscaler] rsync cluster (#4785) 2019-05-16 23:11:06 -07:00
Richard Liaw ffe61fcc70 [tune] Support non-arg submit (#4803) 2019-05-16 23:10:07 -07:00
Richard Liaw 3bbafc7105 [autoscaler] Fix submit (#4782) 2019-05-14 19:52:28 -07:00
Qing Wang 62c949bbd5 Fix ray stop by killing raylet before plasma (#4778) 2019-05-13 14:53:10 +08:00
Daniel Edgecumbe 3e1adafbce [autoscaler] Add an aggressive_autoscaling flag (#4285) 2019-04-13 18:44:32 -07:00
Robert Nishihara 9c158c6a87 Start dashboard on all nodes and other small fixes. (#4428)
* Start reporter on all nodes.

* More fixes
2019-03-20 13:04:06 -07:00
Eric Liang 78ad9c4cbb Add "ray timeline" command to auto-dump Chrome trace for the current Ray instance (#4239) 2019-03-05 16:28:00 -08:00
Eric Liang 6e3384a719 [rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215) 2019-03-04 14:05:42 -08:00
Hao Chen 14ff402d70 Make ray stop command also kill Java workers (#4179) 2019-03-01 11:05:19 +08:00
Robert Nishihara d9bcaa20b5 Turn UI off by default. (#4188) 2019-02-28 17:29:52 -08:00
Richard Liaw 5bfcfa8ec8 [autoscaler] Fix Submit (#4174) 2019-02-27 00:02:50 -08:00
Eric Liang 60dbc771a2 Revert "[autoscaler] Fix redirects, fix submit (#4085)" (#4158)
This reverts commit acf4d53b55.
2019-02-25 17:00:59 -08:00
Robert Nishihara 688a0d17e6 Kill dashboard and reporter in ray stop. (#4116) 2019-02-23 12:08:39 -08:00
William Ma fedad488d8 Kills gdb processes with ray stop (#4046) 2019-02-21 11:28:26 -08:00
Richard Liaw acf4d53b55 [autoscaler] Fix redirects, fix submit (#4085) 2019-02-20 21:35:33 -08:00
Yuhong Guo 1f864a02bc Add option of load_code_from_local which is required in cross-language ray call. (#3675) 2019-02-21 12:37:17 +08:00
Eric Liang 6e46d75554 [tune] Remove slow gzip of checkpoints; ignore jupyter stop errors (#4076)
* fix gzip

* ignore jupyter
2019-02-18 01:30:13 -08:00
Robert Nishihara 5f71751891 API cleanups. Remove worker argument. Remove some deprecated arguments. (#4025)
* Remove worker argument from API methods.

* Remove deprecated arguments and deprecate redirect_output and redirect_worker_output.

* Fix
2019-02-15 10:49:16 -08:00
Kristian Hartikainen 729d0b2825 [autoscaler] docker run options (#3921)
Adds support for docker options, allowing for use of nvidia-docker.

Closes #2657.
2019-02-13 12:26:28 -08:00
Wang Qing e1c68a0881 Enable including Java worker for ray start command (#3838) 2019-02-04 16:23:43 +08:00
Kristian Hartikainen b9eed2e86c [autoscaler] Move attach helper text under exec_cluster (#3920)
## What do these changes do?
Moves the attach command helper from cli commands to the actual `exec_cluster` function.
2019-01-31 17:01:24 -08:00
Richard Liaw d128636bab Ray Logging Configuration (#3691)
* fix logging for autoscaler

* module logging

* try this for logging

* yapf

* fix

* Initial logging setup

* momery

* ok

* remove basicconfig

* catch

* remove package logging

* print

* fix

* try_fix

* fix 1

* revert rllib

* logging level

* flake8

* fix

* fix

* Remove vestigal TODO
2019-01-30 21:01:12 -08:00
Robert Nishihara 8723d6b061 Define a Node class to manage Ray processes. (#3733)
* Implement Node class and move most of services.py into it.

* Wait for nodes as they are added to the cluster.

* Fix Redis authentication bug.

* Fix bug in client table ordering.

* Address comments.

* Kill raylet before plasma store in test.

* Minor
2019-01-11 22:30:38 -08:00
Stephanie Wang cc5ecd71c5 [autoscaler] Add kill and get IP commands to CLI for testing (#3731)
## What do these changes do?

Adds 2 commands to the CLI that take in an autoscaler config:
1. Kill a random ray node in the cluster.
2. Get all the worker node IP addresses.

These commands are both for testing and are not recommended for normal use.

## Related issue number
Closes #3685.
2019-01-10 22:06:57 -08:00
Robert Nishihara c9d70f0dda Remove num_local_schedulers argument from ray.worker._init. (#3704)
* Remove num_local_schedulers argument from ray.worker._init.

* Fix

* Fix tests.
2019-01-07 12:44:49 -08:00
Robert Nishihara 586a5c9ffa Limit default redis max memory to 10GB. (#3630)
* Limit Redis max memory to 10GB/shard by default.

* Update stress tests.

* Reorganize

* Update

* Add minimum cap size for object store and redis.

* Small test update.
2019-01-03 13:23:54 -08:00
Robert Nishihara b6bcd18d65 Split profile table among many keys in the GCS. (#3676)
* Divide profile table among many keys in GCS.

* Fix, and remove --collect-profiling-data arg.

* Remove reference in doc.
2019-01-02 21:33:01 -08:00
Yuhong Guo c9b8ecca51 Add RayParams to refactor the parameters used by ray python. (#3558) 2018-12-29 22:04:27 +08:00