Philipp Moritz
f24d96ec4f
Revert "Try to enable dashboard (again) (#6069)" (#6159)
This reverts commit 4044af8520.
2019-11-13 12:32:12 -08:00
Stephanie Wang
35d177f459
Use grpc for communication from worker to local raylet (task submission and direct actor args only) (#6118)
* Skeleton for SubmitTask proto
* Pass through node manager port, connect in raylet client
* Switch submit task to grpc
* Check port in use
* doc
* Remove default port, set port randomly from driver
* update
* Fix test
* Fix object manager test
2019-11-11 21:17:25 -08:00
Eric Liang
4044af8520
Try to enable dashboard (again) (#6069)
* Revert "Revert "Enable the Ray dashboard by default (#5976)" (#6068)"
This reverts commit 1a3e97cf23.
* fix tests that assume the dashboard isn't a job
* travis
2019-11-08 10:48:48 -08:00
Eric Liang
1a3e97cf23
Revert "Enable the Ray dashboard by default (#5976)" (#6068)
This reverts commit 6166ef3e09.
2019-11-01 17:08:37 -07:00
Eric Liang
6166ef3e09
Enable the Ray dashboard by default (#5976)
2019-11-01 12:19:01 -07:00
Edward Oakes
f8a6ed7832
Spawn processes in background sessions (#6008)
Allows us to properly handle KeyboardInterrupts in interactive python interpreters.
2019-10-25 13:01:35 -07:00
Mitchell Stern
235dec8aa3
[Dashboard] Remove token authentication from dashboard (#5888)
2019-10-21 12:48:48 -07:00
Philipp Moritz
d23696de17
Introduce flag to use pickle for serialization (#5805)
2019-10-18 22:29:36 -07:00
Edward Oakes
62bc30c1cf
Validate redis address parameters (#5746)
* Validate redis address params
* Fix comment
* Add check
2019-09-23 10:52:34 -05:00
Mitchell Stern
98dcc1d440
[Dashboard] Add initial version of new dashboard (#5730)
2019-09-23 08:50:40 -07:00
Edward Oakes
ee5db5b67f
Raise error if space in redis password (#5673)
2019-09-11 20:58:39 -07:00
Kai Yang
732336fc4f
[Java] Support multiple workers in Java worker process (#5505)
2019-09-07 22:52:05 +08:00
Eric Liang
d20696300e
Fix autoscaler format string for memory (#5542)
* add format string
* fix cast
2019-08-26 23:25:11 -07:00
Eric Liang
e2e30ca507
Ray, Tune, and RLlib support for memory, object_store_memory options (#5226)
2019-08-21 23:01:10 -07:00
Eric Liang
df47bdf6c9
Allow address instead of redis_address (#5412)
* addr
* wip
* fix typo
* add to start
* switch to ray address for train
* say address
* disambiguate help
* comments 2
2019-08-10 00:18:41 -07:00
Eric Liang
955154a19d
Reduce Ray / RLlib startup messages (#5368)
2019-08-05 13:23:54 -07:00
Qing Wang
f2293243cc
[ID Refactor] Shorten the length of JobID to 4 bytes (#5110)
* WIP
* Fix
* Add jobid test
* Fix
* Add python part
* Fix
* Fix test
* Remove TODOs
* Fix C++ tests
* Lint
* Fix
* Fix exporting functions in multiple ray.init
* Fix java test
* Fix lint
* Fix linting
* Address comments.
* Fix
* Address and fix linting
* Refine and fix
* Fix
* address
* Address comments.
* Fix linting
* Fix
* Address
* Address comments.
* Address
* Address
* Fix
* Fix
* Fix
* Fix lint
* Fix
* Fix linting
* Address comments.
* Fix linting
* Address comments.
* Fix linting
* address comments.
* Fix
2019-07-11 14:25:16 +08:00
Eric Liang
5aec750107
Add warning/error if object store memory exceeds available memory (#4893)
* exclude
* format
* add warning
* hatch
* reduce mem usage
* reduce object store mem
* set obj mem
2019-07-08 21:37:08 -07:00
Qing Wang
e33d0eac68
Add dynamic worker options for worker command. (#4970)
* Add fields for fbs
* WIP
* Fix compilation errors
* Add java part
* Fix
* Fix
* Fix
* Fix lint
* Refine API
* address comments and add test
* Fix
* Address comment.
* Address comments.
* Fix linting
* Refine
* Fix lint
* WIP: address comment.
* Fix java
* Fix py
* Refine
* Fix
* Fix
* Fix linting
* Fix lint
* Address comments
* WIP
* Fix
* Fix
* minor refine
* Fix lint
* Fix raylet test.
* Fix lint
* Update src/ray/raylet/worker_pool.h
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Update java/runtime/src/main/java/org/ray/runtime/AbstractRayRuntime.java
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Address comments.
* Address comments.
* Fix test.
* Update src/ray/raylet/worker_pool.h
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Address comments.
* Address comments.
* Fix
* Fix lint
* Fix lint
* Fix
* Address comments.
* Fix linting
2019-06-23 18:08:33 +08:00
Philipp Moritz
1e2b649580
Use proper session directory for debug_string.txt (#4960)
2019-06-10 23:46:37 -07:00
Si-Yuan
4e0be8b450
Drop duplicated string format (#4897)
This string format is unnecessary because java_worker_options is already appended to the command line later.
2019-05-30 19:43:27 +08:00
Robert Nishihara
6703519144
Move global state API out of global_state object. (#4857)
2019-05-26 11:27:53 -07:00
Qing Wang
259cdfa0de
Fix issue when starting raylet_monitor (#4829)
2019-05-22 11:08:24 +08:00
Qing Wang
dcd6d4949c
Fix Java worker log dir (#4781)
2019-05-17 16:13:28 +08:00
Qing Wang
f39b6747e5
Refactor command line argument parsing with gflags (#4676)
2019-04-24 14:53:07 +08:00
Daniel Edgecumbe
3e1adafbce
[autoscaler] Add an aggressive_autoscaling flag (#4285)
2019-04-13 18:44:32 -07:00
Romil Bhardwaj
0f42f87ebc
Updating zero capacity resource semantics (#4555)
2019-04-12 16:53:57 -07:00
Si-Yuan
dab99d26af
Improve code related to node (#4383)
* Make full use of node
implement local node
fix bugs mentioned in comments
* Add more tests
* Use more specific exception handling
* fix, lint
* fix for py2.x
2019-04-09 17:27:54 +08:00
Yuhong Guo
c2349cf12d
Remove local/global_scheduler from code and doc. (#4549)
2019-04-03 17:05:09 -07:00
Robert Nishihara
8548f12eb2
Give better error when include_webui=1 and webui can't be started. (#4471)
2019-03-26 14:54:32 -07:00
Philipp Moritz
95254b3d71
Remove the old web UI (#4301)
2019-03-07 23:15:11 -08:00
Hao Chen
f0465bc68c
[Java] Refine tests and fix single-process mode (#4265)
2019-03-07 09:59:13 +08:00
Eric Liang
3896b726dd
Dynamically adjust redis memory usage (#4152)
* f
* Update services.py
2019-02-25 16:21:37 -08:00
Daniel Edgecumbe
2e30f7ba38
Add a web dashboard for monitoring node resource usage (#4066)
2019-02-21 00:10:04 -08:00
Yuhong Guo
1f864a02bc
Add option of load_code_from_local which is required in cross-language ray call. (#3675)
2019-02-21 12:37:17 +08:00
Wang Qing
7574757391
Fix crash for Java task's task.argument() in state. (#4063)
2019-02-19 12:46:07 +08:00
Si-Yuan
2de31eb489
minor fix (#4040)
2019-02-13 17:22:45 -08:00
Si-Yuan
21472b890a
Integrate "tempfile_service" into "ray.node.Node" (#3953)
2019-02-12 17:34:04 -08:00
Wang Qing
c523bc04ad
Enable redis password in Java worker (#3943)
* Support Java redis password
* Fix
* Refine
* Fix lint.
2019-02-12 13:11:25 +08:00
Robert Nishihara
ef527f84ab
Stream logs to driver by default. (#3892)
* Stream logs to driver by default.
* Fix from rebase
* Redirect raylet output independently of worker output.
* Fix.
* Create redis client with services.create_redis_client.
* Suppress Redis connection error at exit.
* Remove thread_safe_client from redis.
* Shutdown driver threads in ray.shutdown().
* Add warning for too many log messages.
* Only stop threads if worker is connected.
* Only stop threads if they exist.
* Remove unnecessary try/excepts.
* Fix
* Only add new logging handler once.
* Increase timeout.
* Fix tempfile test.
* Fix logging in cluster_utils.
* Revert "Increase timeout."
This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.
* Retry longer when connecting to plasma store from node manager and object manager.
* Close pubsub channels to avoid leaking file descriptors.
* Limit log monitor open files to 200.
* Increase plasma connect retries.
* Add comment.
2019-02-07 19:53:50 -08:00
William Ma
f067223c4a
Allow Ray processes to be started inside of gdb and tmux. (#3847)
2019-02-04 15:23:39 -08:00
Wang Qing
e1c68a0881
Enable including Java worker for ray start command (#3838)
2019-02-04 16:23:43 +08:00
Si-Yuan
9295ab8f60
Various Python code cleanups. (#3837)
2019-02-03 10:16:24 -08:00
Richard Liaw
d128636bab
Ray Logging Configuration (#3691)
* fix logging for autoscaler
* module logging
* try this for logging
* yapf
* fix
* Initial logging setup
* memory
* ok
* remove basicconfig
* catch
* remove package logging
* print
* fix
* try_fix
* fix 1
* revert rllib
* logging level
* flake8
* fix
* fix
* Remove vestigial TODO
2019-01-30 21:01:12 -08:00
Robert Nishihara
0b1608a546
Factor out code for starting new processes and test plasma store in valgrind. (#3824)
* Factor out starting Ray processes.
* Detect flags through environment variables.
* Return ProcessInfo from start_ray_process.
* Print valgrind errors at exit.
* Test valgrind in travis.
* Some valgrind fixes.
* Undo raylet monitor change.
* Only test plasma store in valgrind.
2019-01-22 14:59:11 -08:00
Robert Nishihara
8723d6b061
Define a Node class to manage Ray processes. (#3733)
* Implement Node class and move most of services.py into it.
* Wait for nodes as they are added to the cluster.
* Fix Redis authentication bug.
* Fix bug in client table ordering.
* Address comments.
* Kill raylet before plasma store in test.
* Minor
2019-01-11 22:30:38 -08:00
Robert Nishihara
6bbc667f93
Remove unused code path in services.py. (#3722)
2019-01-08 19:57:16 -08:00
Robert Nishihara
c9d70f0dda
Remove num_local_schedulers argument from ray.worker._init. (#3704)
* Remove num_local_schedulers argument from ray.worker._init.
* Fix
* Fix tests.
2019-01-07 12:44:49 -08:00
Robert Nishihara
586a5c9ffa
Limit default redis max memory to 10GB. (#3630)
* Limit Redis max memory to 10GB/shard by default.
* Update stress tests.
* Reorganize
* Update
* Add minimum cap size for object store and redis.
* Small test update.
2019-01-03 13:23:54 -08:00
Robert Nishihara
b6bcd18d65
Split profile table among many keys in the GCS. (#3676)
* Divide profile table among many keys in GCS.
* Fix, and remove --collect-profiling-data arg.
* Remove reference in doc.
2019-01-02 21:33:01 -08:00