Commit Graph

11 Commits

Author SHA1 Message Date
Robert Nishihara ef527f84ab Stream logs to driver by default. (#3892)
* Stream logs to driver by default.

* Fix from rebase

* Redirect raylet output independently of worker output.

* Fix.

* Create redis client with services.create_redis_client.

* Suppress Redis connection error at exit.

* Remove thread_safe_client from redis.

* Shutdown driver threads in ray.shutdown().

* Add warning for too many log messages.

* Only stop threads if worker is connected.

* Only stop threads if they exist.

* Remove unnecessary try/excepts.

* Fix

* Only add new logging handler once.

* Increase timeout.

* Fix tempfile test.

* Fix logging in cluster_utils.

* Revert "Increase timeout."

This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.

* Retry longer when connecting to plasma store from node manager and object manager.

* Close pubsub channels to avoid leaking file descriptors.

* Limit log monitor open files to 200.

* Increase plasma connect retries.

* Add comment.
2019-02-07 19:53:50 -08:00
Si-Yuan 48139cf861 Migrate Python C extension to Cython (#3541) 2019-01-24 09:17:14 -08:00
Yuhong Guo d2cf8561f2 Refactor code about ray.ObjectID. (#3674)
* Refactor code about ray.ObjectID.

* remove from_random and use nil_id instead of constructor

* remove id() in hash

* Lint and fix

* Change driver id to ObjectID

* Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()
2019-01-13 01:47:29 -08:00
Yuhong Guo fb33fa9097 Enable function_descriptor in backend to replace the function_id (#3028) 2018-12-18 18:53:59 -05:00
Robert Nishihara 82863b5251 [autoscaler] Update autoscaler to use heartbeat batches. (#3409) 2018-11-27 23:46:27 -08:00
Robert Nishihara 658c14282c Remove legacy Ray code. (#3121)
* Remove legacy Ray code.

* Fix cmake and simplify monitor.

* Fix linting

* Updates

* Fix

* Implement some methods.

* Remove more plasma manager references.

* Fix

* Linting

* Fix

* Fix

* Make sure class IDs are strings.

* Some path fixes

* Fix

* Path fixes and update arrow

* Fixes.

* linting

* Fixes

* Java fixes

* Some java fixes

* TaskLanguage -> Language

* Minor

* Fix python test and remove unused method signature.

* Fix java tests

* Fix jenkins tests

* Remove commented out code.
2018-10-26 13:36:58 -07:00
Peter Schafhalter 5da6e78db1 Add available resources to global state (#2501) 2018-09-10 15:46:32 -07:00
Robert Nishihara bd64c940e9 Push error to driver when monitor raises an exception. (#2834) 2018-09-07 17:42:45 -07:00
Melih Elibol 8ae82180b4 [xray] Adds a driver table. (#2289)
This PR adds a driver table for the new GCS, which enables cleanup functionality associated with monitoring driver death.

Some testing in `monitor_test.py` is restored, but redis sharding for xray is needed to enable remaining tests.
2018-08-08 23:41:40 -07:00
Robert Nishihara b90e551b41 [xray] Implement timeline and profiling API. (#2306)
* Add profile table and store profiling information there.

* Code for dumping timeline.

* Improve color scheme.

* Push timeline events on driver only for raylet.

* Improvements to profiling and timeline visualization

* Some linting

* Small fix.

* Linting

* Propagate node IP address through profiling events.

* Fix test.

* object_id.hex() should return byte string in python 2.

* Include gcs.fbs in node_manager.fbs.

* Remove flatbuffer definition duplication.

* Decode to unicode in Python 3 and bytes in Python 2.

* Minor

* Submit profile events in a batch. Revert some CMake changes.

* Fix

* Workaround test failure.

* Fix linting

* Linting

* Don't return anything from chrome_tracing_dump when filename is provided.

* Remove some redundancy from profile table.

* Linting

* Move TODOs out of docstring.

* Minor
2018-07-04 23:23:48 -07:00
Robert Nishihara ff2217251f [xray] Add error table and push error messages to driver through node manager. (#2256)
* Fix documentation indentation.

* Add error table to GCS and push error messages through node manager.

* Add type to error data.

* Linting

* Fix failure_test bug.

* Linting.

* Enable one more test.

* Attempt to fix doc building.

* Restructuring

* Fixes

* More fixes.

* Move current_time_ms function into util.h.
2018-06-20 21:29:28 -07:00