* Implement Node class and move most of services.py into it.
* Wait for nodes as they are added to the cluster.
* Fix Redis authentication bug.
* Fix bug in client table ordering.
* Address comments.
* Kill raylet before plasma store in test.
* Minor
## What do these changes do?
Adds 2 commands to the CLI that take in an autoscaler config:
1. Kill a random ray node in the cluster.
2. Get all the worker node IP addresses.
These commands are both for testing and are not recommended for normal use.
## Related issue number
Closes#3685.
* Limit Redis max memory to 10GB/shard by default.
* Update stress tests.
* Reorganize
* Update
* Add minimum cap size for object store and redis.
* Small test update.
## What do these changes do?
This PR exposes the CL option for using a config parameter. This is important for certain tests (i.e., FT tests that removing nodes) to run quickly.
Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.
#3239 depends on this.
TODO:
- [x] Add documentation to method arguments before merging.
- [x] Add test to verify this works?
## Related issue number
This commit fix some small defects.
1. Remove a comment that should have been removed in #3003
2. Remove `redis_protected_mode` that is never used in `ray.init()`
3. Fix `object_id_seed` that is forgotten to be passed into `ray._init()`
4. Remove several redundant brackets.
Adds a tmux flag that can be used to support background execution of experiments. Cannot be used together with screen. Seems to be useful feature that has shown up with different users.
Before this change, the autoscaler `up` and related commands don't print any info messages to the console at all. This was a regression from 0.5. @richardliaw @robertnishihara https://github.com/ray-project/ray/issues/2812
## What do these changes do?
1. Separate the log related code to logger.py from services.py.
2. Allow users to modify logging formatter in `ray start`.
## Related issue number
https://github.com/ray-project/ray/pull/2664
This PR makes it so that when Ray is started via ray.init() (as opposed to via ray start) the Redis servers will be started in "protected mode" (which means that clients can only connect by connecting to localhost).
In practice, we actually connect redis clients by passing in the node IP address (not localhost), so I need to create a redis config file on the fly to allow both localhost and the node's actual IP address (it would have been nice to find a way to do this from the Python redis client, but I couldn't find one).
This adds some experimental (undocumented) support for launching Ray on existing nodes. You have to provide the head ip, and the list of worker ips.
There are also a couple additional utils added for rsyncing files and port-forward.
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]
Example:
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop
This will in one command create a cluster and run the command on it in a screen session. The screen can later be attached to via ray attach. After the command finishes, the cluster workers will be terminated and the head node stopped.
* Add profile table and store profiling information there.
* Code for dumping timeline.
* Improve color scheme.
* Push timeline events on driver only for raylet.
* Improvements to profiling and timeline visualization
* Some linting
* Small fix.
* Linting
* Propagate node IP address through profiling events.
* Fix test.
* object_id.hex() should return byte string in python 2.
* Include gcs.fbs in node_manager.fbs.
* Remove flatbuffer definition duplication.
* Decode to unicode in Python 3 and bytes in Python 2.
* Minor
* Submit profile events in a batch. Revert some CMake changes.
* Fix
* Workaround test failure.
* Fix linting
* Linting
* Don't return anything from chrome_tracing_dump when filename is provided.
* Remove some redundancy from profile table.
* Linting
* Move TODOs out of docstring.
* Minor
* Run xray tests in travis.
* Comment out TaskTests.testSubmittingManyTasks.
* Comment out failing tests.
* Comment out hanging test.
* Linting
* Comment out failing test.
* Comment out failing test.
* Ignore test_dataframe.py for now.
* Comment out testDriverExitingQuickly.
* separate task placement and task dispatch; throttle task dispatch with locally available resournces
* keep track of worker's being started/in flight and suppress starting extraneous workers
* cleanup comments
* remove early termination in task dispatch to support zero-resource actor tasks
* info -> debug
* add documentation
* linting
* mock the worker pool for testing
* some linting
* kill all workers in flight; clear the worker pool in dtor
* remove fixed todo
* lint
* Integrate worker with raylet.
* Begin allowing worker to attach to cluster.
* Fix linting and documentation.
* Fix linting.
* Comment tests back in.
* Fix type of worker command.
* Remove xray python files and tests.
* Fix from rebase.
* Add test.
* Copy over raylet executable.
* Small cleanup.