wassname/ray - ray - Gitea: Git with a cup of tea

mirror of https://github.com/wassname/ray.git synced 2026-06-28 13:19:38 +08:00

Author	SHA1	Message	Date
Eric Liang	c46ea2ff4b	Click 0.7 changes the naming convention for commands; fix this	2018-11-28 14:59:58 -08:00
Eric Liang	0d56fc10cc	Move setproctitle to ray[debug] package (#3415 )	2018-11-27 09:50:59 -08:00
Richard Liaw	c24d87b4d1	[autoscaler] Submit command (#3312 )	2018-11-20 14:03:34 -08:00
Philipp Moritz	8894883153	Force kill web UI in ray stop (#3257 )	2018-11-08 00:05:32 -08:00
Richard Liaw	0bab8ed95c	Expose internal config parameters for starting Ray (#3246 ) ## What do these changes do? This PR exposes the CL option for using a config parameter. This is important for certain tests (i.e., FT tests that removing nodes) to run quickly. Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible. #3239 depends on this. TODO: - [x] Add documentation to method arguments before merging. - [x] Add test to verify this works? ## Related issue number	2018-11-07 21:46:02 -08:00
Eric Liang	725df3a485	Set the process title in workers and actors (#3219 )	2018-11-06 14:59:22 -08:00
Eric Liang	9a0f0db070	Add `ray stack` tool for debugging (#3213 )	2018-11-03 13:13:02 -07:00
Robert Nishihara	e495ab5e7c	Fix some paths /tmp/raylogs -> /tmp/ray. (#3189 )	2018-11-02 12:10:53 -07:00
Robert Nishihara	fd854ff090	Allow the node manager port and object manager port to be set through… (#3130 ) * Allow the node manager port and object manager port to be set through ray start. * Linting * Fix Java test * Address comments.	2018-10-28 17:28:41 -07:00
Robert Nishihara	658c14282c	Remove legacy Ray code. (#3121 ) * Remove legacy Ray code. * Fix cmake and simplify monitor. * Fix linting * Updates * Fix * Implement some methods. * Remove more plasma manager references. * Fix * Linting * Fix * Fix * Make sure class IDs are strings. * Some path fixes * Fix * Path fixes and update arrow * Fixes. * linting * Fixes * Java fixes * Some java fixes * TaskLanguage -> Language * Minor * Fix python test and remove unused method signature. * Fix java tests * Fix jenkins tests * Remove commented out code.	2018-10-26 13:36:58 -07:00
Robert Nishihara	5aa29613db	Fix linting errors. (#3127 )	2018-10-24 16:30:00 -07:00
Robert Nishihara	9c1826ed69	Use XRay backend by default. (#3020 ) * Use XRay backend by default. * Remove irrelevant valgrind tests. * Fix * Move tests around. * Fix * Fix test * Fix test. * String/unicode fix. * Fix test * Fix unicode issue. * Minor changes * Fix bug in test_global_state.py. * Fix test. * Linting * Try arrow change and other object manager changes. * Use newer plasma client API * Small updates * Revert plasma client api change. * Update * Update arrow and allow SendObjectHeaders to fail. * Update arrow * Update python/ray/experimental/state.py Co-Authored-By: robertnishihara <robertnishihara@gmail.com> * Address comments.	2018-10-23 12:46:39 -07:00
Peter Schafhalter	b82fd157a7	Remove Redis protected mode (#3073 ) Follow-up to #2925 and #2952. Removes the Redis protected mode implementation from Ray which was replaced by Redis port authentication.	2018-10-17 22:48:14 -07:00
Peter Schafhalter	a41bbc10ef	Add password authentication to Redis ports (#2952 ) * Implement Redis authentication * Throw exception for legacy Ray * Add test * Formatting * Fix bugs in CLI * Fix bugs in Raylet * Move default password to constants.h * Use pytest.fixture * Fix bug * Authenticate using formatted strings * Add missing passwords * Add test * Improve authentication of async contexts * Disable Redis authentication for credis * Update test for credis * Fix rebase artifacts * Fix formatting * Add workaround for issue #3045 * Increase timeout for test * Improve C++ readability * Fixes for CLI * Add security docs * Address comments * Address comments * Adress comments * Use ray.get * Fix lint	2018-10-16 22:48:30 -07:00
Si-Yuan	f2dbd3096c	Minor improvements and fixes in Python code. (#3022 ) This commit fix some small defects. 1. Remove a comment that should have been removed in #3003 2. Remove `redis_protected_mode` that is never used in `ray.init()` 3. Fix `object_id_seed` that is forgotten to be passed into `ray._init()` 4. Remove several redundant brackets.	2018-10-03 21:08:20 -07:00
Si-Yuan	cc7e2ecdd5	Change logfile names and also allow plasma store socket to be passed in. (#2862 )	2018-10-03 10:03:53 -07:00
Eric Liang	cf9cd5da9d	[ray] Add --new flag for ray attach (#2973 ) * new flag * yapf	2018-09-29 23:04:13 -07:00
Richard Liaw	1c9617bc1c	[autoscaler] Add tmux support for attach and exec (#2907 ) Adds a tmux flag that can be used to support background execution of experiments. Cannot be used together with screen. Seems to be useful feature that has shown up with different users.	2018-09-26 23:22:45 -07:00
Eric Liang	588c573d41	Ray stop needs to kill `plasma_store_server` not `plasma_store` (#2850 )	2018-09-09 19:23:09 -07:00
Eric Liang	e7db54bdb0	Log at INFO level by default (including in autoscaler). (#2824 ) Before this change, the autoscaler `up` and related commands don't print any info messages to the console at all. This was a regression from 0.5. @richardliaw @robertnishihara https://github.com/ray-project/ray/issues/2812	2018-09-06 13:31:19 -07:00
Mitar	3850e3ba64	Added extra logging related arguments to "ray start" (#2664 )	2018-08-28 23:00:37 -07:00
Yuhong Guo	0b6e08ebee	Separate python logger module-wise (#2703 ) ## What do these changes do? 1. Separate the log related code to logger.py from services.py. 2. Allow users to modify logging formatter in `ray start`. ## Related issue number https://github.com/ray-project/ray/pull/2664	2018-08-26 13:46:14 -07:00
Eric Liang	aa014af85b	[rllib] Fix atari reward calculations, add LR annealing, explained var stat for A2C / impala (#2700 ) Changes needed to reproduce Atari plots in IMPALA / A2C: https://github.com/ray-project/rl-experiments	2018-08-23 17:49:10 -07:00
Robert Nishihara	89d4a6df93	Start Redis in protected mode when started via ray.init(). (#2697 ) This PR makes it so that when Ray is started via ray.init() (as opposed to via ray start) the Redis servers will be started in "protected mode" (which means that clients can only connect by connecting to localhost). In practice, we actually connect redis clients by passing in the node IP address (not localhost), so I need to create a redis config file on the fly to allow both localhost and the node's actual IP address (it would have been nice to find a way to do this from the Python redis client, but I couldn't find one).	2018-08-20 14:08:01 -07:00
Eric Liang	9473da69bd	[autoscaler] Experimental support for local / on-prem clusters (#2678 ) This adds some experimental (undocumented) support for launching Ray on existing nodes. You have to provide the head ip, and the list of worker ips. There are also a couple additional utils added for rsyncing files and port-forward.	2018-08-19 12:43:04 -07:00
Eric Liang	079c4e482a	ray exec and ray attach commands (#2560 ) ray exec CLUSTER CMD [--screen] [--start] [--stop] ray attach CLUSTER [--start] Example: ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop This will in one command create a cluster and run the command on it in a screen session. The screen can later be attached to via ray attach. After the command finishes, the cluster workers will be terminated and the head node stopped.	2018-08-15 14:31:50 -07:00
Robert Nishihara	515da7721a	Change ray.worker.cleanup -> ray.shutdown and improve API documentation. (#2374 ) * Change ray.worker.cleanup -> ray.shutdown and improve API documentation. * Deprecate ray.worker.cleanup() gracefully. * Fix linting	2018-07-12 12:00:00 -07:00
Robert Nishihara	b90e551b41	[xray] Implement timeline and profiling API. (#2306 ) * Add profile table and store profiling information there. * Code for dumping timeline. * Improve color scheme. * Push timeline events on driver only for raylet. * Improvements to profiling and timeline visualization * Some linting * Small fix. * Linting * Propagate node IP address through profiling events. * Fix test. * object_id.hex() should return byte string in python 2. * Include gcs.fbs in node_manager.fbs. * Remove flatbuffer definition duplication. * Decode to unicode in Python 3 and bytes in Python 2. * Minor * Submit profile events in a batch. Revert some CMake changes. * Fix * Workaround test failure. * Fix linting * Linting * Don't return anything from chrome_tracing_dump when filename is provided. * Remove some redundancy from profile table. * Linting * Move TODOs out of docstring. * Minor	2018-07-04 23:23:48 -07:00
Robert Nishihara	52b0f3734a	[xray] Add Travis build for testing xray on Linux. (#2047 ) * Run xray tests in travis. * Comment out TaskTests.testSubmittingManyTasks. * Comment out failing tests. * Comment out hanging test. * Linting * Comment out failing test. * Comment out failing test. * Ignore test_dataframe.py for now. * Comment out testDriverExitingQuickly.	2018-05-13 21:22:01 -07:00
Alexey Tumanov	1c965fcfeb	Raylet task dispatch and throttling worker startup (#1912 ) * separate task placement and task dispatch; throttle task dispatch with locally available resournces * keep track of worker's being started/in flight and suppress starting extraneous workers * cleanup comments * remove early termination in task dispatch to support zero-resource actor tasks * info -> debug * add documentation * linting * mock the worker pool for testing * some linting * kill all workers in flight; clear the worker pool in dtor * remove fixed todo * lint	2018-04-18 10:58:11 -07:00
Philipp Moritz	74162d1492	Lint Python files with Yapf (#1872 )	2018-04-11 10:11:35 -07:00
Robert Nishihara	fbfbb1c079	[xray] Integrate worker.py with raylet. (#1810 ) * Integrate worker with raylet. * Begin allowing worker to attach to cluster. * Fix linting and documentation. * Fix linting. * Comment tests back in. * Fix type of worker command. * Remove xray python files and tests. * Fix from rebase. * Add test. * Copy over raylet executable. * Small cleanup.	2018-04-03 02:38:56 -07:00
Robert Nishihara	4bccabd910	Redirect output of all processes by default. (#1752 ) * Redirect output of all processes by default. * Add separate flag for redirecting worker output. * Fix tests.	2018-03-20 18:14:54 -07:00
Robert Nishihara	330159d8bd	Allow setting redis shard ports through ray start (also object store memory). (#1581 ) * Allow passing in --object-store-memory to ray start. * Allow setting ports for the redis shards. * Reorder arguments and infer number of shards from ports. * Move code block into only the head node case. * Add test.	2018-02-22 11:05:37 -08:00
Richard Liaw	e62ad7007d	[autoscaler] Improve UX for Autoscaler (#1558 )	2018-02-21 22:19:04 -08:00
Richard Liaw	73be235701	Quick Fix for Killing Ray Notebooks (#1563 )	2018-02-19 16:10:37 -08:00
Eric Liang	b6c42f96be	Auto-scale ray clusters based on GCS load metrics (#1348 ) This adds (experimental) auto-scaling support for Ray clusters based on GCS load metrics. The auto-scaling algorithm is as follows: Based on current (instantaneous) load information, we compute the approximate number of "used workers". This is based on the bottleneck resource, e.g. if 8/8 GPUs are used in a 8-node cluster but all the CPUs are idle, the number of used nodes is still counted as 8. This number can also be fractional. We scale that number by 1 / target_utilization_fraction and round up to determine the target cluster size (subject to the max_workers constraint). The autoscaler control loop takes care of launching new nodes until the target cluster size is met. When a node is idle for more than idle_timeout_minutes, we remove it from the cluster if that would not drop the cluster size below min_workers. Note that we'll need to update the wheel in the example yaml file after this PR is merged.	2017-12-31 14:39:57 -08:00
Eric Liang	f5ea44338e	EC2 cluster setup scripts and initial version of auto-scaler (#1311 )	2017-12-15 23:56:39 -08:00
Robert Nishihara	c21e189371	Allow scheduling with arbitrary user-defined resource labels. (#1236 ) * Enable scheduling with custom resource labels. * Fix. * Minor fixes and ref counting fix. * Linting * Use .data() instead of .c_str(). * Fix linting. * Fix ResourcesTest.testGPUIDs test by waiting for workers to start up. * Sleep in test so that all tasks are submitted before any completes.	2017-12-01 11:41:40 -08:00
Robert Nishihara	c1496b8111	Check version info in ray start for non-head nodes. (#1264 ) * Check version info in ray start for non-head nodes. * Small fix. * Fix * Push error to all drivers when worker has version mismatch. * Linting * Linting * Fix * Unify methods. * Fix bug.	2017-11-27 22:03:38 -08:00
Robert Nishihara	0b4961b161	Provide flag for setting redis maxclients. (#1257 ) * Add flag for attempting to increase ulimit -n and the redis maxclients. * Don't bother trying to set ulimit -n. * Fix linting. * Add basic test.	2017-11-26 18:25:55 -08:00
Robert Nishihara	11f8f8bd8c	Document --num-workers better. (#1201 )	2017-11-09 17:02:18 -08:00
Robert Nishihara	3317d38278	Replace hostnames with numerical IP addresses in redis address. (#1177 ) * Replace hostnames with numerical IP addresses in redis address. * Also do conversion for node_ip_address. Add test. * Simplifications.	2017-11-01 17:13:22 -07:00
Alexey Tumanov	2d0f439b7b	hugepage + plasma directory support plumbing + documentation (#1030 ) * hugepage + plasma directory support plumbing + documentation * Indentation fix. * huge_pages_enabled --> huge_pages * One more change	2017-09-30 09:56:52 -07:00
Robert Nishihara	b991dc8900	Add flag for ignoring the UI, don't start UI in jenkins tests. (#1021 )	2017-09-29 15:22:51 -07:00
Alexey Tumanov	fc885bd918	Adding basic support for a user-interpretable resource label (#761 ) * adding support for the user-interpretable label(UIR) * more plumbing for num_uirs further upstream; set to infty when specified on cmd line * pass default num_uirs for actors; update GlobalStateAPI * support num_uirs in ray.init() * local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting * global scheduler test updated * Fix bug introduced by rebase. * Rename UIR -> CustomResource and add test. * Small changes and use constexpr instead of macros. * Linting and some renaming. * Reorder some code. * Remove cpus_in_use and fix bug. * Add another test and make a small change. * Rephrase documentation about feature stability.	2017-08-08 02:53:59 -07:00
Robert Nishihara	e0867c8845	Switch Python indentation from 2 spaces to 4 spaces. (#726 ) * 4 space indentation for actor.py. * 4 space indentation for worker.py. * 4 space indentation for more files. * 4 space indentation for some test files. * Check indentation in Travis. * 4 space indentation for some rl files. * Fix failure test. * Fix multi_node_test. * 4 space indentation for more files. * 4 space indentation for remaining files. * Fixes.	2017-07-13 21:53:57 +00:00
Robert Nishihara	2d636d9278	Kill jupyter in ray stop. (#689 ) * Kill jupyter in ray stop. * Terminate jupyter notebook in ray stop. * Fix linting.	2017-06-21 05:58:34 +00:00
Robert Nishihara	1a682e2807	Enable starting and stopping ray with "ray start" and "ray stop". (#628 ) * Install start_ray and stop_ray scripts in setup.py. * Update documentation. * Fix docker tests. * Implement stop_ray script in python. * Fix linting.	2017-06-02 20:17:48 +00:00

49 Commits