wassname/ray

mirror of https://github.com/wassname/ray.git synced 2026-06-28 08:40:02 +08:00

Author	SHA1	Message	Date
Robert Nishihara	7792032ee3	Fix UI issue for non-json-serializable task arguments. (#1892) * Fix UI issue for non-json-serializable task arguments. * Simplify approach.	2018-04-15 13:54:42 -07:00
Philipp Moritz	74162d1492	Lint Python files with Yapf (#1872)	2018-04-11 10:11:35 -07:00
Robert Nishihara	7c9e291b4b	In the UI, display task breakdowns by default. (#1857)	2018-04-09 13:24:38 -07:00
Robert Nishihara	5bde5e75e7	Implement unsafe method for flushing entire object table and task table. (#1824) * Implement unsafe method for flushing entire object table and task table. * Add test. * Fix test.	2018-04-04 18:29:24 -07:00
Robert Nishihara	8d52fe931b	Add experimental feature for flushing event logs and logfiles. (#1659) * Add experimental feature for flushing event logs and logfiles. * Add documentation.	2018-03-27 11:57:52 -07:00
Robert Nishihara	2922e1c388	Add API for getting total cluster resources. (#1736) * Add API for getting total cluster resources. * Add test.	2018-03-20 15:57:00 -07:00
Robert Nishihara	96913be939	Treat actor creation like a regular task. (#1668) * Treat actor creation like a regular task. * Small cleanups. * Change semantics of actor resource handling. * Bug fix. * Minor linting * Bug fix * Fix jenkins test. * Fix actor tests * Some cleanups * Bug fix * Fix bug. * Remove cached actor tasks when a driver is removed. * Add more info to taskspec in global state API. * Fix cyclic import bug in tune. * Fix * Fix linting. * Fix linting. * Don't schedule any tasks (especially actor creation tasks) on local schedulers with 0 CPUs. * Bug fix. * Add test for 0 CPU case * Fix linting * Address comments. * Fix typos and add comment. * Add assertion and fix test.	2018-03-16 11:18:07 -07:00
William Paul	f2b6a7b58d	Polished TensorFlowVariables code and documentation (#566)	2018-02-12 15:38:58 -08:00
Alexey Tumanov	f1303291b4	Ray scheduler spillback plumbing + mechanism (#1362) * spillback mechanism and plumbing : adding spillback counter + timestamp * linting fix * documentation * Fix argument name.	2018-01-23 20:18:12 -08:00
Eric Liang	a2b190e65b	Fix occasional task timeline failure to get task ids (#1442)	2018-01-21 12:04:44 -08:00
Melih Elibol	24b93b1123	fixes default type for product of empty shape. (#1341)	2017-12-18 17:41:44 -08:00
Stephanie Wang	12fdb3f53a	Convert actor dummy objects to task execution edges. (#1281) * Define execution dependencies flatbuffer and add to Redis commands * Convert TaskSpec to TaskExecutionSpec * Add execution dependencies to Python bindings * Submitting actor tasks uses execution dependency API instead of dummy argument * Fix dependency getters and some cleanup for fetching missing dependencies * C++ convention * Make TaskExecutionSpec a C++ class * Convert local scheduler to use TaskExecutionSpec class * Convert some pointers to references * Finish conversion to TaskExecutionSpec class * fix * Fix * Fix memory errors? * Cast flatbuffers GetSize to size_t * Fixes * add more retries in global scheduler unit test * fix linting and cast fbb.GetSize to size_t * Style and doc * Fix linting and simplify from_flatbuf.	2017-12-14 20:47:54 -08:00
Robert Nishihara	c21e189371	Allow scheduling with arbitrary user-defined resource labels. (#1236) * Enable scheduling with custom resource labels. * Fix. * Minor fixes and ref counting fix. * Linting * Use .data() instead of .c_str(). * Fix linting. * Fix ResourcesTest.testGPUIDs test by waiting for workers to start up. * Sleep in test so that all tasks are submitted before any completes.	2017-12-01 11:41:40 -08:00
Robert Nishihara	2865128df0	Remove counter from run_function_on_all_workers. Also remove utilitie… (#1260) * Remove counter from run_function_on_all_workers. Also remove utilities for copying directories across machines. * Fix linting.	2017-11-26 18:29:10 -08:00
Philipp Moritz	e798a652bc	Change TaskSpec to allow multiple object IDs per argument. (#1204) * Implement object ID bags * linting * fix tests * fix linting * fix comments	2017-11-10 16:33:34 -08:00
Stephanie Wang	07f0532b9b	Local scheduler filters out dead clients during reconstruction (#1182) * Object table lookup returns vector of DBClientID instead of address strings * Add node IP address to DBClient notification * DB client cache stores entire DB client, convert addresses to std::string * get cached db client returns the client * Expose a call to initialize the redis cache * Local scheduler filters out dead clients during reconstruction * Remove node ip address from dbclient, use aux_address for plasma managers * Get entire db client entry when not found in cache * Fix common tests * Fix address in tests * Push error to driver if driver task did the put * Address Robert's comments and cleanup * Remove unused Redis command * Fix db test	2017-11-10 11:29:24 -08:00
Zongheng Yang	5a50e80b63	Make Monitor remove dead Redis entries from exiting drivers. (#994) * WIP: removing OL, OI, TT on client exit; no saving yet. * ray_redis_module.cc: update header comment. * Cleanup: just the removal. * Reformat via yapf: use pep8 style instead of google. * Checkpoint addressing comments (partially) * Add 'b' marker before strings (py3 compat) * Add MonitorTest. * Use `isort` to sort imports. * Remove some loggings * Fix flake8 noqa marker runtest.py * Try to separate tests out to monitor_test.py * Rework cleanup algorithm: correct logic * Extend tests to cover multi-shard cases * Add some small comments and formatting changes.	2017-09-26 00:11:38 -07:00
Eric Liang	d8aa826e63	[webui] Scalability fixes for the task timeline and visualizations (#935) * fixes * comments * fix test * Update ui.py * upd * Fix linting.	2017-09-10 15:47:44 -07:00
Robert Nishihara	f3c1248d98	Clone catapult and generate html files during installation. (#956) * Clone catapult and generate static html during setup. * Include UI files in installation. * Fix directory to clone catapult to and fix linting. * Use absolute path. * Make sure we find a sufficiently new version of python2 when building wheels. * Copy the trace_viewer_full.html file to the local directory if it is not present. * Make sure wheels fail to build if UI is not included.	2017-09-10 13:41:16 -07:00
Eric Liang	953878364e	[webui] Print out timeline link for full-screen trace viewing (#936) * up * update	2017-09-06 01:41:21 -07:00
Eric Liang	a2814567e1	[webui] Quick fix to timeline on task failure (#930) * foo * update * Move _add_missing_timestamps to task_profiles function.	2017-09-04 22:58:19 -07:00
Eric Liang	63d8d11714	[webui] Checkboxes should go to the left of their labels (#932)	2017-09-04 17:05:13 -07:00
Robert Nishihara	8ed03b1cf0	Make task timeline work with ipywidgets==7.0.0, change slider default values. (#925) * Make task timeline work with ipywidgets==7.0.0. * Change initial UI slider values from 70-100 to 0-100.	2017-09-03 23:15:46 -07:00
Wapaul1	4db45c9c54	Improved layout of controls for Web UI (#876) * Improved layout of controls * Added explicit labels and some comments * Fix linting errors	2017-08-28 14:43:34 -07:00
Robert Nishihara	d43a435c68	Don't redirect worker output to log files if redirect_output=False. (#873) * Don't redirect worker output to log files if redirect_output=False. * Fix, handle case where RedirectOutput key is not in Redis.	2017-08-27 14:27:44 -07:00
Robert Nishihara	ca53e9ae7b	Fix bugs in task timeline visualization. (#836) * Fix bugs in task timeline visualization. * Some cleanups. * Remove print statements.	2017-08-13 23:39:37 -07:00
alanamarzoev	bfe473fa8c	Embedded task trace with object dependencies. (#818) * Embedded timeline * Yeah * Fixed arrows not showing up. * Fixed arrows not showing up, and added check boxes for the kinds of dependencies that should be included in the trace. * first * Fixes * Fixed typo in comments, added more comments. fixed linting. * Added more comments. * Formatting. * fixes * Fixed state.py linting. * Fixed ui.py linting errors. * Fixed linting errors. * Renamed task dependencies and included instructions for viewing arrows. * Fixed according to PR comments. * Fixed bug. * Undid changes to metadata blocks. * Fixes according to comments. * Fixed linting. * Fixed linting. * NOQA keyword added to link line.	2017-08-09 23:00:14 -07:00
Alexey Tumanov	fc885bd918	Adding basic support for a user-interpretable resource label (#761) * adding support for the user-interpretable label(UIR) * more plumbing for num_uirs further upstream; set to infty when specified on cmd line * pass default num_uirs for actors; update GlobalStateAPI * support num_uirs in ray.init() * local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting * global scheduler test updated * Fix bug introduced by rebase. * Rename UIR -> CustomResource and add test. * Small changes and use constexpr instead of macros. * Linting and some renaming. * Reorder some code. * Remove cpus_in_use and fix bug. * Add another test and make a small change. * Rephrase documentation about feature stability.	2017-08-08 02:53:59 -07:00
alanamarzoev	64eaaaebf0	Show timeline button. (#809)	2017-08-03 20:11:50 -07:00
alanamarzoev	99badc7ae4	UI functions in separate file. (#801) * UI file. * Fixed linting. * Change UI instructions slightly.	2017-08-02 19:32:18 -07:00
Robert Nishihara	8c8258de20	Move worker methods into Worker class and expose more TaskSpec fields to Python. (#796) * Move worker methods inside worker class. Move some helper methods from actor.py into utils.py and state.py. * Add more methods exposing task spec fields to Python. * Fix linting. * Fix error. * Remove unused code in default worker.	2017-08-01 17:16:57 -07:00
Philipp Moritz	c3b39b4d86	Pull Plasma from Apache Arrow and remove Plasma store from Ray. (#692) * Rebase Ray on top of Plasma in Apache Arrow * add thirdparty building scripts * use rebased arrow * fix * fix build * fix python visibility * comment out C tests for now * fix multithreading * fix * reduce logging * fix plasma manager multithreading * make sure old and new object IDs can coexist peacefully * more rebasing * update * fixes * fix * install pyarrow * install cython * fix * install newer cmake * fix * rebase on top of latest arrow * getting runtest.py run locally (needed to comment out a test for that to work) * work on plasma tests * more fixes * fix local scheduler tests * fix global scheduler test * more fixes * fix python 3 bytes vs string * fix manager tests valgrind * fix documentation building * fix linting * fix c++ linting * fix linting * add tests back in * Install without sudo. * Set PKG_CONFIG_PATH in build.sh so that Ray can find plasma. * Install pkg-config * Link -lpthread, note that find_package(Threads) doesn't seem to work reliably. * Comment in testGPUIDs in runtest.py. * Set PKG_CONFIG_PATH when building pyarrow. * Pull apache/arrow and not pcmoritz/arrow. * Fix installation in docker image. * adapt to changes of the plasma api * Fix installation of pyarrow module. * Fix linting. * Use correct python executable to build pyarrow.	2017-07-31 21:04:15 -07:00
Robert Nishihara	c394a65ffc	Wait longer when getting redis shards to initialize global state API. (#786)	2017-07-31 17:56:11 -07:00
alanamarzoev	2b3190ad13	Chrome trace timeline with sliders. (#731) * Trace timeline with sliders. * Trace. * Switched ujson to json. * Fixed tests. * linting fixes * Fixed bug. * Cleaned up code. * Fixes according to comments. * removed checkpoints. * Undid accidental delete. * Fixed linting error. * Added documentation to notebook. * Undid accidental deletes. * Add comments and small formatting fixes. * Small fix.	2017-07-17 19:59:49 -07:00
Robert Nishihara	e0867c8845	Switch Python indentation from 2 spaces to 4 spaces. (#726) * 4 space indentation for actor.py. * 4 space indentation for worker.py. * 4 space indentation for more files. * 4 space indentation for some test files. * Check indentation in Travis. * 4 space indentation for some rl files. * Fix failure test. * Fix multi_node_test. * 4 space indentation for more files. * 4 space indentation for remaining files. * Fixes.	2017-07-13 21:53:57 +00:00
alanamarzoev	8464d77c76	Change event logs to store one Redis ZSET per worker. (#705) * Changing to zset * Fixed bug. * Fixed another bug. * Modified task_profiles. * Removed extra file. * Modified task_profiles test. * WIP * WIP * Undid changes * Updated * WIP * Made changes according to comments. * Removed unneeded print. * Removed ujson usage. * failing test * tests passing * Fixed linting errors and modified style. * Fixed bug. * Fixed linting * Fixed according to comments. * Redis crashing? * Fixed linting * Fixed linting	2017-07-09 01:42:29 +02:00
alanamarzoev	2b11a7bca2	Add task ID and object ID search boxes to web UI. (#704) * Task search box. * Cleaned up. * Small reformatting. * Add object table search box.	2017-07-01 17:48:23 -04:00
alanamarzoev	716469160e	Enable dumping profiling information to timeline format viewable by chrome tracing. (#703) * Chrome tracing timeline. * Modified decode statement. * Some cleanups and add test. * Remove example. * Fix test.	2017-06-30 12:14:11 -04:00
alanamarzoev	e16df6da9a	Updated task_profiles function to avoid future repetitive parsing. (#691) * Updated task_profiles function to avoid future repetitive parsing. * Fix indentation. * Fixed according to comments. * Included updated test for task_profiles function. * Simplify test. * Fix indentation. * Fix.	2017-06-22 19:21:18 -07:00
alanamarzoev	4d5ac9dad5	Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis. * Including data_size and hash in the ResultTableReply. * Included data_size and hash info in object_table. * Fixed bugs in ray_redis_module.cc. * Removing commented out code. * Fixes * Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put. * Changed it so that data_size is set correctly. * Removed iostream import. * Included a check to ensure that the Redis string to long long conversion was successful. * Included separate data_size and hash null checks. * Fixed bug. * Made linting changes. * Another linting error. * Slight simplification.	2017-06-16 23:17:11 -07:00
alanamarzoev	cc4990b543	Task profiles function and test (#647) Expose some task profiling information through global state API.	2017-06-13 17:53:34 -07:00
alanamarzoev	f0339f3386	Expose log files through global state API. (#641) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis.	2017-06-08 00:08:10 -07:00
Stephanie Wang	ee08c8274b	Shard Redis. (#539) * Implement sharding in the Ray core * Single node Python modifications to do sharding * Do the sharding in redis.cc * Pipe num_redis_shards through start_ray.py and worker.py. * Use multiple redis shards in multinode tests. * first steps for sharding ray.global_state * Fix problem in multinode docker test. * fix runtest.py * fix some tests * fix redis shard startup * fix redis sharding * fix * fix bug introduced by the map-iterator being consumed * fix sharding bug * shard event table * update number of Redis clients to be 64K * Fix object table tests by flushing shards in between unit tests * Fix local scheduler tests * Documentation * Register shard locations in the primary shard * Add plasma unit tests back to build * lint * lint and fix build * Fix * Address Robert's comments * Refactor start_ray_processes to start Redis shard * lint * Fix global scheduler python tests * Fix redis module test * Fix plasma test * Fix component failure test * Fix local scheduler test * Fix runtest.py * Fix global scheduler test for python3 * Fix task_table_test_and_update bug, from actor task table submission race * Fix jenkins tests. * Retry Redis shard connections * Fix test cases * Convert database clients to DBClient struct * Fix race condition when subscribing to db client table * Remove unused lines, add APITest for sharded Ray * Fix * Fix memory leak * Suppress ReconstructionTests output * Suppress output for APITestSharded * Reissue task table add/update commands if initial command does not publish to any subscribers. * fix * Fix linting. * fix tests * fix linting * fix python test * fix linting	2017-05-18 17:40:41 -07:00
Philipp Moritz	28f0882387	Expose function table to python global control state API (#542) * expose function table to python global control state API * fix * fix linting * add test for function table	2017-05-16 20:06:13 -07:00
Robert Nishihara	ec2534422b	Remove register_class from API. (#550) * Perform ray.register_class under the hood. * Fix bug. * Release worker lock when waiting for imports to arrive in get. * Remove calls to register_class from examples and tests. * Clear serialization state between tests. * Fix bug and add test for multiple custom classes with same name. * Fix failure test. * Fix linting and cleanups to python code. * Fixes to documentation. * Implement recursion depth for recursively registering classes. * Fix linting. * Push warning to user if waiting for class for too long. * Fix typos. * Don't export FunctionToRun if pickling the function fails. * Don't broadcast class definition when pickling class.	2017-05-16 18:38:52 -07:00
Robert Nishihara	0ac125e9b2	Clean up when a driver disconnects. (#462) * Clean up state when drivers exit. * Remove unnecessary field in ActorMapEntry struct. * Have monitor release GPU resources in Redis when driver exits. * Enable multiple drivers in multi-node tests and test driver cleanup. * Make redis GPU allocation a redis transaction and small cleanups. * Fix multi-node test. * Small cleanups. * Make global scheduler take node_ip_address so it appears in the right place in the client table. * Cleanups. * Fix linting and cleanups in local scheduler. * Fix removed_driver_test. * Fix bug related to vector -> list. * Fix linting. * Cleanup. * Fix multi node tests. * Fix jenkins tests. * Add another multi node test with many drivers. * Fix linting. * Make the actor creation notification a flatbuffer message. * Revert "Make the actor creation notification a flatbuffer message." This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39. * Add comment explaining flatbuffer problems.	2017-04-24 18:10:21 -07:00
Robert Nishihara	7af6f462fb	Add API for querying global control state. (#431) * Add API for querying global control state. * Fix linting. * Fix errors in Python 2. * Fix bug in test. * Fix bug in test.	2017-04-06 23:51:12 -07:00
Robert Nishihara	ba02fc0eb0	Run flake8 in Travis and make code PEP8 compliant. (#387)	2017-03-21 12:57:54 -07:00
Wapaul1	c66178bcd7	Resnet Adapted to Ray (#229) * Initial conversion * Further changes * fixes * some changes * Fixes * Added data pipeline * Added updates to cifar * Currently broken need sep pr * Added test for retrieving variables from an optimizer * Removed FlAG ref in environment variables * Added comments to test * Addressed comments * Added updates * Made further changes for tfutils * Fixed finalized bug * Removed ipython * Added accuracy printing * Temp commit * added fixes * changes * Added writing to file * Fixes for gpus * Cleaned up code * Temp commit * Gpu support fully implemented * Updated to use num_gpus for actors * Finished testing gpus implementation * Changed to be more in line with origin implementation * Updated test to use actors * Added support for cpu only systems * Now works with no cpus * Minor changes and some documentation.	2017-03-07 01:07:32 -08:00
Stephanie Wang	41b8675d04	Availability after local scheduler failure (#329) * Clean up plasma subscribers on EPIPE First pass at a monitoring script - monitor can detect local scheduler death Clean up task table upon local scheduler death in monitoring script Don't schedule to dead local schedulers in global scheduler Have global scheduler update the db clients table, monitor script cleans up state Documentation Monitor script should scan tables before beginning to read from subscription channel Fix for python3 Redirect monitor output to redis logs, fix hanging in multinode tests * Publish auxiliary addresses as part of db_client deletion notifications * Fix test case? * Small changes. * Use SCAN instead of KEYS * Address comments * Address more comments * Free redis module strings	2017-03-02 19:51:20 -08:00


58 Commits