mirror of https://github.com/wassname/ray.git synced 2026-06-28 12:28:10 +08:00

Author	SHA1	Message	Date
Robert Nishihara	cd80891ddb	Try to figure out the memory limit in a docker container. (#3605) * Try to figure out the memory limit in a docker container. * Update comment * Fix * Fix	2019-01-03 23:07:24 -08:00
Yuhong Guo	4b23a34c93	Fix multi-thread problem of function manager and Jenkins test (#3648)	2019-01-03 17:05:13 +08:00
Robert Nishihara	bb7ca3bae7	Upgrade flatbuffers version to 1.10.0. (#3559) * Upgrade flatbuffers version to 1.10.0. * Temporarily change ray.utils.decode for backwards compatibility.	2018-12-23 14:56:34 -08:00
Si-Yuan	84fae57ab5	Convert the raylet client (the code in local_scheduler_client.cc) to proper C++. (#3511) * refactoring * fix bugs * create client class * create client class for java; bug fix * remove legacy code * improve code by using std::string, std::unique_ptr rename private fields and removing legacy code * rename class * improve naming * fix * rename files * fix names * change name * change return types * make a mutex private field * fix comments * fix bugs * lint * bug fix * bug fix * move too short functions into the header file * Loose crash conditions for some APIs. * Apply suggestions from code review Co-Authored-By: suquark <suquark@gmail.com> * format * update * rename python APIs * fix java * more fixes * change types of cpython interface * more fixes * improve error processing * improve error processing for java wrapper * lint * fix java * make fields const * use pointers for [out] parameters * fix java & error msg * fix resource leak, etc.	2018-12-13 13:39:10 -08:00
Stephanie Wang	d950e92f63	Allow multiple threads to call ray.get and ray.wait (#3244) * Handle multiple threads calling ray.get * Multithreaded ray.wait * Pass in current task ID in java backend * Add multithreaded actor to tests, add warning messages to worker for multithreaded ray.get * Fix test * Some cleanups * Improve error message * Add assertion * Cleanup, throw error in HandleTaskUnblocked if task not actually blocked * lint * Fix python worker reset * Fix references to reconstruct_objects * Linting * java lint * Fix java * Fix iterator	2018-11-07 22:39:28 -08:00
Robert Nishihara	9868af4c7c	Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. (#3149) * Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. * Add logging statement and address comments. * Fix	2018-10-28 20:09:06 -07:00
Robert Nishihara	658c14282c	Remove legacy Ray code. (#3121) * Remove legacy Ray code. * Fix cmake and simplify monitor. * Fix linting * Updates * Fix * Implement some methods. * Remove more plasma manager references. * Fix * Linting * Fix * Fix * Make sure class IDs are strings. * Some path fixes * Fix * Path fixes and update arrow * Fixes. * linting * Fixes * Java fixes * Some java fixes * TaskLanguage -> Language * Minor * Fix python test and remove unused method signature. * Fix java tests * Fix jenkins tests * Remove commented out code.	2018-10-26 13:36:58 -07:00
Yuhong Guo	9948e8c11b	Move function/actor exporting & loading code to function_manager.py (#3003) Move function/actor exporting & loading code to function_manager.py to prepare the code change for function descriptor for python.	2018-10-03 16:21:04 -07:00
Robert Nishihara	ea9d1cc887	Remove dependence on psutil. Add utility functions for getting system memory. (#2892)	2018-09-18 15:03:29 +08:00
Robert Nishihara	bd64c940e9	Push error to driver when monitor raises an exception. (#2834)	2018-09-07 17:42:45 -07:00
Robert Nishihara	909d7172b1	Introduce constant for ID_SIZE in python code. (#2517)	2018-07-31 12:40:53 -07:00
Eric Liang	38d00986a5	[rllib] Cleanups: deep merge configs properly; enforce min iter time on APEX (#2500) The dict merge prevents crashes when tune is trying to get resource requests for agents and you override a config subkey. The min iter time prevents iterations from getting too small, incurring high overhead. This is easy to run into on Ape-X since throughput can get very high.	2018-07-30 13:25:35 -07:00
Hao Chen	05f485e274	Allow Ray API to be used from multiple threads (#2422)	2018-07-20 15:39:01 -07:00
Robert Nishihara	b90e551b41	[xray] Implement timeline and profiling API. (#2306) * Add profile table and store profiling information there. * Code for dumping timeline. * Improve color scheme. * Push timeline events on driver only for raylet. * Improvements to profiling and timeline visualization * Some linting * Small fix. * Linting * Propagate node IP address through profiling events. * Fix test. * object_id.hex() should return byte string in python 2. * Include gcs.fbs in node_manager.fbs. * Remove flatbuffer definition duplication. * Decode to unicode in Python 3 and bytes in Python 2. * Minor * Submit profile events in a batch. Revert some CMake changes. * Fix * Workaround test failure. * Fix linting * Linting * Don't return anything from chrome_tracing_dump when filename is provided. * Remove some redundancy from profile table. * Linting * Move TODOs out of docstring. * Minor	2018-07-04 23:23:48 -07:00
Hao Chen	20c0ecb522	Reuse code of checking large pickles (#2291)	2018-06-28 16:51:23 -10:00
Robert Nishihara	ff2217251f	[xray] Add error table and push error messages to driver through node manager. (#2256) * Fix documentation indentation. * Add error table to GCS and push error messages through node manager. * Add type to error data. * Linting * Fix failure_test bug. * Linting. * Enable one more test. * Attempt to fix doc building. * Restructuring * Fixes * More fixes. * Move current_time_ms function into util.h.	2018-06-20 21:29:28 -07:00
Alok Singh	f795173b51	Use flake8-comprehensions (#1976) * Add flake8 to Travis * Add flake8-comprehensions [flake8 plugin](https://github.com/adamchainz/flake8-comprehensions) that checks for useless constructions. * Use generators instead of lists where appropriate A lot of the builtins can take in generators instead of lists. This commit applies `flake8-comprehensions` to find them. * Fix lint error * Fix some string formatting The rest can be fixed in another PR * Fix compound literals syntax This should probably be merged after #1963. * dict() -> {} * Use dict literal syntax dict(...) -> {...} * Rewrite nested dicts * Fix hanging indent * Add missing import * Add missing quote * fmt * Add missing whitespace * rm duplicate pip install This is already installed in another file. * Fix indent * move `merge_dicts` into utils * Bring up to date with `master` * Add automatic syntax upgrade * rm pyupgrade In case users want to still use it on their own, the upgrade-syn.sh script was left in the `.travis` dir.	2018-05-20 16:15:06 -07:00
Robert Nishihara	99ae74e1d2	Improve error message printing and suppression. (#2104)	2018-05-20 12:13:14 -07:00
Robert Nishihara	8fbb88485b	Create RemoteFunction class, remove FunctionProperties, simplify worker Python code. (#2052) * Cleaning up worker and actor code. Create remote function class. Remove FunctionProperties object. * Remove register_actor_signatures function. * Small cleanups. * Fix linting. * Support @ray.method syntax for actor methods. * Fix pickling bug. * Fix linting. * Shorten testBlockingTasks. * Small fixes. * Call get_global_worker().	2018-05-14 14:35:23 -07:00
Philipp Moritz	74162d1492	Lint Python files with Yapf (#1872)	2018-04-11 10:11:35 -07:00
Robert Nishihara	f88a2544bf	Speed up actor creation task submission by generating IDs with uuid. (#1744) * Speed up actor creation task submission by generating IDs deterministically. * Revert "Speed up actor creation task submission by generating IDs deterministically." This reverts commit 175d9587302664916ce9db4071185485da8da041. * Don't generate actor IDs deterministically yet. * Factor out ID generation method.	2018-03-19 19:32:46 -07:00
Robert Nishihara	96913be939	Treat actor creation like a regular task. (#1668) * Treat actor creation like a regular task. * Small cleanups. * Change semantics of actor resource handling. * Bug fix. * Minor linting * Bug fix * Fix jenkins test. * Fix actor tests * Some cleanups * Bug fix * Fix bug. * Remove cached actor tasks when a driver is removed. * Add more info to taskspec in global state API. * Fix cyclic import bug in tune. * Fix * Fix linting. * Fix linting. * Don't schedule any tasks (especially actor creaiton tasks) on local schedulers with 0 CPUs. * Bug fix. * Add test for 0 CPU case * Fix linting * Address comments. * Fix typos and add comment. * Add assertion and fix test.	2018-03-16 11:18:07 -07:00
Stephanie Wang	ff8e7f8259	Actor checkpointing for distributed actor handles (#1498) * Expose calls to get and set the actor frontier * Remove fields used for old checkpointing prototype, change actor_checkpoint_failed -> succeeded * Prototype for actor checkpointing * Filter out duplicate tasks on the local scheduler * Clean up some of the Python checkpointing code * More cleanups * Documentation * cleanup and fix unit test * Allow remote checkpoint calls through actor handle * Check whether object is local before reconstructing * Enable checkpointing for distributed actor handles, refactor tests * Fix local scheduler tests * lint * Address comments * lint * Skip tests that fail on new GCS * style * Don't put same object twice when setting the actor frontier * Address Philipp's comments, cleaner fbs naming	2018-02-07 11:19:32 -08:00
Robert Nishihara	ed77a4c415	Make ray.get_gpu_ids() respect existing CUDA_VISIBLE_DEVICES. (#1499) * Make ray.get_gpu_ids() respect existing CUDA_VISIBLE_DEVICES. * Comment out failing GPUID check. * Add import. * Fix test. * Remove test. * Factor out environment variable setting/getting into utils.	2018-02-01 21:29:14 -08:00
Robert Nishihara	c21e189371	Allow scheduling with arbitrary user-defined resource labels. (#1236) * Enable scheduling with custom resource labels. * Fix. * Minor fixes and ref counting fix. * Linting * Use .data() instead of .c_str(). * Fix linting. * Fix ResourcesTest.testGPUIDs test by waiting for workers to start up. * Sleep in test so that all tasks are submitted before any completes.	2017-12-01 11:41:40 -08:00
Robert Nishihara	c1496b8111	Check version info in ray start for non-head nodes. (#1264) * Check version info in ray start for non-head nodes. * Small fix. * Fix * Push error to all drivers when worker has version mismatch. * Linting * Linting * Fix * Unify methods. * Fix bug.	2017-11-27 22:03:38 -08:00
Daniel Suo	4f0da6f81c	Add basic functionality for Cython functions and actors (#1193) * Add basic functionality for Cython functions and actors * Fix up per @pcmoritz comments * Fixes per @richardliaw comments * Fixes per @robertnishihara comments * Forgot double quotes when updating masked_log * Remove import typing for Python 2 compatibility	2017-11-09 17:49:06 -08:00
Robert Nishihara	4669c59fa8	Release GPU resources as soon as an actor exits. (#1088) * Release GPU resources as soon as an actor exits. * Add a test. * Store local_scheduler_id and driver_id in the worker object instead of the actor object.	2017-10-06 17:58:19 -07:00
Alexey Tumanov	fc885bd918	Adding basic support for a user-interpretable resource label (#761) * adding support for the user-interpretable label(UIR) * more plumbing for num_uirs further upstream; set to infty when specified on cmd line * pass default num_uirs for actors; update GlobalStateAPI * support num_uirs in ray.init() * local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting * global scheduler test updated * Fix bug introduced by rebase. * Rename UIR -> CustomResource and add test. * Small changes and use constexpr instead of macros. * Linting and some renaming. * Reorder some code. * Remove cpus_in_use and fix bug. * Add another test and make a small change. * Rephrase documentation about feature stability.	2017-08-08 02:53:59 -07:00
Robert Nishihara	cb84972f6b	Recreate actors when local schedulers die. (#804) * Reconstruct actor state when local schedulers fail. * Simplify construction of arguments to pass into default_worker.py from local scheduler. * Remove deprecated ray.actor. * Simplify actor reconstruction method. * Fix linting. * Small fixes.	2017-08-02 18:02:52 -07:00
Robert Nishihara	8c8258de20	Move worker methods into Worker class and expose more TaskSpec fields to Python. (#796) * Move worker methods inside worker class. Move some helper methods from actor.py into utils.py and state.py. * Add more methods exposing task spec fields to Python. * Fix linting. * Fix error. * Remove unused code in default worker.	2017-08-01 17:16:57 -07:00
Robert Nishihara	e0867c8845	Switch Python indentation from 2 spaces to 4 spaces. (#726) * 4 space indentation for actor.py. * 4 space indentation for worker.py. * 4 space indentation for more files. * 4 space indentation for some test files. * Check indentation in Travis. * 4 space indentation for some rl files. * Fix failure test. * Fix multi_node_test. * 4 space indentation for more files. * 4 space indentation for remaining files. * Fixes.	2017-07-13 21:53:57 +00:00
Philipp Moritz	54925996ca	Allow remote functions to specify max executions and kill worker once limit is reached. (#660) * implement restarting workers after certain number of task executions * Clean up python code. * Don't start new worker when an actor disconnects. * Move wait_for_pid_to_exit to test_utils.py. * Add test. * Fix linting errors. * Fix linting. * Fix typo.	2017-06-13 00:34:58 -07:00
Robert Nishihara	245c8ab888	Make sure user seeding does not affect actor ID generation. (#506) * Make sure user seeding does not affect actor ID generation. * Fix linting. * Add test.	2017-05-03 16:29:55 -07:00
Robert Nishihara	0ac125e9b2	Clean up when a driver disconnects. (#462) * Clean up state when drivers exit. * Remove unnecessary field in ActorMapEntry struct. * Have monitor release GPU resources in Redis when driver exits. * Enable multiple drivers in multi-node tests and test driver cleanup. * Make redis GPU allocation a redis transaction and small cleanups. * Fix multi-node test. * Small cleanups. * Make global scheduler take node_ip_address so it appears in the right place in the client table. * Cleanups. * Fix linting and cleanups in local scheduler. * Fix removed_driver_test. * Fix bug related to vector -> list. * Fix linting. * Cleanup. * Fix multi node tests. * Fix jenkins tests. * Add another multi node test with many drivers. * Fix linting. * Make the actor creation notification a flatbuffer message. * Revert "Make the actor creation notification a flatbuffer message." This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39. * Add comment explaining flatbuffer problems.	2017-04-24 18:10:21 -07:00

35 Commits