wassname/ray

mirror of https://github.com/wassname/ray.git synced 2026-06-29 19:17:01 +08:00

Author	SHA1	Message	Date
Hao Chen	6f1a29ad3f	Consodiate CI Python tests and fix bug about multiple ray.init (#4195)	2019-03-01 14:38:28 -08:00
Robert Nishihara	d9bcaa20b5	Turn UI off by default. (#4188)	2019-02-28 17:29:52 -08:00
Ion	88e14feb53	Reset signal counters when a task finishes (#4173)	2019-02-28 15:15:03 -08:00
Yuhong Guo	1f864a02bc	Add option of load_code_from_local which is required in cross-language ray call. (#3675)	2019-02-21 12:37:17 +08:00
Robert Nishihara	e7651b1117	Fix excessive buffering of worker stdout/stderr. (#4094) * Start workers with 'python -u' to prevent buffering of prints. * Set sys.stdout and sys.stderr. * Add comment.	2019-02-19 20:20:47 -08:00
Eric Liang	e9ee38ace2	More compact format for worker logs (#4092)	2019-02-19 19:53:43 -08:00
Wang Qing	794a093249	Add runtime_context to get some runtime fields in worker (#4065)	2019-02-19 15:57:30 +08:00
Hao Chen	de17443dc2	Propagate backend error to worker (#4039)	2019-02-16 11:39:15 +08:00
Robert Nishihara	5f71751891	API cleanups. Remove worker argument. Remove some deprecated arguments. (#4025) * Remove worker argument from API methods. * Remove deprecated arguments and deprecate redirect_output and redirect_worker_output. * Fix	2019-02-15 10:49:16 -08:00
Hao Chen	042ad84573	Simplify Cython ID types and fix bug of ActorCheckpointID (#4045)	2019-02-15 20:15:16 +08:00
Si-Yuan	2de31eb489	minor fix (#4040)	2019-02-13 17:22:45 -08:00
Hao Chen	f31a79f3f7	Implement actor checkpointing (#3839) * Implement Actor checkpointing * docs * fix * fix * fix * move restore-from-checkpoint to HandleActorStateTransition * Revert "move restore-from-checkpoint to HandleActorStateTransition" This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12. * resubmit waiting tasks when actor frontier restored * add doc about num_actor_checkpoints_to_keep=1 * add num_actor_checkpoints_to_keep to Cython * add checkpoint_expired api * check if actor class is abstract * change checkpoint_ids to long string * implement java * Refactor to delay actor creation publish until checkpoint is resumed * debug, lint * Erase from checkpoints to restore if task fails * fix lint * update comments * avoid duplicated actor notification log * fix unintended change * add actor_id to checkpoint_expired * small java updates * make checkpoint info per actor * lint * Remove logging * Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager * Replace old actor checkpointing tests * Fix test and lint * address comments * consolidate kill_actor * Remove __ray_checkpoint__ * fix non-ascii char * Loosen test checks * fix java * fix sphinx-build	2019-02-13 19:39:02 +08:00
Si-Yuan	21472b890a	Integrate "tempfile_service" into "ray.node.Node" (#3953)	2019-02-12 17:34:04 -08:00
Ion	3c32343c63	Ray signal (#3624)	2019-02-11 10:14:48 -08:00
Yuhong Guo	5fb1efd60d	Fix CI test failures (#4007)	2019-02-11 11:01:14 +08:00
Robert Nishihara	ef527f84ab	Stream logs to driver by default. (#3892) * Stream logs to driver by default. * Fix from rebase * Redirect raylet output independently of worker output. * Fix. * Create redis client with services.create_redis_client. * Suppress Redis connection error at exit. * Remove thread_safe_client from redis. * Shutdown driver threads in ray.shutdown(). * Add warning for too many log messages. * Only stop threads if worker is connected. * Only stop threads if they exist. * Remove unnecessary try/excepts. * Fix * Only add new logging handler once. * Increase timeout. * Fix tempfile test. * Fix logging in cluster_utils. * Revert "Increase timeout." This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95. * Retry longer when connecting to plasma store from node manager and object manager. * Close pubsub channels to avoid leaking file descriptors. * Limit log monitor open files to 200. * Increase plasma connect retries. * Add comment.	2019-02-07 19:53:50 -08:00
Robert Nishihara	fa4eb8313d	Suppress warning for serializing different unique ID types in Python. (#3872) * Suppress warning for serializing different unique ID types in Python. * Add _ID_TYPES variable.	2019-02-05 11:38:33 -08:00
Si-Yuan	9295ab8f60	Various Python code cleanups. (#3837)	2019-02-03 10:16:24 -08:00
Richard Liaw	d128636bab	Ray Logging Configuration (#3691) * fix logging for autoscaler * module logging * try this for logging * yapf * fix * Initial logging setup * momery * ok * remove basicconfig * catch * remove package logging * print * fix * try_fix * fix 1 * revert rllib * logging level * flake8 * fix * fix * Remove vestigal TODO	2019-01-30 21:01:12 -08:00
Si-Yuan	48139cf861	Migrate Python C extension to Cython (#3541)	2019-01-24 09:17:14 -08:00
Wang Qing	816406ea3d	[Java] Fix `setCurrentTask()` in multi threading (#3821)	2019-01-23 20:45:30 +08:00
Robert Nishihara	0b1608a546	Factor out code for starting new processes and test plasma store in valgrind. (#3824) * Factor out starting Ray processes. * Detect flags through environment variables. * Return ProcessInfo from start_ray_process. * Print valgrind errors at exit. * Test valgrind in travis. * Some valgrind fixes. * Undo raylet monitor change. * Only test plasma store in valgrind.	2019-01-22 14:59:11 -08:00
Yuhong Guo	d2cf8561f2	Refactor code about ray.ObjectID. (#3674) * Refactor code about ray.ObjectID. * remove from_random and use nil_id instead of constructor * remove id() in hash * Lint and fix * Change driver id to ObjectID * Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()	2019-01-13 01:47:29 -08:00
Robert Nishihara	8723d6b061	Define a Node class to manage Ray processes. (#3733) * Implement Node class and move most of services.py into it. * Wait for nodes as they are added to the cluster. * Fix Redis authentication bug. * Fix bug in client table ordering. * Address comments. * Kill raylet before plasma store in test. * Minor	2019-01-11 22:30:38 -08:00
Hao Chen	597abb24ea	Refine multi-threading support (#3672) * [Python] refine multi-threading support fix * [java] refine multithreading code fix java * format	2019-01-10 13:58:11 -08:00
Stephanie Wang	04f31db54d	Actor dummy object garbage collection (#3593) * Convert UniqueID::nil() to a constructor * Cleanup actor handle pickling code * Add new actor handles to the task spec * Pass in new actor handles * Add new handles to the actor registration * Regression test for actor handle forking and GC * lint and doc * Handle pickled actor handles in the backend and some refactoring * Add regression test for dummy object GC and pickled actor handles * Check for duplicate actor tasks on submission * Regression test for forking twice, fix failed named actor leak * Fix bug for forking twice * lint * Revert "Fix bug for forking twice" This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac. * Add new actor handles when task is assigned, not finished * Remove comment * remove UniqueID() * Updates * update * fix * fix java * fixes * fix	2019-01-09 10:37:11 -08:00
Robert Nishihara	d1e21b702e	Change timeout from milliseconds to seconds in ray.wait. (#3706) * Change timeout from milliseconds to seconds in ray.wait. * Suppress warning. * Suppress warning. * Add prominent warning in API documentation.	2019-01-08 21:32:08 -08:00
Robert Nishihara	5e76d52868	Improve cluster.wait_for_nodes() API. (#3712) * Separate out functionality for querying client table and improve cluster.wait_for_nodes() API. * Linting * Add back logging statements. * info -> debug	2019-01-07 21:26:58 -08:00
Robert Nishihara	c9d70f0dda	Remove num_local_schedulers argument from ray.worker._init. (#3704) * Remove num_local_schedulers argument from ray.worker._init. * Fix * Fix tests.	2019-01-07 12:44:49 -08:00
Robert Nishihara	586a5c9ffa	Limit default redis max memory to 10GB. (#3630) * Limit Redis max memory to 10GB/shard by default. * Update stress tests. * Reorganize * Update * Add minimum cap size for object store and redis. * Small test update.	2019-01-03 13:23:54 -08:00
Yuhong Guo	4b23a34c93	Fix multi-thread problem of function manager and Jenkins test (#3648)	2019-01-03 17:05:13 +08:00
Robert Nishihara	b6bcd18d65	Split profile table among many keys in the GCS. (#3676) * Divide profile table among many keys in GCS. * Fix, and remove --collect-profiling-data arg. * Remove reference in doc.	2019-01-02 21:33:01 -08:00
Si-Yuan	93d54110f8	Prevent overriding faulthandler settings (#3668) This change ensures that Ray set up fault handlers only if it has not been enabled by other applications. Otherwise some applications could face strange issues when using Ray, and some unittests using xml runners will fail.	2018-12-31 16:36:26 -08:00
Yuhong Guo	c9b8ecca51	Add RayParams to refactor the parameters used by ray python. (#3558)	2018-12-29 22:04:27 +08:00
Alexey Tumanov	c4cba98c75	Remove deprecation warnings when running actor tests (#3563) * remove deprecation warnings when running actor tests * replacing logger.warn with logger.warning * Update worker.py * Update policy_client.py * Update compression.py	2018-12-18 17:04:51 -08:00
Yuhong Guo	fb33fa9097	Enable function_descriptor in backend to replace the function_id (#3028)	2018-12-18 18:53:59 -05:00
Yuhong Guo	75ddf7cca4	Fix 2 small bugs (#3573)	2018-12-18 14:52:21 -05:00
Robert Nishihara	417c7f2d6f	Update arrow and remove plasma_manager references. (#3545)	2018-12-15 23:36:02 -08:00
Philipp Moritz	b3bf608608	Update arrow to reduce plasma IPCs. (#3497)	2018-12-14 23:49:37 -05:00
Hao Chen	e7b51cbd1b	[xray] Implement Actor Reconstruction (#3332) * Implement Actor Reconstruction * fix * fix actor handle __del__ * fix lint * add comment * Remove actorCreationDummyObjectId * address comments * fix * address comments * avoid copy * change log to debug * fix error name	2018-12-13 21:28:58 -08:00
Si-Yuan	84fae57ab5	Convert the raylet client (the code in local_scheduler_client.cc) to proper C++. (#3511) * refactoring * fix bugs * create client class * create client class for java; bug fix * remove legacy code * improve code by using std::string, std::unique_ptr rename private fields and removing legacy code * rename class * improve naming * fix * rename files * fix names * change name * change return types * make a mutex private field * fix comments * fix bugs * lint * bug fix * bug fix * move too short functions into the header file * Loose crash conditions for some APIs. * Apply suggestions from code review Co-Authored-By: suquark <suquark@gmail.com> * format * update * rename python APIs * fix java * more fixes * change types of cpython interface * more fixes * improve error processing * improve error processing for java wrapper * lint * fix java * make fields const * use pointers for [out] parameters * fix java & error msg * fix resource leak, etc.	2018-12-13 13:39:10 -08:00
Eric Liang	0e00533ed4	Different approach to removing RayGetError (#3471)	2018-12-12 20:30:51 -08:00
Eric Liang	cffe8f9806	Add option to evict keys LRU from the sharded redis tables (#3499) * wip * wip * format * wip * note * lint * fix * flag * typo * raise timeout * fix * optional get * fix flag * increase timeout in test * update docs * format	2018-12-09 05:48:52 -08:00
Tianming Xu	f6490f9bef	Resolve no handlers could be found for logger 'ray.worker' when importing ray (#3483)	2018-12-06 20:46:53 -08:00
Si-Yuan	2e6f9bedf2	Add the extra fallback for serialization (#3468) * Add the extra fallback for serialization. * Better comments & warnings. quotes. * Update test/runtest.py Co-Authored-By: suquark <suquark@gmail.com> * Update test/runtest.py Co-Authored-By: suquark <suquark@gmail.com> * linting * Don't hijack too much errors. * simplify the test * Update runtest.py * simplify	2018-12-05 13:09:08 -08:00
Eric Liang	0d56fc10cc	Move setproctitle to ray[debug] package (#3415)	2018-11-27 09:50:59 -08:00
Robert Nishihara	3856533065	Fix incompatibility with most recent version of Redis. (#3379) * Fix incompatibility with most recent version of Redis. * Fix * Fixes.	2018-11-24 16:36:38 -08:00
Eric Liang	afc48d7b77	Don't setpgid() on actors (#3347)	2018-11-19 17:35:26 -08:00
Eric Liang	e0bf9d7305	Add debug string to raylet (#3317) * initial debug string * format * wip debug string * fix compile * fix * update * finished * to file * logs dir * use temp root * fix * override	2018-11-15 21:47:50 -08:00
Eric Liang	5723291db6	Raise exception if the node is nearly out of memory (#3323) * wip * add * comment * escape hatch * update * object store too * .2	2018-11-15 12:55:25 -08:00


241 Commits