Commit Graph

32 Commits

Author SHA1 Message Date
gycn a432285e77 Disable parallelization for Actors and ray.wait for debugging (#961)
Support actors and ray.wait in PYTHON_MODE.
2017-09-17 00:12:50 -07:00
Stephanie Wang 99c8b1f38c Actor fault tolerance using object lineage reconstruction (#902)
* Revert Python actor reconstruction

* Actor reconstruction using object lineage

* Add dummy arguments and return values for actor tasks

* Pin dummy outputs for actor tasks

* Skip checkpointing test for now

* TODOs

* minor edits

* Generate dummy object dependencies in Python, not C

* Fix linting.

* Move actor counter and dummy objects inside of the actor handle

* Refactor Worker._process_task, suppress exception propagation for
sequential actor tasks
2017-09-10 19:29:28 -07:00
Eric Liang c943ecaa42 Better error message when actor creation is attempted before ray.init() (#858) 2017-08-22 21:20:15 -07:00
Robert Nishihara cf41964816 Fix some bugs causing Travis test failures. (#839)
* Fix bug in which worker has no actor_class attribute.

* Remove case where we check if processes are defunct.
2017-08-16 01:18:32 -07:00
Alexey Tumanov fc885bd918 Adding basic support for a user-interpretable resource label (#761)
* adding support for the user-interpretable label(UIR)

* more plumbing for num_uirs further upstream; set to infty when specified on cmd line

* pass default num_uirs for actors; update GlobalStateAPI

* support num_uirs in ray.init()

* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting

* global scheduler test updated

* Fix bug introduced by rebase.

* Rename UIR -> CustomResource and add test.

* Small changes and use constexpr instead of macros.

* Linting and some renaming.

* Reorder some code.

* Remove cpus_in_use and fix bug.

* Add another test and make a small change.

* Rephrase documentation about feature stability.
2017-08-08 02:53:59 -07:00
Robert Nishihara dbe3d9351c Prototype actor checkpointing. (#814)
* Initial testing of checkpointing functions.

* Save checkpoints in Redis.

* Pipe checkpoint_interval through remote decorator.

* Add a test.

* Small cleanups.

* Submit dummy tasks when reconstructing tasks before the most recent tasks so that we don't end up reconstructing the arguments for those tasks.

* Remove old checkpoints to save space.

* Fix linting.
2017-08-07 17:52:39 -07:00
Robert Nishihara cb84972f6b Recreate actors when local schedulers die. (#804)
* Reconstruct actor state when local schedulers fail.

* Simplify construction of arguments to pass into default_worker.py from local scheduler.

* Remove deprecated ray.actor.

* Simplify actor reconstruction method.

* Fix linting.

* Small fixes.
2017-08-02 18:02:52 -07:00
Robert Nishihara 8c8258de20 Move worker methods into Worker class and expose more TaskSpec fields to Python. (#796)
* Move worker methods inside worker class. Move some helper methods from actor.py into utils.py and state.py.

* Add more methods exposing task spec fields to Python.

* Fix linting.

* Fix error.

* Remove unused code in default worker.
2017-08-01 17:16:57 -07:00
Robert Nishihara e0867c8845 Switch Python indentation from 2 spaces to 4 spaces. (#726)
* 4 space indentation for actor.py.

* 4 space indentation for worker.py.

* 4 space indentation for more files.

* 4 space indentation for some test files.

* Check indentation in Travis.

* 4 space indentation for some rl files.

* Fix failure test.

* Fix multi_node_test.

* 4 space indentation for more files.

* 4 space indentation for remaining files.

* Fixes.
2017-07-13 21:53:57 +00:00
Robert Nishihara 019ba07e9c Correct actor class name and module. (#675)
* Correct actor class name and module.

* Add test.

* Fix linting.
2017-06-17 05:44:42 +00:00
Philipp Moritz 54925996ca Allow remote functions to specify max executions and kill worker once limit is reached. (#660)
* implement restarting workers after certain number of task executions

* Clean up python code.

* Don't start new worker when an actor disconnects.

* Move wait_for_pid_to_exit to test_utils.py.

* Add test.

* Fix linting errors.

* Fix linting.

* Fix typo.
2017-06-13 00:34:58 -07:00
Richard Liaw 8d350f628a Fixing Redis Key Consistencies for Actor, FunctionTable, FunctionsToRun, and RemoteFunction (#659)
* consistencies for Actor, FunctionTable, and FunctionsToRun

* NOT WORKING: changing remote fn keys
2017-06-10 23:45:22 +00:00
Robert Nishihara a4d8e13094 Suppress excess warning messages related to intentional actor deaths. (#627)
* Don't submit the actor destructor tasks when the job is exiting.

* Don't propagate error messages to the driver when an actor exits intentionally.
2017-06-01 20:10:40 +00:00
Robert Nishihara d0bfc0a849 Clean up actor workers when actor handle goes out of scope. (#617) 2017-06-01 07:02:43 +00:00
Chelsea Finn f97d0393cc Fix to json decoding bug (#597)
* fix json decoding bug

* Fix linting error.
2017-05-25 18:48:39 -07:00
Robert Nishihara 997aa35721 Remove cloudpickle customization and just use plain cloudpickle. (#588)
* Remove augmentations of cloudpickle.

* Entirely remove cloudpickle modifications. Just use plain cloudpickle.
2017-05-24 20:22:28 -07:00
Robert Nishihara c647dd5f6c Make it possible to use actor definitions within remote functions and other actors. (#587)
* Enable remote function and actor definitions to close over actor definitions.

* Give better error message if actor objects are pickled.

* Add tests for closing over actor definitions.

* Fix linting.
2017-05-24 15:43:32 -07:00
Robert Nishihara 9f91eb8c91 Change API for remote function declaration, actor instantiation, and actor method invocation. (#541)
* Direction substitution of @ray.remote -> @ray.task.

* Changes to make '@ray.task' work.

* Instantiate actors with Class.remote() instead of Class().

* Convert actor instantiation in tests and examples from Class() to Class.remote().

* Change actor method invocation from object.method() to object.method.remote().

* Update tests and examples to invoke actor methods with .remote().

* Fix bugs in jenkins tests.

* Fix example applications.

* Change @ray.task back to @ray.remote.

* Changes to make @ray.actor -> @ray.remote work.

* Direct substitution of @ray.actor -> @ray.remote.

* Fixes.

* Raise exception if @ray.actor decorator is used.

* Simplify ActorMethod class.
2017-05-14 00:01:20 -07:00
Robert Nishihara b4788ae518 Only export actor classes once. (#510)
* Only export actor classes once.

* Fix linting.

* Fixes after rebase.
2017-05-09 19:49:23 -07:00
Robert Nishihara f32368bcbe Prevent actors from being placed on removed nodes or nodes with no CPUs. (#527)
* Make note about bug in which actor creation notification message is not received.

* Prevent actors from being created on removed nodes.

* Prevent actors from being created on nodes with no CPUs.

* Fix linting.

* Add test for scheduling actors on local schedulers with no CPUs.

* Improve error message when actors created before ray.init called.
2017-05-08 20:39:43 -07:00
Robert Nishihara c688a64235 Expose GPU IDs to remote functions. (#496)
* Change local scheduler bookkeeping to use GPU IDs.

* Update actor test.

* Add tests for actors and tasks simultaneously using GPUs.

* Add additional task GPU ID test.

* Fix linting.

* Make redis GPU assignment ignore GPU IDs.

* Small fix.
2017-05-07 13:03:49 -07:00
Robert Nishihara 245c8ab888 Make sure user seeding does not affect actor ID generation. (#506)
* Make sure user seeding does not affect actor ID generation.

* Fix linting.

* Add test.
2017-05-03 16:29:55 -07:00
Robert Nishihara 0ac125e9b2 Clean up when a driver disconnects. (#462)
* Clean up state when drivers exit.

* Remove unnecessary field in ActorMapEntry struct.

* Have monitor release GPU resources in Redis when driver exits.

* Enable multiple drivers in multi-node tests and test driver cleanup.

* Make redis GPU allocation a redis transaction and small cleanups.

* Fix multi-node test.

* Small cleanups.

* Make global scheduler take node_ip_address so it appears in the right place in the client table.

* Cleanups.

* Fix linting and cleanups in local scheduler.

* Fix removed_driver_test.

* Fix bug related to vector -> list.

* Fix linting.

* Cleanup.

* Fix multi node tests.

* Fix jenkins tests.

* Add another multi node test with many drivers.

* Fix linting.

* Make the actor creation notification a flatbuffer message.

* Revert "Make the actor creation notification a flatbuffer message."

This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39.

* Add comment explaining flatbuffer problems.
2017-04-24 18:10:21 -07:00
Robert Nishihara f4c1adae17 Unify function signature handling between remote functions and actor … (#441)
* Unify function signature handling between remote functions and actor methods.

* Fixes.

* Fix tests.
2017-04-08 21:34:13 -07:00
Robert Nishihara ba02fc0eb0 Run flake8 in Travis and make code PEP8 compliant. (#387) 2017-03-21 12:57:54 -07:00
Robert Nishihara a7ddac6fb1 Properly mock ray submodules when building documentation. (#337) 2017-03-04 23:02:56 -08:00
Robert Nishihara 6a4bde54dc Only install ray python packages. (#330)
* Only install ray python packages.

* Add some __init__.py files.

* Install Ray before building documentation.

* Fix install-ray.sh.

* Fix.
2017-03-01 23:34:44 -08:00
Robert Nishihara 1ae7e7d29e Rename photon -> local scheduler. (#322) 2017-02-27 12:24:07 -08:00
Robert Nishihara 54238c4ad0 Propagate errors from importing actors. (#309)
* Propagate errors from importing actors.

* Fix bug.
2017-02-22 15:15:45 -08:00
Robert Nishihara e399f57e6b Let actors use GPUs. (#302)
* Add num_cpus and num_gpus to actor decorator.

* Assign GPU IDs to actors.

* Add additional actor test.

* Remove duplicated line.

* Factor out local scheduler selection method.

* Add test and simplify local scheduler selection.
2017-02-21 01:13:04 -08:00
Robert Nishihara 88a5b4e77b Simplify imports and exports and provide driver isolation for remote functions. (#288)
* Remove import counter and export counter.

* Provide isolation between drivers for remote functions.

* Add test for driver function isolation.

* Hash source code into function ID to reduce likelihood of collisions.

* Fix failure test example.

* Replace assertTrue with assertIn to improve failure messages in tests.

* Fix failure test.
2017-02-16 11:30:35 -08:00
Philipp Moritz 12a68e84d2 Implement a first pass at actors in the API. (#242)
* Implement actor field for tasks

* Implement actor management in local scheduler.

* initial python frontend for actors

* import actors on worker

* IPython code completion and tests

* prepare creating actors through local schedulers

* add actor id to PyTask

* submit actor calls to local scheduler

* starting to integrate

* simple fix

* Fixes from rebasing.

* more work on python actors

* Improve local scheduler actor handlers.

* Pass actor ID to local scheduler when connecting a client.

* first working version of actors

* fixing actors

* fix creating two copies of the same actor

* fix actors

* remove sleep

* get rid of export synchronization

* update

* insert actor methods into the queue in the right order

* remove print statements

* make it compile again after rebase

* Minor updates.

* fix python actor ids

* Pass actor_id to start_worker.

* add test

* Minor changes.

* Update actor tests.

* Temporary plan for import counter.

* Temporarily fix import counters.

* Fix some tests.

* Fixes.

* Make actor creation non-blocking.

* Fix test?

* Fix actors on Python 2.

* fix rare case.

* Fix python 2 test.

* More tests.

* Small fixes.

* Linting.

* Revert tensorflow version to 0.12.0 temporarily.

* Small fix.

* Enhance inheritance test.
2017-02-15 00:10:05 -08:00