The dict merge prevents crashes when tune is trying to get resource requests for agents and you override a config subkey. The min iter time prevents iterations from getting too small, incurring high overhead. This is easy to run into on Ape-X since throughput can get very high.
* Add profile table and store profiling information there.
* Code for dumping timeline.
* Improve color scheme.
* Push timeline events on driver only for raylet.
* Improvements to profiling and timeline visualization
* Some linting
* Small fix.
* Linting
* Propagate node IP address through profiling events.
* Fix test.
* object_id.hex() should return byte string in python 2.
* Include gcs.fbs in node_manager.fbs.
* Remove flatbuffer definition duplication.
* Decode to unicode in Python 3 and bytes in Python 2.
* Minor
* Submit profile events in a batch. Revert some CMake changes.
* Fix
* Workaround test failure.
* Fix linting
* Linting
* Don't return anything from chrome_tracing_dump when filename is provided.
* Remove some redundancy from profile table.
* Linting
* Move TODOs out of docstring.
* Minor
* Fix documentation indentation.
* Add error table to GCS and push error messages through node manager.
* Add type to error data.
* Linting
* Fix failure_test bug.
* Linting.
* Enable one more test.
* Attempt to fix doc building.
* Restructuring
* Fixes
* More fixes.
* Move current_time_ms function into util.h.
* Add flake8 to Travis
* Add flake8-comprehensions
[flake8 plugin](https://github.com/adamchainz/flake8-comprehensions) that
checks for useless constructions.
* Use generators instead of lists where appropriate
A lot of the builtins can take in generators instead of lists.
This commit applies `flake8-comprehensions` to find them.
* Fix lint error
* Fix some string formatting
The rest can be fixed in another PR
* Fix compound literals syntax
This should probably be merged after #1963.
* dict() -> {}
* Use dict literal syntax
dict(...) -> {...}
* Rewrite nested dicts
* Fix hanging indent
* Add missing import
* Add missing quote
* fmt
* Add missing whitespace
* rm duplicate pip install
This is already installed in another file.
* Fix indent
* move `merge_dicts` into utils
* Bring up to date with `master`
* Add automatic syntax upgrade
* rm pyupgrade
In case users want to still use it on their own, the upgrade-syn.sh script was
left in the `.travis` dir.
* Speed up actor creation task submission by generating IDs deterministically.
* Revert "Speed up actor creation task submission by generating IDs deterministically."
This reverts commit 175d9587302664916ce9db4071185485da8da041.
* Don't generate actor IDs deterministically yet.
* Factor out ID generation method.
* Treat actor creation like a regular task.
* Small cleanups.
* Change semantics of actor resource handling.
* Bug fix.
* Minor linting
* Bug fix
* Fix jenkins test.
* Fix actor tests
* Some cleanups
* Bug fix
* Fix bug.
* Remove cached actor tasks when a driver is removed.
* Add more info to taskspec in global state API.
* Fix cyclic import bug in tune.
* Fix
* Fix linting.
* Fix linting.
* Don't schedule any tasks (especially actor creaiton tasks) on local schedulers with 0 CPUs.
* Bug fix.
* Add test for 0 CPU case
* Fix linting
* Address comments.
* Fix typos and add comment.
* Add assertion and fix test.
* Expose calls to get and set the actor frontier
* Remove fields used for old checkpointing prototype, change actor_checkpoint_failed -> succeeded
* Prototype for actor checkpointing
* Filter out duplicate tasks on the local scheduler
* Clean up some of the Python checkpointing code
* More cleanups
* Documentation
* cleanup and fix unit test
* Allow remote checkpoint calls through actor handle
* Check whether object is local before reconstructing
* Enable checkpointing for distributed actor handles, refactor tests
* Fix local scheduler tests
* lint
* Address comments
* lint
* Skip tests that fail on new GCS
* style
* Don't put same object twice when setting the actor frontier
* Address Philipp's comments, cleaner fbs naming
* Enable scheduling with custom resource labels.
* Fix.
* Minor fixes and ref counting fix.
* Linting
* Use .data() instead of .c_str().
* Fix linting.
* Fix ResourcesTest.testGPUIDs test by waiting for workers to start up.
* Sleep in test so that all tasks are submitted before any completes.
* Check version info in ray start for non-head nodes.
* Small fix.
* Fix
* Push error to all drivers when worker has version mismatch.
* Linting
* Linting
* Fix
* Unify methods.
* Fix bug.
* Add basic functionality for Cython functions and actors
* Fix up per @pcmoritz comments
* Fixes per @richardliaw comments
* Fixes per @robertnishihara comments
* Forgot double quotes when updating masked_log
* Remove import typing for Python 2 compatibility
* Release GPU resources as soon as an actor exits.
* Add a test.
* Store local_scheduler_id and driver_id in the worker object instead of the actor object.
* adding support for the user-interpretable label(UIR)
* more plumbing for num_uirs further upstream; set to infty when specified on cmd line
* pass default num_uirs for actors; update GlobalStateAPI
* support num_uirs in ray.init()
* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting
* global scheduler test updated
* Fix bug introduced by rebase.
* Rename UIR -> CustomResource and add test.
* Small changes and use constexpr instead of macros.
* Linting and some renaming.
* Reorder some code.
* Remove cpus_in_use and fix bug.
* Add another test and make a small change.
* Rephrase documentation about feature stability.
* Reconstruct actor state when local schedulers fail.
* Simplify construction of arguments to pass into default_worker.py from local scheduler.
* Remove deprecated ray.actor.
* Simplify actor reconstruction method.
* Fix linting.
* Small fixes.
* 4 space indentation for actor.py.
* 4 space indentation for worker.py.
* 4 space indentation for more files.
* 4 space indentation for some test files.
* Check indentation in Travis.
* 4 space indentation for some rl files.
* Fix failure test.
* Fix multi_node_test.
* 4 space indentation for more files.
* 4 space indentation for remaining files.
* Fixes.
* implement restarting workers after certain number of task executions
* Clean up python code.
* Don't start new worker when an actor disconnects.
* Move wait_for_pid_to_exit to test_utils.py.
* Add test.
* Fix linting errors.
* Fix linting.
* Fix typo.
* Clean up state when drivers exit.
* Remove unnecessary field in ActorMapEntry struct.
* Have monitor release GPU resources in Redis when driver exits.
* Enable multiple drivers in multi-node tests and test driver cleanup.
* Make redis GPU allocation a redis transaction and small cleanups.
* Fix multi-node test.
* Small cleanups.
* Make global scheduler take node_ip_address so it appears in the right place in the client table.
* Cleanups.
* Fix linting and cleanups in local scheduler.
* Fix removed_driver_test.
* Fix bug related to vector -> list.
* Fix linting.
* Cleanup.
* Fix multi node tests.
* Fix jenkins tests.
* Add another multi node test with many drivers.
* Fix linting.
* Make the actor creation notification a flatbuffer message.
* Revert "Make the actor creation notification a flatbuffer message."
This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39.
* Add comment explaining flatbuffer problems.