This PR adds a driver table for the new GCS, which enables cleanup functionality associated with monitoring driver death.
Some testing in `monitor_test.py` is restored, but Redis sharding for xray is needed to enable the remaining tests.
* Fix one of the stress tests, fix ray.global_state.client_table when called early on.
* Re-enable testWait.
* Convert stress_tests.py to pytest.
* Fix
* Add profile table and store profiling information there.
* Code for dumping timeline.
* Improve color scheme.
* Push timeline events on driver only for raylet.
* Improvements to profiling and timeline visualization
* Some linting
* Small fix.
* Linting
* Propagate node IP address through profiling events.
* Fix test.
* object_id.hex() should return byte string in Python 2.
* Include gcs.fbs in node_manager.fbs.
* Remove flatbuffer definition duplication.
* Decode to unicode in Python 3 and bytes in Python 2.
* Minor
* Submit profile events in a batch. Revert some CMake changes.
* Fix
* Workaround test failure.
* Fix linting
* Linting
* Don't return anything from chrome_tracing_dump when filename is provided.
* Remove some redundancy from profile table.
* Linting
* Move TODOs out of docstring.
* Minor
* Fix documentation indentation.
* Add error table to GCS and push error messages through node manager.
* Add type to error data.
* Linting
* Fix failure_test bug.
* Linting.
* Enable one more test.
* Attempt to fix doc building.
* Restructuring
* Fixes
* More fixes.
* Move current_time_ms function into util.h.
* build_credis.sh: use an up-to-date credis commit.
* build_credis.sh: leveldb is updated, so update build cmds for it
* WIP: make monitor.py issue flush; switch gcs client to use credis
* Experimental: enable automatic GCS flushing with configurable policy.
* Fix linux compilation error
* Fix leveldb build
* Use optimized build for credis
* Address comments
* Attempt to fix tests
* Implement global state API for xray.
* Fix object table.
* Fixes for log structure.
* Implement cluster_resources.
* Add driver task to task table.
* Remove python flatbuffers code
* Get some global state API tests running.
* Python linting.
* Fix linting.
* Fix mock modules for doc
* Copy over flatbuffer bindings.
* Fix for tests.
* Linting
* Fix monitor crash.
* Add flake8 to Travis
* Add flake8-comprehensions
[flake8 plugin](https://github.com/adamchainz/flake8-comprehensions) that
checks for useless constructions.
* Use generators instead of lists where appropriate
A lot of the builtins can take in generators instead of lists.
This commit applies `flake8-comprehensions` to find them.
* Fix lint error
* Fix some string formatting
The rest can be fixed in another PR
* Fix compound literals syntax
This should probably be merged after #1963.
* dict() -> {}
* Use dict literal syntax
dict(...) -> {...}
* Rewrite nested dicts
* Fix hanging indent
* Add missing import
* Add missing quote
* fmt
* Add missing whitespace
* rm duplicate pip install
This is already installed in another file.
* Fix indent
* move `merge_dicts` into utils
* Bring up to date with `master`
* Add automatic syntax upgrade
* rm pyupgrade
In case users want to still use it on their own, the upgrade-syn.sh script was
left in the `.travis` dir.
* Use pep8 style
The original style file is actually just pep8 style, but with everything
spelled out. It's easier to use the `based_on_style` feature. Any overrides are
clearer that way.
* Improve yapf script
1. Do formatting in parallel
2. Lint RLlib
3. Use .style.yapf file
* Pull out expressions into variables
* Don't format rllib
* Don't allow splits in dicts
* Apply yapf
* Disallow single line if-statements
* Use arithmetic comparison
* Simplify checking for changed files
* Pull out expr into var
* Use set/dict literal syntax
Ran code through [pyupgrade](https://github.com/asottile/pyupgrade). The literal
syntax is supported in Python 2.7 and later.
* Drop unnecessary string format specification
No need to specify 0, 1, ... if parameters are passed in order.
* Revert "Drop unnecessary string format specification"
This reverts commit efa5ec85d30ff69f34e5ed93e31343fea7647bcb.
* Undo changes to cloudpickle
Drop use of set literal until cloudpickle uses it.
* Reformat code with YAPF
We need to set up a git pre-push hook to automatically run this stuff.
* Treat actor creation like a regular task.
* Small cleanups.
* Change semantics of actor resource handling.
* Bug fix.
* Minor linting
* Bug fix
* Fix jenkins test.
* Fix actor tests
* Some cleanups
* Bug fix
* Fix bug.
* Remove cached actor tasks when a driver is removed.
* Add more info to taskspec in global state API.
* Fix cyclic import bug in tune.
* Fix
* Fix linting.
* Fix linting.
* Don't schedule any tasks (especially actor creation tasks) on local schedulers with 0 CPUs.
* Bug fix.
* Add test for 0 CPU case
* Fix linting
* Address comments.
* Fix typos and add comment.
* Add assertion and fix test.
* Define execution dependencies flatbuffer and add to Redis commands
* Convert TaskSpec to TaskExecutionSpec
* Add execution dependencies to Python bindings
* Submitting actor tasks uses execution dependency API instead of dummy argument
* Fix dependency getters and some cleanup for fetching missing dependencies
* C++ convention
* Make TaskExecutionSpec a C++ class
* Convert local scheduler to use TaskExecutionSpec class
* Convert some pointers to references
* Finish conversion to TaskExecutionSpec class
* fix
* Fix
* Fix memory errors?
* Cast flatbuffers GetSize to size_t
* Fixes
* add more retries in global scheduler unit test
* fix linting and cast fbb.GetSize to size_t
* Style and doc
* Fix linting and simplify from_flatbuf.
* Enable scheduling with custom resource labels.
* Fix.
* Minor fixes and ref counting fix.
* Linting
* Use .data() instead of .c_str().
* Fix linting.
* Fix ResourcesTest.testGPUIDs test by waiting for workers to start up.
* Sleep in test so that all tasks are submitted before any completes.
* Object table lookup returns vector of DBClientID instead of address strings
* Add node IP address to DBClient notification
* DB client cache stores entire DB client, convert addresses to std::string
* get cached db client returns the client
* Expose a call to initialize the redis cache
* Local scheduler filters out dead clients during reconstruction
* Remove node ip address from dbclient, use aux_address for plasma managers
* Get entire db client entry when not found in cache
* Fix common tests
* Fix address in tests
* Push error to driver if driver task did the put
* Address Robert's comments and cleanup
* Remove unused Redis command
* Fix db test
* WIP: removing OL, OI, TT on client exit; no saving yet.
* ray_redis_module.cc: update header comment.
* Cleanup: just the removal.
* Reformat via yapf: use pep8 style instead of google.
* Checkpoint addressing comments (partially)
* Add 'b' marker before strings (py3 compat)
* Add MonitorTest.
* Use `isort` to sort imports.
* Remove some loggings
* Fix flake8 noqa marker in runtest.py
* Try to separate tests out to monitor_test.py
* Rework cleanup algorithm: correct logic
* Extend tests to cover multi-shard cases
* Add some small comments and formatting changes.
* Clone catapult and generate static html during setup.
* Include UI files in installation.
* Fix directory to clone catapult to and fix linting.
* Use absolute path.
* Make sure we find a sufficiently new version of python2 when building wheels.
* Copy the trace_viewer_full.html file to the local directory if it is not present.
* Make sure wheels fail to build if UI is not included.
* Embedded timeline
* Yeah
* Fixed arrows not showing up.
* Fixed arrows not showing up, and added check boxes for the kinds of dependencies that should be included in the trace.
* first
* Fixes
* Fixed typo in comments, added more comments, fixed linting.
* Added more comments.
* Formatting.
* fixes
* Fixed state.py linting.
* Fixed ui.py linting errors.
* Fixed linting errors.
* Renamed task dependencies and included instructions for viewing arrows.
* Fixed according to PR comments.
* Fixed bug.
* Undid changes to metadata blocks.
* Fixes according to comments.
* Fixed linting.
* Fixed linting.
* NOQA keyword added to link line.
* adding support for the user-interpretable label (UIR)
* more plumbing for num_uirs further upstream; set to infty when specified on cmd line
* pass default num_uirs for actors; update GlobalStateAPI
* support num_uirs in ray.init()
* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting
* global scheduler test updated
* Fix bug introduced by rebase.
* Rename UIR -> CustomResource and add test.
* Small changes and use constexpr instead of macros.
* Linting and some renaming.
* Reorder some code.
* Remove cpus_in_use and fix bug.
* Add another test and make a small change.
* Rephrase documentation about feature stability.
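
For reference, a minimal sketch of the kind of rewrite described by the `flake8-comprehensions` and set/dict literal commits above. The variable names and values are hypothetical and are not taken from the Ray codebase; the snippet only illustrates that the "before" and "after" forms are equivalent.

```python
# Hypothetical sample data, for illustration only.
values = [3, 1, 4, 1, 5]

# Before: a temporary list is built only to be consumed by the builtin.
total_from_list = sum([v * v for v in values])
# After: the builtin consumes a generator expression directly.
total_from_generator = sum(v * v for v in values)

# Before: dict() call with keyword arguments.
options_from_call = dict(alpha=1, beta=2)
# After: dict literal.
options_from_literal = {"alpha": 1, "beta": 2}

# Before: set() wrapped around a list.
ids_from_call = set([1, 2, 2, 3])
# After: set literal (available in Python 2.7 and later).
ids_from_literal = {1, 2, 2, 3}

# Both spellings produce the same results.
assert total_from_list == total_from_generator
assert options_from_call == options_from_literal
assert ids_from_call == ids_from_literal
```

The generator form avoids allocating an intermediate list, and the literal forms avoid a name lookup and function call; neither changes behavior.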