1914 Commits

Author SHA1 Message Date
Harrison Feng ca876c1ecb Make sure dashboard link can be clicked directly. (#6683) 2020-01-03 16:17:16 -08:00
Robert Nishihara 80e77f7025 Revert accidental changes to test file. (#6681) 2020-01-03 14:23:45 -08:00
Ujval Misra 5b40408678 [tune] Remove py2.7-specific code (#6665)
* Remove backwards compatibility py2.7 code.

* Use exist_ok=True in ray

* nit

* nit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-03 01:03:13 -08:00
Ujval Misra ca651af1d7 [tune] Async restores and S3/GCP-capable trial FT (#6376)
* Initial commit for asynchronous save/restore

* Set stage for cloud checkpointable trainable.

* Refactor log_sync and sync_client.

* Add durable trainable impl.

* Support delete in cmd based client

* Fix some tests and such

* Cleanup, comments.

* Use upload_dir instead.

* Revert files belonging to other PR in split.

* Pass upload_dir into trainable init.

* Pickle checkpoint at driver, more robust checkpoint_dir discovery.

* Cleanup trainable helper functions, fix tests.

* Addressed comments.

* Fix bugs from cluster testing, add parameterized cluster tests.

* Add trainable util test

* package_ref

* pbt_address

* Fix bug after running pbt example (_save returning dir).

* get cluster tests running, other bug fixes.

* raise_errors

* Fix deleter bug, add durable trainable example.

* Fix cluster test bugs.

* filelock

* save/restore bug fixes

* .

* Working cluster tests.

* Lint, revert to tracking memory checkpoints.

* Documentation, cleanup

* fixinitialsync

* fix_one_test

* Fix cluster test bug

* nit

* lint

* Revert tune md change

* Fix basename bug for directories.

* lint

* fix_tests

* nit_fix

* Add __init__ file.

* Move to utils package

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-02 20:40:53 -08:00
Robert Nishihara 92e44a5dc8 Deprecate redis_address argument in favor of address. (#6654) 2020-01-02 20:18:34 -08:00
Simon Mo 9fe90cdafc Fix async actor recursion limitation (#6672)
* Do not start threadpool when using async

* Turn function_executor into a generator

* Add new test for high concurrency and bump the default

* Set direct call
2020-01-02 19:45:13 -06:00
Robert Nishihara 39a3459886 Remove (object) from class declarations. (#6658) 2020-01-02 17:42:13 -08:00
Sven f1b56fa5ee PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). (#6650)
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).

* Fix LINT line-len errors.

* Fix LINT errors.

* Fix `tf_pg_policy` imports (formerly: `pg_policy`).

* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).

* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
  then built into the Bazel/Travis test suite.

* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.

* Fix remaining import errors for agents/pg/...

* Fix circular dependency in pg imports.

* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
Yunzhi Zhang 8a0a30b5f0 [Dashboard] display actor status and infeasible tasks (#6652)
* expose actor status and protobuf message of infeasible tasks

* move infeasible tasks into actor tree

* add pytest for displaying infeasible tasks info

* fix base64 decoding

* fix race condition after #6629 merged
2020-01-02 14:27:59 -08:00
Eric Liang 895f2727fb Add experimental parallel iterators API (#6644) 2020-01-02 13:45:26 -08:00
Ion 3dddbef6d9 Release cpu blocked (#6611) 2020-01-02 13:43:25 -08:00
Robert Nishihara 9baa002069 Remove deprecated global state. (#6655) 2019-12-31 22:40:47 -08:00
Zhijun Fu 91a98d2295 [rpc] refactor GRPC client (#6637)
* refactor RPC client

* remove unused code

* format

* fix

* resolve comments

* format

* update

* fix

* fix python pb build failure

* lint
2019-12-31 22:28:25 -08:00
Robert Nishihara 480206eef8 Remove some Python 2 compatibility code. (#6624) 2019-12-31 17:14:58 -08:00
Philipp Moritz ecddaafd94 Add actor table to global state API (#6629) 2019-12-31 15:11:59 -08:00
Robert Nishihara d2c6457832 Remove public facing references to --redis-address. (#6631) 2019-12-31 13:21:53 -08:00
Yunzhi Zhang 65acb54553 [Dashboard] Logical view backend for dashboard (#6590) 2019-12-30 13:08:08 -08:00
Philipp Moritz 735f282494 Use 0.9.0.dev0 as the version tag (#6630) 2019-12-30 10:14:07 -08:00
Edward Oakes 2a66529fb7 Add multiprocessing.Pool API (#6194) 2019-12-29 21:40:58 -06:00
Eric Liang e2bc489a18 Port webui nits from original pr that enables it (#6628)
* backport changes

* Update test_webui.py
2019-12-29 19:19:43 -08:00
Mitchell Stern 3e0f07468f Make JSON schema for projects more explicit (#6550) 2019-12-29 16:41:53 -08:00
Eric Liang 7c1e0e5715 Implement wait_local for wait (#6524) 2019-12-28 17:40:49 -08:00
Eric Liang 677004ee3d Add 'ray stat' command for debugging (#6622)
* wip

* wip

* wip

* iterate

* move

* fix thread safety
2019-12-28 14:40:32 -08:00
Robert Nishihara ff82613b66 Fix test_actor.py test_kill. (#6623) 2019-12-27 22:39:17 -08:00
alindkhare a76fadb899 [Serve] Adding BackendConfig (#6541) 2019-12-27 23:34:50 -06:00
Robert Nishihara 96f2f8ff10 Stop testing Python 2.7 and building Python 2.7 wheels. (#6601) 2019-12-27 20:47:49 -08:00
Robert Nishihara 8724e5ffd5 Start WebUI by default. (#6493) 2019-12-27 13:49:07 -08:00
Zhijun Fu 088ce2d1e1 Fix hang on actor creation task failure (#6617) 2019-12-27 10:48:17 -08:00
Eric Liang 46acb02aa4 Fix verbose shutdown error and test_env_with_subprocesses (#6614) 2019-12-26 22:43:39 -08:00
Eric Liang d3db9e9c1e By default, reconstruction should only be enabled for actor creation. (#6613)
* wip

* fix

* fix
2019-12-26 19:57:50 -08:00
zhu-eric 65297e65f0 Experimental Actor Pool (#6055)
* mod_table

* Example fix for gallery

* lint

* nit

* nit

* fix

* gallery

* remove table for now

* training, object store, tune, actors, advanced

* start tf code

* first cut tf

* yapf

* pytorch

* add torch example

* torch

* parallel

* tune

* tuning

* reviewsready

* finetune

* fix

* move_code

* update conf

* compile

* init hyperparameter

* Start images

* overview

* extra

* fix

* works

* update-ps-example

* param_actor

* fix

* examples

* simple

* simplify_pong

* flake8 and run hyperopt

* add comments

* add comments

* add suggestion

* add suggestion

* suggestions

* add suggestion

* add suggestions

* fixed in wrong area

* last edit

* finish changes

* add line

* format

* reset

* tests and docs

* fix tests

* bazelify

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 14:35:10 -08:00
inventormc 0dd8a60679 [tune] Usability errors PBT (#5972)
* update with upstream master

* check for function args in hyperparam_mutations pbt

* fix style for pbt

* remove_checkpoint

* Update pbt.py

* Update pbt.py

* fix

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 14:27:07 -08:00
Zhijun Fu d2bba596ab Fix actor reconstruction with direct call (#6570) 2019-12-26 10:59:50 +08:00
Yuhao Yang be23b3ac41 [sgd] show training result for examples (#6552) 2019-12-26 02:15:43 +01:00
Yuhao Yang df4533c649 [tune] demo exporting trained models in pbt examples (#6533) 2019-12-26 02:14:49 +01:00
Richard Liaw 93e8c85e72 [tune] Avoid duplication in TrialRunner execution (#6598)
* avoid_duplication

* Update python/ray/tune/ray_trial_executor.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-12-26 02:13:55 +01:00
Yuhao Yang 8707a721d9 [tune] update params for optimizer in reset_config (#6522)
* reset config update lr

* add default

* Update pbt_dcgan_mnist.py

* Update pbt_convnet_example.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 02:10:09 +01:00
Richard Liaw aa7b861332 [minor][tune] Support Type Hinting for py3 (#6571)
* fullargspec for new pyversion

* fi
2019-12-25 08:15:33 +01:00
Robert Nishihara f89d81896a Fix flaky test_gpu_ids test. (#6579) 2019-12-24 14:26:44 -08:00
Robert Nishihara 2f57391595 Fix bug when failing to import remote functions or actors with args and kwargs. (#6577) 2019-12-24 13:23:48 -08:00
Edward Oakes 6b1a57542e Add actor.__ray_kill__() to terminate actors immediately (#6523) 2019-12-23 23:12:57 -06:00
Yunzhi Zhang bac6f3b61e [Dashboard] Collecting worker stats in node manager and implement webui display in the backend (#6574) 2019-12-22 17:50:23 -08:00
mehrdadn 50fb26de68 Fix FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result. (#6568) 2019-12-22 13:02:34 -08:00
Chaokun Yang 7bbfa85c66 [Streaming] Streaming data transfer java (#6474) 2019-12-22 10:56:05 +08:00
Edward Oakes e50aa99be1 Reference counting for direct call submitted tasks (#6514)
Co-authored-by: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
2019-12-20 17:06:33 -08:00
Eric Liang de22cdb233 Reduce reporter CPU (#6553)
* wip

* remove

* Update ray_constants.py
2019-12-19 22:21:30 -08:00
Eric Liang e556b729c2 [direct call] Fix max_calls interaction with background tasks. (#6536) 2019-12-19 13:48:32 -08:00
Edward Oakes 41fa2e9604 Remove object id translation (#6531) 2019-12-19 12:47:49 -08:00
Simon Mo d807d0bab6 Serve small fixes (#6539)
* Tmp db

* Lint

* Turn on direct call for serve tests
2019-12-18 23:08:59 -08:00
alindkhare d78a1062db [Serve] Pluggable Queueing Policy (#6492) 2019-12-18 21:28:38 -08:00