2114 Commits

Author SHA1 Message Date
Edward Oakes 7b609ca211 Remove instances of 'raise Exception' (#7523) 2020-03-10 17:51:22 -07:00
Stephanie Wang fdb528514b [core] Ref counting for actor handles (#7434)
* tmp

* Move Exit handler into CoreWorker, exit once owner's ref count goes to 0

* fix build

* Remove __ray_terminate__ and add test case for distributed ref counting

* lint

* Remove unused

* Fixes for detached actor, duplicate actor handles

* Remove unused

* Remove creation return ID

* Remove ObjectIDs from python, set references in CoreWorker

* Fix crash

* Fix memory crash

* Fix tests

* fix

* fixes

* fix tests

* fix java build

* fix build

* fix

* check status

* check status
2020-03-10 17:45:07 -07:00
Richard Liaw d192ef0611 [raysgd] Cleanup User API (#7384)
* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* more docs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* first_pass

* add overrides

* override

* fixing up operators

* format

* sgd

* constants

* rm

* revert

* save

* failures

* fixes

* trainer

* run test

* operator

* code

* op

* ok done

* operator

* sgd test fixes

* ok

* trainer

* format

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update doc/source/raysgd/raysgd_pytorch.rst

* docstring

* dcgan

* doc

* commits

* nit

* testing

* revert

* Start renaming pytorch to torch

* Rename PyTorchTrainer to TorchTrainer

* Rename PyTorch runners to Torch runners

* Finish renaming API

* Rename to torch in tests

* Finish renaming docs + tests

* Run format + fix DeprecationWarning

* fix

* move tests up

* benchmarks

* rename

* remove some args

* better metrics output

* fix up the benchmark

* benchmark-yaml

* horovod-benchmark

* benchmarks

* Remove benchmark code for cleanups

* makedatacreator

* relax

* metrics

* autosetsampler

* profile

* movements

* OK

* smoothen

* fix

* nitdocs

* loss

* comments

* fix

* fix

* runner_tests

* codes

* example

* fix_test

* fix

* tests

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Maksim Smolin <maximsmol@gmail.com>
2020-03-10 08:41:42 -07:00
Anthony Yu 89ec4adb72 [tune] Dragonfly Optimizer (#5955)
* Add sample example

* Copy relevant lines of ask from inherited Optimizer

* Ignore strategy

* Additional changes

* Add DragonflySearch for tune connector for Dragonfly

* Add example and fix small errors

* lint

* Remove skopt references

* Update example based off of Dragonfly changes

* Edit example for final Dragonfly edits

* Formatting and documentation edits

* Add documentation and add to test pipeline

* Address PR comments

* Fix Jenkins test

* Adjust Dragonfly to PR#7366

* Lint

* fix_tests

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-10 08:40:36 -07:00
Eric Liang 90e23a5c43 [iterators] Add duplicate() call and fix broken test case (#7510) 2020-03-09 17:18:52 -07:00
Edward Oakes 4ab80eafb9 Deprecate use_pickle flag (#7474) 2020-03-09 16:03:56 -07:00
Edward Oakes 0c254295b0 Remove experimental.signal API (#7477)
* Remove experimental.signal API

* fix test
2020-03-09 16:03:36 -07:00
Ujval Misra 023d4c02a9 [tune] Prevent deletion of checkpoint from user-initiated resto… (#7501)
* Fix restore bug

* Add test

* Lint

* Indent
2020-03-09 15:53:10 -07:00
Edward Oakes b4e2d5317e Remove experimental.NoReturn (#7475) 2020-03-09 11:09:36 -07:00
Stephanie Wang 95bb0c5357 Upgrade plasma to latest version, use synchronous Seal (#7470)
* Upgrade arrow to master

* fix build

* todo

* lint

* Fix hanging test
2020-03-09 10:30:44 -07:00
Eric Liang a644060daa [rllib] First pass at pipeline implementation of DQN (#7433)
* wip iters

* add test

* speed up

* update docs

* document it

* support serial sampling

* add test

* spacing

* annotate it

* update

* rename to pipeline

* comment

* iter2 wip

* update

* update

* context test

* update

* fix

* fix

* a3c pipeline

* doc

* update

* move timer

* comment

* add pipeline test

* fix

* clean up

* document

* iter s

* wip dqn

* wip

* wip

* metrics

* metrics rename

* metrics ctx

* wip

* constants

* add todo

* support .union

* wip

* support union

* remove prints

* add todo

* remove auto timer

* fix up

* fix pipeline test

* typing

* fix breakage

* remove bad assert

* wip

* fix multiagent example

* fixapply

* update a3c

* remove a2c pl

* 0 workers

* wip

* wip

* share metrics

* wip

* wip

* doc

* fix weight sync and global var updates

* mode

* fix

* fix

* doc

* fix
2020-03-07 14:47:58 -08:00
Landcold7 beb9b02dbd Add numba test (#7298) (#7487) 2020-03-07 11:12:25 -08:00
Richard Liaw 115468de2c [tune] Repeated evals (#7366)
* easyrepeat

* done

* suggest

* doc

* ok

* commit

* Apply suggestions from code review

Co-Authored-By: Ujval Misra <misraujval@gmail.com>

* Apply suggestions from code review

Co-Authored-By: Ujval Misra <misraujval@gmail.com>

* Apply suggestions from code review

* ok

* docs

Co-authored-by: Ujval Misra <misraujval@gmail.com>
2020-03-07 11:08:23 -08:00
mehrdadn a8bda9b551 Fix incorrect handling of command-lines (#7439) 2020-03-06 15:51:49 -08:00
Sven Mika 510c850651 [RLlib] SAC add discrete action support. (#7320)
* Exploration API (+EpsilonGreedy sub-class).

* Exploration API (+EpsilonGreedy sub-class).

* Cleanup/LINT.

* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).

* Add `error` option to deprecation_warning().

* WIP.

* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.

* WIP.

* LINT.

* WIP.

* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.

* Fix bug in sampler.py in case Policy has self.exploration = None

* Update rllib/agents/dqn/dqn.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Update rllib/agents/trainer.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Change requests.

* LINT

* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set

* Completely obsolete sync_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).

* Update rllib/evaluation/worker_set.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Review fixes.

* Fix default value for DQN's exploration spec.

* LINT

* Fix recursion bug (wrong parent c'tor).

* Do not pass timestep to get_exploration_info.

* Update tf_policy.py

* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.

* Bug fix tf-action-dist

* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).

* Switch off exploration when getting action probs from off-policy-estimator's policy.

* LINT

* Fix test_checkpoint_restore.py.

* Deprecate all SAC exploration (unused) configs.

* Properly use `model.last_output()` everywhere instead of `model._last_output`.

* WIP.

* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).

* WIP.

* Trigger re-test (flaky checkpoint-restore test).

* WIP.

* WIP.

* Add test case for deterministic action sampling in PPO.

* bug fix.

* Added deterministic test cases for different Agents.

* Fix problem with TupleActions in dynamic-tf-policy.

* Separate supported_spaces tests so they can be run separately for easier debugging.

* LINT.

* Fix autoregressive_action_dist.py test case.

* Re-test.

* Fix.

* Remove duplicate py_test rule from bazel.

* LINT.

* WIP.

* WIP.

* SAC fix.

* SAC fix.

* WIP.

* WIP.

* WIP.

* FIX 2 examples tests.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Renamed test file.

* WIP.

* Add unittest.main.

* Make action_dist_class mandatory.

* fix

* FIX.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix explorations test case (contextlib cannot find its own nullcontext??).

* Force torch to be installed for QMIX.

* LINT.

* Fix determine_tests_to_run.py.

* Fix determine_tests_to_run.py.

* WIP

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Rename some stuff.

* Rename some stuff.

* WIP.

* update.

* WIP.

* Gumbel Softmax Dist.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP

* WIP.

* WIP.

* Hypertune.

* Hypertune.

* Hypertune.

* Lock-in.

* Cleanup.

* LINT.

* Fix.

* Update rllib/policy/eager_tf_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Fix items from review comments.

* Add dm_tree to RLlib dependencies.

* Add dm_tree to RLlib dependencies.

* Fix DQN test cases ((Torch)Categorical).

* Fix wrong pip install.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
Eric Liang 476b5c6196 [Parallel Iterators] Allow for operator chaining after repartition (#7268)
* bug fix repartition

* change add_transform from private to inner

* formatting

* addressing comments

* formatting
2020-03-04 14:42:52 -08:00
Philipp Moritz de0c99876e Fix fate_share not being passed to Redis shards (#7432) 2020-03-04 11:29:45 -08:00
Edward Oakes 0abcca258f Add entries to in-memory store on Put() (#7085) 2020-03-04 10:17:27 -08:00
Philipp Moritz fb1c1e2d27 Revert "Keep cloudpickle up-to-date with the upstream (#7406)" (#7437)
This reverts commit f6883bf725.
2020-03-03 18:36:15 -08:00
Maksim Smolin 3a134c7224 [RaySGD] Rename PyTorch API endpoints to start with Torch (#7425)
* Start renaming pytorch to torch

* Rename PyTorchTrainer to TorchTrainer

* Rename PyTorch runners to Torch runners

* Finish renaming API

* Rename to torch in tests

* Finish renaming docs + tests

* Run format + fix DeprecationWarning

* fix

* move tests up

* rename

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-03 16:44:42 -08:00
Siyuan (Ryans) Zhuang f6883bf725 Keep cloudpickle up-to-date with the upstream (#7406) 2020-03-03 13:52:54 -08:00
Edward Oakes b0bf5450c2 Fix flaky multiprocessing tests (#7413) 2020-03-03 15:07:59 -06:00
ijrsvt fb76092d75 Re-route asyncio plasma code path through raylet instead of direct plasma connection (#7234) 2020-03-03 15:43:46 -05:00
Edward Oakes 04ec599441 Use ray.kill() in multiprocessing.Pool (#7409) 2020-03-03 12:49:13 -06:00
Allen b74eb5fce6 Capture output for commands run by the autoscaler (#7381) 2020-03-03 10:19:21 -08:00
mehrdadn 4d42664b2a Use prctl(PR_SET_PDEATHSIG) on Linux instead of reaper (#7150) 2020-03-03 11:45:42 -06:00
ijrsvt 584645cc7d Fix Experimental Async API (#7391) 2020-03-02 22:24:20 -06:00
Edward Oakes 580b017b43 Fix flaky global GC tests (#7407) 2020-03-02 21:03:01 -06:00
Edward Oakes 9e9f1962c7 Enable test_actor_pool in CI (#7405) 2020-03-02 20:24:36 -06:00
Edward Oakes 2b6f00724a Enable test_joblib in CI (#7404) 2020-03-02 20:03:27 -06:00
Edward Oakes d69fe54f6d Temporarily skip testEndToEndReporting (#7402) 2020-03-02 18:27:34 -06:00
Siyuan (Ryans) Zhuang 0792b5cb93 Fix the numpy ndarray subclass serialization bug (#7392) 2020-03-01 23:05:59 -08:00
Richard Liaw 48cdca843f [raysgd] Custom training operator (#7211) 2020-03-01 21:22:48 -08:00
Eric Liang 3c6b94f3f5 [rllib] Enable performance metrics reporting for RLlib pipelines, add A3C (#7299) 2020-02-28 16:44:17 -08:00
Richard Liaw fb73d51d4d [tune] fix hparams for tbx (#7312)
* fix

* test_hist

* remove unnecessary value check

* pbt

* queue

* skip_for_now

* Apply suggestions from code review
2020-02-28 11:51:56 -08:00
Richard Liaw ca40b0fcc6 [tune][minor] Avoid throwing error when gpu check fails (#7362) 2020-02-28 11:32:44 -08:00
Edward Oakes f321eaec9b Working but not passing test (#7358) 2020-02-28 12:57:28 -06:00
mehrdadn fb0bc7b947 Partially revert "[Core/RLlib] Move log_once from rllib to ray.util. (#7273)" (#7361)
This partially reverts commit 357232d124.

The addition of python/__init__.py broke the build on Windows. However, the breakage is hard to spot because Bazel does not seem to track this dependency. You first have to check out a commit that fails on this issue and then rebuild this commit, so that Bazel actually performs a rebuild.

A useful command line for triggering the exact build is:

bazel build --compile_one_dependency //:python/ray/_raylet.pyx
2020-02-28 10:27:45 -08:00
Edward Oakes 93fe4b0b58 Change actor.__ray_kill__() to ray.kill(actor) (#7360) 2020-02-28 11:55:13 -06:00
Richard Liaw 3fc162f93c [tune] Add Unit Test for nested PBT + Jenkins (#7324) 2020-02-27 18:17:11 -08:00
mehrdadn 8730996682 Windows changes (#7315) 2020-02-27 15:14:10 -08:00
Edward Oakes ced062319d Decrease test_object_manager put size to avoid OOMs in CI (#7355) 2020-02-27 11:08:10 -08:00
Edward Oakes cbf55d69a6 Remove serialized from_random object ids in tests (#7340) 2020-02-27 11:04:06 -08:00
Edward Oakes bd9411f849 Call TriggerGlobalGC when the plasma store is full (#7337) 2020-02-27 11:01:49 -08:00
Sven Mika 357232d124 [Core/RLlib] Move log_once from rllib to ray.util. (#7273)
* Move log_once from rllib to tune.

* Move log_once from rllib to tune.

* LINT.

* Move to ray.util.debug.
2020-02-27 10:40:44 -08:00
Edward Oakes d9027acaf2 Deprecate non-direct-call API (#7336) 2020-02-27 10:37:23 -08:00
Edward Oakes 55ccfb6089 Fix asyncio actor race condition (#7335) 2020-02-27 10:16:04 -08:00
Edward Oakes ee0f71e398 Add __commit__ field to ray package in wheels (#7305) 2020-02-26 17:54:22 -08:00
Edward Oakes 2ad9bc5684 Move plasma retry logic into plasma store provider (#7328) 2020-02-26 16:57:02 -08:00
Eric Liang b310661338 Add internal_api.global_gc() method, which triggers gc.collect() on all workers (#7327) 2020-02-26 14:09:29 -08:00