1045 Commits

Author SHA1 Message Date
Edward Oakes 888f0a2c60 [serve] Use ray.experimental.metrics (#10185) 2020-08-19 13:03:22 -05:00
Arya Irani f733d2648b [docs] fix typo in deployment.rst (#10074)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-18 00:05:18 -07:00
Richard Liaw 927a073226 [tune] Update node syncing documentation (#10126) 2020-08-17 18:08:27 -07:00
Sven Mika fe0bdb23ff [RLlib] Attention Net/Transformers docs improvement. 2020-08-17 13:07:17 -07:00
Ian Rodney a079f46c25 [autoscaler]/[docker] Cleanup YAMLs & Use RAY docker images (#10108) 2020-08-17 09:49:28 -07:00
krfricke 8f0f7371a0 [tune] Added Kubernetes syncer and sync client (#10097)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-16 14:09:28 -07:00
Amog Kamsetty f87a4aa45d [Tune] Pbt Function API (#9958)
* adding function convnet example

* add unit test

* update test

* update example

* wip

* move error from experiment to tune

* wip

* Fix checkpoint deletion

* updating code

* adding smoke test

* updating pbt guide

* formatting

* fix build

* add best checkpoint analysis util

* update test

* add comments

* remove class api

* fix example

* add setup and teardown to tests

* formatting

* Update python/ray/tune/tests/test_trial_scheduler_pbt.py

Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-14 17:52:30 -07:00
Sven Mika 66d204e078 [RLlib] Model documentation enhancements. (#10011) 2020-08-13 13:36:40 +02:00
krfricke 16486a8df3 [tune] Add OptunaSearcher wrapper around Optuna samplers (#10044)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-12 16:13:22 -07:00
Eric Liang 7e3e4cd321 [rllib] Execution plan API documentation (#10000)
* wip

* updte

* comments
2020-08-11 23:58:41 -07:00
Richard Liaw 5560272556 [cli] install nightly wheels via ray install-nightly (#10054) 2020-08-11 20:08:22 -07:00
yncxcw 32cd94b750 [Core] Do not convert gpu id to int (#9744)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-11 12:09:46 -07:00
Amog Kamsetty 4f8fef134e [Tune] Remove checkpoint_at_end from tune+serve docs (#10034) 2020-08-11 09:05:57 -07:00
PidgeyBE 6ad2fc4831 [autoscaler] Service and Ingress per worker pod (#9359) 2020-08-10 14:13:52 -05:00
SangBin Cho 39088ab6f2 [Stats] Metrics Export User Interface Part 2 (Prometheus Service Discovery) (#9970)
* In progress.

* In Progress.

* Finish the working version.

* Write a documentation.

* Addressed code review.

* Fix lint error.

* Lint.

* Addressed code review. Make test less flaky.

* Use a random port for ray start.

* Modify doc.

* Make write atomic.
2020-08-07 21:59:24 -07:00
Barak Michener 8e76796fd0 ci: Redo format.sh --all script & backfill lint fixes (#9956) 2020-08-07 16:49:49 -07:00
krfricke dee3322ab0 [tune] Ray Tune + Serve end-to-end integration example (#9908)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-07 15:32:49 -07:00
krfricke 0ef8224446 [tune] PBT replay utility scheduler (#9953)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-07 12:41:49 -07:00
Eric Liang 668f555755 [rllib] Clean up outdated docs #9915 2020-08-06 18:29:04 -07:00
SangBin Cho ec2f1a225e [Stats] Metrics Export User Interface Part 1 (#9913)
* Metrics export port expose done.

* Support exposing metrics port + metrics agent service discovery through ray.nodes()

* Formatting.

* Added a doc.

* Linting.

* Change the location of metrics agent port.

* Addressed code review.

* Addressed code review.
2020-08-06 16:16:29 -07:00
Max Fitton 538ad04e96 [Dashboard] Update ActorState in dashboard to support new actor states (#9855)
* Update ActorState in dashboard to support new actor states

* Update dashboard documentation for new states

* Add missing state to doc

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-08-05 10:35:18 -07:00
Amog Kamsetty 5af7d24f66 [Tune] Transformer blog example (#9789)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-04 22:05:01 -07:00
Ameer Haj Ali 65a2886b0a Docs For on Prem Cluster manager (#9873) 2020-08-04 11:31:09 -07:00
kisuke95 28b1f7710c [Core] Error info pubsub (Remove ray.errors API) (#9665) 2020-08-04 14:04:29 +08:00
Richard Liaw c6404e8cf6 [tune] Search alg checkpointing during training (#9803)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-08-03 15:07:31 -07:00
krfricke c741d1cf9c [tune] stdout/stderr logging redirection (#9817)
* Add `log_to_file` parameter, pass to Trainable config, redirect stdout/stderr.

* Add logging handler to root ray logger

* Added test for `log_to_file` parameter

* Added logs, reuse test

* Revert debug change

* Update logdir on reset, flush streams after each train() step

* Remove magic keys from visible config

Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-03 11:18:34 -07:00
Hao Chen 6fb6bd3e61 Refine Java "Ray Core Walkthrough" doc (#9836) 2020-07-31 15:35:43 +08:00
krfricke 619e44e54a [tune] Added WandbLogger (#9725)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-30 13:09:03 -07:00
Richard Liaw 0c3b9ebeef [tune/sgd] Document func_trainable and add checkpoint context (#9739)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-07-30 09:46:37 -07:00
Hao Chen 260bc52254 Java doc: "Ray Core Walkthrough" page (#8595) 2020-07-30 11:13:38 +08:00
Bill Chambers 067c2752f8 [TUNE] Tune Docs re-organization (#9600)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-07-29 11:22:44 -07:00
Bill Chambers 2e9d748100 [Cluster Launcher] Re Org the cluster launcher pages. (#9687) 2020-07-27 13:47:06 -07:00
Justin Terry 0d67602051 Update rllib-algorithms.rst (#9640) 2020-07-24 19:35:28 -07:00
Bill Chambers 22d446bc2b [Serve] Fix Formatting, stale docs (#9617) 2020-07-24 13:34:32 -07:00
Amog Kamsetty 03709d67cb [Tune Docs] Logging doc fix (#9691) 2020-07-24 11:20:52 -07:00
Richard Liaw a49eb1d168 [tune] survey (#9670) 2020-07-24 11:08:05 -07:00
Dean Wampler 5242b3b1a2 Refinements to the Serve documentation (#9587)
Co-authored-by: Dean Wampler <dean@concurrentthought.com>
2020-07-24 10:46:28 -07:00
Stephanie Wang f2705e2c73 [core] Enable object reconstruction for retryable actor tasks (#9557)
* Test actor plasma reconstruction

* Allow resubmission of actor tasks

* doc

* Test for actor constructor

* Kill PID before removing node

* Kill pid before node
2020-07-23 21:15:12 -07:00
Robert Nishihara 06c3518aa1 Drop support for Python 3.5. (#9622)
* Drop support for Python 3.5.

* Update setup.py
2020-07-23 19:26:06 -07:00
Simon Mo d8fd74d528 [Serve] Document Metric Infrastructure (#9389) 2020-07-21 14:52:18 -07:00
mehrdadn c5cde65bc6 Add bazel to the PATH in setup.py (#9590)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-21 13:35:29 -07:00
Robert Nishihara 1fa305cf8b [doc] [minor] Make API docs easier to find. (#9604) 2020-07-21 11:40:06 -07:00
Max Fitton 051973ad23 Add dashboard dependencies to default ray installation (#9447) 2020-07-20 12:53:08 -05:00
Stephanie Wang b351d13940 [core] Add flag to enable object reconstruction during ray start (#9488)
* Add flag

* doc

* Fix tests
2020-07-17 10:13:14 -07:00
Sven Mika 78dfed2683 [RLlib] Issue 8384: QMIX doesn't learn anything. (#9527) 2020-07-17 12:14:34 +02:00
krfricke 5a40299d42 [tune] extend PTL template (GPU, typing fixes, tensorboard) (#9451)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-15 10:30:20 -07:00
krfricke deba082cb4 [tune] PyTorch CIFAR10 example (#9338)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-13 23:16:05 -07:00
Richard Liaw a567f7977c [tune] Put examples under proper version control (#9427)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-07-13 18:01:10 -07:00
Richard Liaw 7abf7a0109 [docs] Render ActorPool documentation, etc (#9433) 2020-07-13 17:59:22 -07:00
Hao Chen d49dadf891 Change Python's ObjectID to ObjectRef (#9353) 2020-07-10 17:49:04 +08:00