1660 Commits

Author SHA1 Message Date
Simon Mo fa1214c44a [Serve] First iteration of the serve doc (#5834)
* Address comments

* Lint

* Add py3 warning
2019-10-03 15:14:09 -07:00
Philipp Moritz 0dee225ce1 Make it possible to run ray examples as projects (#5816) 2019-10-03 14:52:37 -07:00
Edward Oakes 972dddd776 [autoscaler] Kubernetes autoscaler backend (#5492)
* Add Kubernetes NodeProvider to autoscaler

* Split off SSHCommandRunner

* Add KubernetesCommandRunner

* Cleanup

* More config options

* Check if auth present

* More auth checks

* Better output

* Always bootstrap config

* All working

* Add k8s-rsync comment

* Clean up manual k8s examples

* Fix up submit.yaml

* Automatically configure permissions

* Fix get_node_provider arg

* Fix permissions

* Fill in empty auth

* Remove ray-cluster from this PR

* No hard dep on kubernetes library

* Move permissions into autoscaler config

* lint

* Fix indentation

* namespace validation

* Use cluster name tag

* Remove kubernetes from setup.py

* Comment in example configs

* Same default autoscaling config as aws

* Add Kubernetes quickstart

* lint

* Revert changes to submit.yaml (other PR)

* Install kubernetes in travis

* address comments

* Improve autoscaling doc

* kubectl command in setup

* Force use_internal_ips

* comments

* backend env in docs

* Change namespace config

* comments

* comments

* Fix yaml test
2019-10-03 10:17:00 -07:00
Ujval Misra 9df6eda84f [tune] Add error case for member functions passed as stopping c… (#5823) 2019-10-03 09:49:03 -07:00
Si-Yuan 2fb7d7846f Initial implementation of Cython pickle5 support (#5725) 2019-10-03 09:20:26 -07:00
Philipp Moritz 9a71d6ce3a Build dashboard only once in the wheel build and make sure caching is working for wheel builds (#5784)
* build dashboard only once

* update

* debug

* caching?

* update

* update
2019-10-02 16:29:11 -07:00
Stephanie Wang dc80e6be3d Add screen argument (#5808) 2019-10-01 15:18:19 -07:00
Edward Oakes 963bbe8bbd Move profiling to c++ (#5771)
* Move profiling to c++

* comments

* Fix tests

* Start after constructor

* fix comment

* always init logging

* Fix logging

* fix logging issue

* shared_ptr for profiler

* DEBUG -> WARNING

* fix killed_ init

* Fix flaky checkpointing tests

* Fix checkpoint test logic

* Fix exception matching

* timeout exception

* Fix import

* fix build

* use boost::asio

* fix double const

* Properly reset async_wait

* remove SIGINT

* Change error message

* increase timeout

* small nits

* Don't trap on SIGINT

* -v for tune

* Fix test
2019-10-01 10:06:25 -07:00
Eric Liang 81ee887f91 Preserve the original exception type when converting to RayTaskError (#5799) 2019-09-28 17:03:15 -07:00
Eric Liang 493364d3bd [autoscaler] Add unit tests for stopped node caching, fix flaky tests (#5793) 2019-09-27 22:36:09 -07:00
Edward Oakes 86610a30c9 [flaky test] Fix flaky checkpointing tests (#5791)
* Fix flaky checkpointing tests

* Fix checkpoint test logic

* Fix exception matching

* timeout exception

* Fix import

* fix build
2019-09-27 11:03:07 -07:00
Richard Liaw baf85c6665 [tune/sgd] Fix Jenkins (#5765) 2019-09-27 09:59:08 -07:00
Eric Liang b5da32df78 Bump Ray version in documentation to dev5 (#5794) 2019-09-27 00:19:17 -07:00
Philipp Moritz 01d6362472 Serialize StringIO with pickle (#5781) 2019-09-26 12:55:14 -07:00
Eric Liang 5ecb02fb80 Release 0.7.5 updates (#5727) 2019-09-26 10:30:37 -07:00
Robert Nishihara 18ce7bda2b Fix flaky test_actors_and_tasks_with_gpus_version_two test. (#5756) 2019-09-25 11:47:47 -07:00
Edward Oakes d499601bd7 Fix flaky checkpoint tests (#5778) 2019-09-25 10:55:17 -07:00
Ujval Misra a4659a8f8b [tune] Add support for function-based stopping condition (#5754) 2019-09-23 18:39:00 -07:00
Mitchell Stern b03147e7bf Update call to py-spy to conform to new API (#5758) 2019-09-23 14:52:23 -07:00
Edward Oakes 61e5d674be Push driver task in core worker (#5752) 2019-09-23 10:53:55 -05:00
Edward Oakes 62bc30c1cf Validate redis address parameters (#5746)
* Validate redis address params

* Fix comment

* Add check
2019-09-23 10:52:34 -05:00
Mitchell Stern 98dcc1d440 [Dashboard] Add initial version of new dashboard (#5730) 2019-09-23 08:50:40 -07:00
Eric Liang 56ab9a00bb [autoscaler] cache stopped nodes, no screen on attach (#5741) 2019-09-22 17:30:35 -07:00
Philipp Moritz 5f5873b182 [Projects] Start multiple sessions via session start (#5740) 2019-09-22 01:36:23 -07:00
Robert Nishihara 1cfadf032e Properly test Python wheels in Travis. (#5749) 2019-09-21 18:03:10 -07:00
Richard Liaw e00071721a [tune] tf2.0 testing and supporting callables (#5738) 2019-09-21 17:01:14 -07:00
Hersh Godse d17b35494d [tune] Save/Restore for Suggestion Algs (#5719) 2019-09-21 11:11:57 -07:00
Vince Jankovics 7e214fd95e [tune] TensorBoard HParams for TF2.0 (#5678) 2019-09-21 11:06:34 -07:00
Philipp Moritz a6dd794818 [Projects] Fix template path (#5716) 2019-09-16 19:58:54 -07:00
Philipp Moritz e4e1a57ca5 [Projects] Allow named sessions (#5706) 2019-09-16 13:00:46 -07:00
Richard Liaw 2b2eb4debb [tune] Checkpoint and Sync at end (#5699) 2019-09-15 15:58:58 -07:00
Robert Nishihara baac370099 Deprecate old global state API. (#5484)
* Deprecate old global state API.

* Remove unnecessary returns.
2019-09-15 09:13:15 -07:00
Eric Liang 09968a3c55 [revert] Disable monitor error logging to stdout #5692 2019-09-14 22:32:48 -07:00
Edward Oakes a8888c5ff4 [flaky test] Fix test_calling_start_ray_head (#5644) 2019-09-14 22:27:45 -07:00
Robert Nishihara 74a34b736d Call ray.put in ray.init() to speed up first object store access. (#5685) 2019-09-14 21:27:32 -07:00
Simon Mo 1560ace65a Use set comprehensions (#5707) 2019-09-14 15:44:25 -07:00
Edward Oakes a5d7de6aaf [core worker] Python core worker normal task submission (#5566) 2019-09-14 13:02:53 -07:00
Simon Mo 5f88823c49 [Serve] Rewrite Ray.Serve From Scratch (#5562)
* Commit and format files

* address stylistic concerns

* Replace "Usage" by "Example" in doc

* Rename srv to serve

* Add serve to CI process; Fix 3.5 compat

* Improve determine_tests_to_run.py

* Quick cosmetic for determine_tests

* Address comments

* Address comments

* Address comment

* Fix typos and grammar

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update python/ray/experimental/serve/global_state.py

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Use __init__ for Query and WorkIntent class

* Remove dataclasses dependency

* Rename oid to object_id for clarity

* Rename produce->enqueue_request, consume->dequeue_request

* Address last round of comment
2019-09-13 21:36:56 -07:00
Si-Yuan 4c964c0941 Initial implementation for pickle5 support (#5611) 2019-09-13 17:54:14 -07:00
Simon Mo fc9f03cd96 Fix queue actor init in setup_queue_actor fixture (#5676) 2019-09-13 12:35:44 -07:00
Eric Liang 3ed18d0b59 Fix edge case in autoscaler with poor bin packing (#5702)
* fix edge case

* fix for general case
2019-09-13 11:46:10 -07:00
Stephanie Wang 1d4a11a433 Only use git repo if .git exists (#5701) 2019-09-13 11:34:34 -07:00
Edward Oakes 07c4c6367a [core worker] Python core worker object interface (#5272) 2019-09-12 23:07:46 -07:00
Edward Oakes ee5db5b67f Raise error if space in redis password (#5673) 2019-09-11 20:58:39 -07:00
Eric Liang faeaa34bdd Deflake cluster heartbeat test (#5552) 2019-09-11 12:26:04 -07:00
Eric Liang 2fdefe19b7 Take into account queue length in autoscaling (#5684) 2019-09-11 11:31:35 -07:00
Philipp Moritz 9ce6dd9b88 [Projects] Add "session execute" (#5681) 2019-09-11 00:50:05 -07:00
Hersh Godse 336aef1774 [tune] Save and Restore for bayesopt (#5623) 2019-09-10 13:11:59 -07:00
Simon Mo 147e7d46ec [Flaky tests] Fix test fork (#5671)
* Start testing test_fork

Maybe queue actor takes too long to initialize, that's why we are
seeing "Many python processes started" since most of the python
tasks are blocked on ray.get

* Add a comment
2019-09-09 19:21:20 -07:00
Richard Liaw 0010f54378 Update Cloudpickle (#5643) 2019-09-09 17:17:29 -07:00