Commit Graph

1721 Commits

Author SHA1 Message Date
Edward Oakes f8a6ed7832 Spawn processes in background sessions (#6008)
Allows us to properly handle KeyboardInterrupts in interactive Python interpreters.
2019-10-25 13:01:35 -07:00
Edward Oakes 1ce521a7f3 Remove task context from python worker (#5987)
Removes duplicated state between the Python and C++ workers. Also cleans up the serialization codepaths a bit.
2019-10-25 07:38:33 -07:00
Ujval Misra cf16b2f0c4 Add timesteps and remove ID from progress output (#5999) 2019-10-25 00:48:42 -07:00
Eric Liang 4edae7ea2b Speed up task submissions a bit (#5992) 2019-10-25 00:10:37 -07:00
Edward Oakes 6f27d881bd Fix core worker shutdown errors (#6004) 2019-10-24 22:29:05 -07:00
Edward Oakes 71a2f4c63d fix comment (#6006) 2019-10-24 18:07:49 -07:00
Edward Oakes c73fdb7425 Ignore errors in ObjectID.__dealloc__ (#5997) 2019-10-24 16:48:47 -07:00
Philipp Moritz 09d05bb3fa Reduce actor submission python overhead (#5949) 2019-10-23 00:11:32 -07:00
Edward Oakes 02931e08f3 [core worker] Python core worker task execution (#5783)
Executes tasks via the event loop in the C++ core worker. Also properly handles signals (including KeyboardInterrupt), so Ctrl-C in an interactive Python shell now works (when connecting to an existing cluster).
2019-10-22 20:15:59 -07:00
Siyuan (Ryans) Zhuang 95241f6686 Fix the incorrect serialization behavior with pickle (#5960) 2019-10-22 18:08:36 -07:00
Richard Liaw 81dd0dfb0a [tune] fix conditional identifier (#5971)
* fix conditional identifier

* fix

* doc
2019-10-22 02:00:49 -07:00
Richard Liaw 252a5d13ed [sgd/tune][minor] more tf ports (#5953) 2019-10-21 16:46:16 -07:00
Mitchell Stern 235dec8aa3 [Dashboard] Remove token authentication from dashboard (#5888) 2019-10-21 12:48:48 -07:00
Richard Liaw 26a724c5e6 [core] Support kwargs and positionals in Ray remote calls (#5606) 2019-10-20 22:40:54 -07:00
Edward Oakes fc56872012 Send active object IDs to the raylet (#5803)
* Send active object IDs to the raylet

* comment

* comments

* dedup

* signed int in config

* comments

* Remove object ID from monitor

* Fix test

* re-add check

* fix cast

* check if core worker

* Add comment

* Reservoir sampling

* Fix lint

* Pointer return

* tmp

* Fix merge

* Initialize object ids properly

* Fix lint
2019-10-20 22:05:28 -07:00
Simon Mo 6b36ef1138 [Serve] Ensure strict traffic splitting (#5929)
* [Serve] Ensure strict traffic splitting

* Fix test
2019-10-20 20:18:14 -07:00
Stephanie Wang bc4a0de4da Fix multiple drivers for named actors and add test (#5956) 2019-10-20 16:04:21 -07:00
Richard Liaw 74852c80cb [docs] Improve more serialization Errors (#5658) 2019-10-20 14:06:00 -07:00
Richard Liaw 91acecc9f9 [tune][minor] gpu warning (#5948)
* gpu

* format

* defaults

* format_and_check

* better registration

* fix

* fix

* trial

* format

* tune
2019-10-19 17:09:48 -07:00
Philipp Moritz d23696de17 Introduce flag to use pickle for serialization (#5805) 2019-10-18 22:29:36 -07:00
Philipp Moritz 29eee7f970 Forward multiple ports for autoscaler (#5893) 2019-10-18 16:50:46 -07:00
Richard Liaw 48ba484640 [tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931) 2019-10-18 13:50:42 -07:00
Stephanie Wang 697f765efc Refactor CoreWorker to remove TaskInterface (#5924)
* Remove TaskInterface

* Remove Status return value

* Remove CActorHandle, some return values, TaskSubmitter

* lint

* doc

* doc

* fix build

* lint

* Return Status, guarded by annotation, fail tasks for RECONSTRUCTING actors

* fix

* move annotation

* revert

* Fix core worker test

* nits
2019-10-18 00:03:57 -04:00
Stephanie Wang 3ac8592dcf Remove actor handle IDs (#5889)
* Remove actor handle ID from main ActorHandle constructor

* Set the actor caller ID when calling submit task instead of in the actor handle

* Remove ActorHandle::Fork, remove actor handle ID from protobuf

* Make inner actor handle const, remove new_actor_handles

* Move caller ID into the common task spec, start refactoring raylet

* Some fixes for forking actor handles

* Store ActorHandle state in CoreWorker, only expose actor ID to Python

* Remove some unused fields

* lint

* doc

* fix merge

* Remove ActorHandleID from python/cpp

* doc

* Fix core worker test

* Move actor table subscription to CoreWorker, reset actor handles on actor failure

* lint

* Remove GCS client from direct actor

* fix tests

* Fix

* Fix tests for raylet codepath

* Fix local mode

* Fix multithreaded test

* Fix AsyncSubscribe issue...

* doc

* fix serve

* Revert bazel
2019-10-17 12:36:34 -04:00
Philipp Moritz 32b2907457 Update max resource label and give better error message (#5916) 2019-10-16 22:37:01 -07:00
Peter Schafhalter 6c11b534c8 [Autoscaler] Update AWS Deep Learning AMI to version 24.3 (#5932) 2019-10-16 16:50:54 -07:00
Richard Liaw 9f23620412 [tune] tf2.0 mnist example (#5898)
* tfmnistexample

* tfmnist

* add_to_ci

* format

* exampledownload

* fix
2019-10-15 22:25:01 -07:00
Eric Liang 6843a01a7f Automatically create custom node id resource (#5882)
* node id

* comment

* comments

* fix tests
2019-10-15 21:31:11 -07:00
Richard Liaw c52bb0621d [tune] Support TF2.0 on Keras Callback (#5912) 2019-10-15 10:49:50 -07:00
Eric Liang 69d5c1b53a remove evil redirects (#5919) 2019-10-14 19:41:04 -07:00
Camille Couturier 320cba313f [tune] Explicitly set scheduler in run() (#5871)
* Explicitly set scheduler in run()

* Better formatting/indentation (after running format.sh)

* Remove accidental paste in parameters definitions.

* format
2019-10-14 15:44:59 -07:00
Philipp Moritz 8fd23c0c3f Add back TensorFlow test (#5885) 2019-10-14 11:26:02 -07:00
Richard Liaw 20c0cdee4f [autoscaler] Worker-Head termination + Better Scale-up message (#5909) 2019-10-14 10:37:50 -07:00
Edward Oakes abbfe7392f Bump dev version to 0.8.0.dev6 (#5906) 2019-10-14 11:36:13 +01:00
Richard Liaw 1650f7b174 [tune] Remove TF MNIST example + add TrialRunner hook to execut… (#5868)
* remove test

* add trial runner

* removerestore

* Remove other mnist examples

* tunetest

* revert

* v1

* Revert "v1"

This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20.

* Revert "revert"

This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3.

* errors

* format
2019-10-13 20:33:56 -07:00
Richard Liaw 52e5c9b22d [tune] CPU-Only Head Node support (#5900)
* trialqueue

* add tests
2019-10-13 20:31:42 -07:00
Eric Liang 2cbc67f3d5 Fix test_dying_worker_get (#5908) 2019-10-13 18:06:28 -07:00
Richard Liaw 0f24509c30 [autoscaler] uptime redirect fix (#5907)
* small change

* comment
2019-10-13 23:25:15 +01:00
Edward Oakes 6eaa8e31fa [autoscaler] Revert to double-spawning updater threads (#5903)
* [autoscaler] Revert to double-spawning threads

* Use log prefix

* add comment
2019-10-13 20:00:06 +01:00
Simon Mo 97a786cf11 [Serve] Remove handle passing in tail recursion (#5894)
* Remove handle pass in tail recursion

* Quick fix

* Fix worker timeout issue
2019-10-12 20:13:20 -07:00
Eric Liang 0e8c3c0346 Don't wrap RayError with RayTaskError (#5870) 2019-10-11 11:00:08 -07:00
Edward Oakes 779f91523b [autoscaler] Fix quoting (#5891) 2019-10-11 00:40:26 -07:00
Simon Mo 4b99cb429e [Serve] Hotfix: Fix actor handle hashing in metric monitoring (#5886) 2019-10-11 00:31:42 -07:00
Robert Nishihara 523c764c25 Python 2 compatibility. (#5887) 2019-10-10 19:09:25 -07:00
Eric Liang c3b2ae26c5 Fix str of RayTaskError (#5878)
* fix key error

* fix
2019-10-10 16:53:18 -07:00
Mitchell Stern 195ca43e9c [Dashboard] Improve handling of logs and errors in dashboard backend (#5857)
* Improve handling of logs and errors in dashboard backend

* Update nested dict comprehension for clarity
2019-10-10 11:59:54 -07:00
Eric Liang 1a8ac3db46 Implement fair task queueing to prevent task starvation (#5851)
* initial commit

* lint

* clarify

* add feature flag

* comment

* add timeout to test

* fix print

* comment

* use id for scheduling class

* lint

* dad warn

* flake
2019-10-08 21:04:25 -07:00
Richard Liaw 1181924077 [tune][minor] formatting examples, fix travis (#5869)
* formatting

* formatting
2019-10-08 17:58:43 -07:00
Ujval Misra a851d7eb87 [tune] Readable trial progress output (#5822)
* Cleaner, tabulated progress output.

* Minor HTML changes, trial ID instead of name

* Revert basic variant changes

* Cleanup, address richard's comments, add progress_reporter.py

* Add tabulate dependency

* Added more info to table, auto-hide columns with no data.

* lint

* Address comments

* Replace experiment tag w/ trial ID

* Fixed tests.

* Fixed test

* Added requirement

* Fix formatting
2019-10-08 16:38:39 -07:00
Philipp Moritz 24b79fd0a6 temporarily remove tensorflow test (#5866) 2019-10-08 14:13:54 -07:00