6812 Commits

Author SHA1 Message Date
Kai Yang d8b50a5018 Fix GcsClient resource map (#5171) 2019-07-11 16:05:12 +08:00
Qing Wang f2293243cc [ID Refactor] Shorten the length of JobID to 4 bytes (#5110)
* WIP

* Fix

* Add jobid test

* Fix

* Add python part

* Fix

* Fix tes

* Remove TODOs

* Fix C++ tests

* Lint

* Fix

* Fix exporting functions in multiple ray.init

* Fix java test

* Fix lint

* Fix linting

* Address comments.

* FIx

* Address and fix linting

* Refine and fix

* Fix

* address

* Address comments.

* Fix linting

* Fix

* Address

* Address comments.

* Address

* Address

* Fix

* Fix

* Fix

* Fix lint

* Fix

* Fix linting

* Address comments.

* Fix linting

* Address comments.

* Fix linting

* address comments.

* Fix
2019-07-11 14:25:16 +08:00
Hao Chen 88365d4112 Fix Java MultithreadingTest (#5170) 2019-07-11 13:40:40 +08:00
Kai Yang 43b6513d19 [GCS] Move node resource info from client table to resource table (#5050) 2019-07-11 13:17:19 +08:00
Richard Liaw 691c9733f9 [tune] Document trainable attributes and enable user-checkpoint… (#4868) 2019-07-10 18:51:11 -07:00
Philipp Moritz e6a81d40a5 [stability] Make task result for RemoveTask optional (#5146)
* make task result for RemoveTask optional

* lint

* update

* update

* update

* rename

* lint
2019-07-10 13:33:41 -07:00
Hao Chen 0c34749779 Use bazel disk cache for all CI jobs (#5144) 2019-07-10 22:03:45 +08:00
Richard Liaw 0b540ab492 [tune] Test example checkpointing (#4728) 2019-07-10 01:58:26 -07:00
Joey Jiang e55c8ca165 Fix crash because of the reference to deleted variable in grpc server call (#5158) 2019-07-10 14:06:21 +08:00
Edward Oakes 2b7b7c7547 Add linting pre-push hook (#5154) 2019-07-09 21:49:12 -07:00
Eric Liang 5ab5017c67 [rllib] Fix impala stress test (#5101)
* add copy

* upgrade to tf 1.14

* update

* reduce count to workaround https://github.com/ray-project/ray/issues/5125

* Update impala.py

* placeholder

* comments

* update
2019-07-09 20:22:30 -07:00
Joey Jiang 5733690aa6 Add success and fail callback of grpc sending reply (#5141) 2019-07-09 17:03:57 +08:00
Eric Liang 5aec750107 Add warning/error if object store memory exceeds available memory (#4893)
* exclude

* format

* add warning

* hatch

* reduce mem usage

* reduce object store mem

* set obj mem
2019-07-08 21:37:08 -07:00
Stefan Pantic dfc94ce7bc [rllib]Add entropy coeff decay (#5043) 2019-07-08 18:30:32 -07:00
Daniel Edgecumbe eeb67db861 [autoscaler] Log AWS NodeProvider create_instances (#4998)
* autoscaler: Log on AWS NodeProvider create_instances

* logging
2019-07-08 13:22:26 -07:00
Hao Chen 8a30b93e42 Define common data structures with protobuf. (#5121) 2019-07-08 22:41:37 +08:00
Joey Jiang b4e51c8aa1 Support clang-format whose version is not 7.0 (#5139) 2019-07-08 17:15:09 +08:00
Sam Toyer 7ad854d4c6 [tune] Use traceback.format_tb() (fixes #5135) (#5136) 2019-07-08 01:13:06 -07:00
Joey Jiang 274233962f Remove unused connection file in object manager (#5123) 2019-07-08 10:59:36 +08:00
Eric Liang 893744b3be [rllib] Revert "use make template" which seems to break DQN/Atari (#5134)
* Revert "use make template"

This reverts commit 291e9e0031c6e315fe24e5b4973dea375fe73918.

* debug vars
2019-07-07 19:51:26 -07:00
Morgan Giraud 7e020e7183 [tune] tune.run keep_checkpoints_num (#5117)
* Add missing argument keep_checkpoints_num to tune

* expose keep checkpoints
2019-07-07 17:14:56 -07:00
Edward Oakes 8f53364097 Improve local_mode (#5060) 2019-07-07 17:10:50 -07:00
Eric Liang 932d6b2517 [rllib] Port IMPALA to ModelV2/build_tf_policy (#5130)
* port vtrace

* fix vf

* fix vs

* fix the example

* wip ddpg

* fix tests

* fix tests

* remove ddpg model

* comments

* set vf share layers True by default

* typo

* fix test
2019-07-07 15:06:41 -07:00
Richard Liaw 6a14f1a540 [autoscaler] Small fixes for local cluster usability (#4864) 2019-07-06 21:55:18 -07:00
Richard Liaw 1798d4f077 [autoscaler] Add hard kill and monitor commands (#5082)
* Add hard kill and monitor commands

* better_commands

* Update python/ray/scripts/scripts.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2019-07-06 21:52:55 -07:00
Eric Liang 445bcb29b0 [hotfix] fix backward compat with older yaml libraries 2019-07-06 20:41:28 -07:00
Eric Liang c15ed3ac55 [rllib] Shuffle RNN sequences in PPO as well (#5129)
* shuffle seq

* fix test
2019-07-06 20:40:49 -07:00
Brandon Bertelsen c04b69902c Updates for #5072 (#5091) 2019-07-06 16:05:50 -07:00
Eric Liang 0448847a02 Update protobuf version (#5128) 2019-07-06 15:59:55 -07:00
Aleksei Petrenko 09bde397c9 Multiagent experiment resume (#5102)
* Fixed problem with multiagent experiment resume

* Applied format script

* fix lint
2019-07-06 11:38:17 -07:00
Dušan Josipović e9b88dcbed [wingman -> tune] Add system performance tracking (#4924) 2019-07-06 00:57:35 -07:00
Richard Liaw c3e9d94b18 [tune][minor] Reduce checkpointing frequency (#4859) 2019-07-06 00:54:24 -07:00
Kim Jeong Ju 4b56a5eb27 [tune] missing torch.load in mnist_pytorch_trainable.py (#5103) 2019-07-06 00:14:41 -07:00
Philipp Moritz c5253cc300 Add job table to state API (#5076) 2019-07-06 00:05:48 -07:00
Richard Liaw 53d5a8a45f [tune] Fix sort (#5111)
* fix sort

* fix tune list-experiments

* Update python/ray/tune/tests/test_commands.py
2019-07-05 16:05:10 -07:00
Joey Jiang 4183303a2f Add bazel build options for plasma to use glog (#5108) 2019-07-05 19:00:19 +08:00
Robert Nishihara 9cc4cc6a52 Fail format.sh if yapf/flake8 versions are incorrect. (#5083) 2019-07-04 23:22:01 -07:00
Zhijun Fu 54d5969cea [grpc] Add grpc server to worker (#5054)
* refactor grpc server

* format

* change GetTask() to PushTask()

* change PushTask to AssignTask

* format

* update

* fix test

* format

* Update src/ray/rpc/worker_client.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update BUILD.bazel

* Update src/ray/core_worker/task_execution.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* update

* format

* address comments

* format

* Update src/ray/rpc/worker/worker_server.h

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/protobuf/worker.proto

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* format

* fix

* format
2019-07-04 20:16:42 +08:00
ztangent 41a16c55ef [tune] Fixed bug with joining experiment_path twice. (#5106) 2019-07-03 22:48:07 -07:00
Patrick 1a543a6571 [serve] add missing __init__.py file under serve/utils (#4609)
* bugfix: add missing serve/utils __init__.py file

* Update __init__.py

* lint
2019-07-03 17:27:59 -07:00
Richard Liaw 0dbb6c4911 [tune] PBT perturbing after first iteration (#5097) 2019-07-03 17:27:26 -07:00
Eric Liang 34d054ff19 [rllib] ModelV2 API (#4926) 2019-07-03 15:59:47 -07:00
Kristian Hartikainen 9e0192bc0b [tune] Change the log syncing behavior (#4450)
* Change the log syncing behavior

* fix up abstractions for syncer

* Finished checkpoint syncing

* Code

* Set of changes to get things running

* Fixes for log syncing

* Fix parts

* Lint and other fixes

* fix some test

* Remove extra parsing functionality

* some test fixes

* Fix up cloud syncing

* Another thing to do

* Fix up tests and local sync

Changes LogSync into a mixin, and adds tests for different
functionalities.

* Fix up tests, start on local migration

* fix distributed migrations

* comments

* formatting

* Better checkpoint directory handling

* fix tests

* fix tests

* fix click

* comments

* formatting comments

* formatting and comments

* sync function deprecations

* syncfunction

* Add documentation for Syncing and Uploading

* nit

* BaseSyncer as base for Mixin in edge case

* more docs

* clean up assertions

* validate

* nit

* Update test_cluster.py

* betterdoc

* Update tune-usage.rst

* cleanup

* nit
2019-07-02 20:46:00 -07:00
Stephanie Wang 71d4637b75 [core worker] Refactor CoreWorker member classes (#5062)
* Move store client mutex inside CoreWorkerPlasmaStoreProvider

* Move PlasmaClient inside CoreWorkerStoreProvider

* Remove CoreWorkerObjectInterface's ref to CoreWorker

* Remove WorkerLanguage

* Remove CoreWorkerTaskInterface's ref to CoreWorker

* Remove CoreWorkerTaskExecutionInterface's ref to CoreWorker

* lint

* move comment

* Fix build

* Fix build
2019-07-02 15:30:30 -07:00
Kai Yang 1cf7728f35 [Core worker] Serialize ActorHandle in core worker. Make ActorHandle thread safe. (#5034)
* Serialize ActorHandle in core worker. Make ActorHandle thread safe.

* Address comments

* Address comments

* Address comments

* Address comments

* lint

* Address comments

* Address comments

* Address comments

* Address comments

* Minor update

* Address comments

* lint
2019-07-02 16:48:43 +08:00
Eric Liang 904dcf081d Switch cluster longevity tests to DLAMI, fix ray up verbosity (#5084)
* fix

* add branch commit

* comments

* Update ci/long_running_tests/.gitignore

Co-Authored-By: Robert Nishihara <robertnishihara@gmail.com>
2019-07-02 00:19:05 -07:00
Qing Wang 247f95b3ff Refine RegisterClientRequest message to make it clearer. (#5057)
* transfor driver task id Explicitly

* Refins

* Fix and add comment.

* add more

* Fix

* Fix

* Add comments

* Fix
2019-07-02 14:26:19 +08:00
Philipp Moritz a6a02fccd0 Do not compile redis twice (#5074) 2019-07-01 15:42:54 -07:00
Philipp Moritz 4e82313891 Update to latest arrow (#5011) 2019-06-30 20:36:36 -07:00
Simon Mo 0c4dd3c401 Use bazel disk cache with travis (#5068) 2019-06-30 17:57:48 -07:00