Commit Graph

185 Commits

Author SHA1 Message Date
Edward Oakes 5f843cd998 Clean up stress_testing_config.yaml (#6738)
* Clean up stress_testing_config.yaml

* comment
2020-01-07 17:05:07 -06:00
Sven f1b56fa5ee PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). (#6650)
* Unifying the code for PGTrainer/Policy wrt tf vs torch.
Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch).

* Fix LINT line-len errors.

* Fix LINT errors.

* Fix `tf_pg_policy` imports (formerly: `pg_policy`).

* Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer.
Retire `PGAgent` class (use PGTrainer instead).

* - Move PG test into agents/pg/tests directory.
- All test cases will be located near the classes that are tested and
  then built into the Bazel/Travis test suite.

* Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c
the function is not a tf-specific one.

* Fix remaining import errors for agents/pg/...

* Fix circular dependency in pg imports.

* Add pg tests to Jenkins test suite.
2020-01-02 16:08:03 -08:00
mehrdadn f4b29dae9c Perform Bazel install directly in Windows CI (#6653) 2019-12-31 20:48:08 -08:00
Robert Nishihara d2c6457832 Remove public facing references to --redis-address. (#6631) 2019-12-31 13:21:53 -08:00
Philipp Moritz 735f282494 Use 0.9.0.dev0 as the version tag (#6630) 2019-12-30 10:14:07 -08:00
Robert Nishihara 96f2f8ff10 Stop testing Python 2.7 and building Python 2.7 wheels. (#6601) 2019-12-27 20:47:49 -08:00
Robert Nishihara eb0813ea35 Re-enable UI tests for wheels. (#6602) 2019-12-26 22:34:56 -08:00
Philipp Moritz eaee672b7f Revert "Perform Bazel install directly in Windows CI (#6529)" (#6593)
This reverts commit c5f141013b.
2019-12-24 16:39:07 -08:00
micafan 687de41273 [GCS] refactor the GCS Client Node Interface (#6010) 2019-12-24 20:36:37 +08:00
mehrdadn c5f141013b Perform Bazel install directly in Windows CI (#6529) 2019-12-22 16:14:51 -08:00
Chaokun Yang 7bbfa85c66 [Streaming] Streaming data transfer java (#6474) 2019-12-22 10:56:05 +08:00
Simon Mo 26ec500ef9 Implement async get for direct actor call (#6339) 2019-12-18 11:50:21 -08:00
Eric Liang 6725a61bda Release 0.8.0 test logs (#6512) 2019-12-17 15:56:50 -08:00
Eric Liang 1a1324d2a2 Bump version from 0.8.0.dev6 -> 0.9.0.dev (#6508) 2019-12-16 23:57:42 -08:00
Edward Oakes b1e83d83d1 Print summaries for stress tests (#6498) 2019-12-16 14:14:48 -08:00
Mitchell Stern 1531c21dbd [Dashboard] Add remaining features from old dashboard (#6489)
* [Dashboard] Add remaining features from old dashboard

* Fix linting errors

* Set cluster uptime statistic to N/A

* Use proper singular or plural words for workers column

* Ignore .js, .jsx, .ts, .tsx files in check-git-clang-format-output.sh

* Fix bash quote issue
2019-12-16 11:21:18 -08:00
Richard Liaw 5719a05757 [sgd] Add support for multi-model multi-optimizer training (#6317) 2019-12-15 15:19:45 -08:00
Philipp Moritz f5d10eea0b [Projects] Refactor cluster specification (#6488) 2019-12-14 22:43:06 -08:00
Yuhao Yang ad4da17899 [Tune] Add example and tutorial for DCGAN (#6400) 2019-12-13 14:15:44 -08:00
Eric Liang be5dd8eb5e Enable direct calls by default (#6367)
* wip

* add

* timeout fix

* const ref

* comments

* fix

* fix

* Move actor state into actor handle

* comments 2

* enable by default

* temp reorder

* some fixes

* add debug code

* tmp

* fix

* wip

* remove dbg

* fix compile

* fix

* fix check

* remove non direct tests

* Increment ref count before resolving value

* rename

* fix another bug

* tmp

* tmp

* Fix object pinning

* build change

* lint

* ActorManager

* tmp

* ActorManager

* fix test component failures

* Remove old code

* Remove unused

* fix

* fix

* fix resources

* fix advanced

* eric's diff

* blacklist

* blacklist

* cleanup

* annotate

* disable tests for now

* remove

* fix

* fix

* clean up verbosity

* fix test

* fix concurrency test

* Update .travis.yml

* Update .travis.yml

* Update .travis.yml

* split up analysis suite

* split up trial runner suite

* fix detached direct actors

* fix

* split up advanced test

* lint

* fix core worker test hang

* fix bad check fail which breaks test_cluster.py in tune

* fix some minor diffs in test_cluster

* less workers

* make less stressful

* split up test

* retry flaky tests

* remove old test flags

* fixes

* lint

* Update worker_pool.cc

* fix race

* fix

* fix bugs in node failure handling

* fix race condition

* fix bugs in node failure handling

* fix race condition

* nits

* fix test

* disable heartbeats

* disable heartbeats

* fix

* fix

* use worker id

* fix max fail

* debug exit

* fix merge, and apply [PATCH] fix concurrency test

* [patch] fix core worker test hang

* remove NotifyActorCreation, and return worker on completion of actor creation task

* remove actor died callback

* Update core_worker.cc

* lint

* use task manager

* fix merge

* fix deadlock

* wip

* merge conflicts

* fix

* better sysexit handling

* better sysexit handling

* better sysexit handling

* check id

* better debug

* task failed msg

* task failed msg

* retry failed tasks with delay

* retry failed tasks with delay

* clip deps

* fix

* fix core worker tests

* fix task manager test

* fix all tests

* cleanup

* set to 0 for direct tests

* dont check worker id for ownership rpc

* dont check worker id for ownership rpc

* debug messages

* add comment

* remove debug statements

* nit

* check worker id

* fix test

* owner

* fix tests
2019-12-13 13:58:04 -08:00
Edward Oakes 032e8553c7 use numpy in long-running tests (#6448) 2019-12-11 17:53:30 -08:00
alindkhare 76e678d775 [Serve] Added deadline awareness (#6442)
* [Serve] Added deadline awareness

Added deadline awareness while enqueuing a query
Using Blist sorted-list implementation (ascending order) to get queries according to their specified deadlines. [buffer_queues]
Exposed slo_ms via handle/http request
Added slo example
The queries in the example will be executed in almost the opposite order in which they are fired
Added slo pytest
Added check for slo_ms to not be negative
Included the changes suggested

* Linting Corrections

* Adding the code changes suggested by format.sh

* Added the suggested changes

Added justification for blist
Added blist in travis/ci/install-dependencies.sh

* Fixed linting issues

* Added blist to ray/doc/requirements-doc.txt
2019-12-11 16:41:54 -08:00
Simon Mo c61db84b8d Bump dev6->dev7 for two files not changed yet. (#6428) 2019-12-10 20:58:14 -08:00
Chaokun Yang 6272907a57 [Streaming] Streaming data transfer and python integration (#6185) 2019-12-10 20:33:24 +08:00
Victor Le 4e24c805ee AlphaZero and Ranked reward implementation (#6385) 2019-12-07 12:08:40 -08:00
Edward Oakes f63b64310a Bump version to 0.8.0.dev7 (#6303) 2019-12-05 18:33:54 -08:00
Philipp Moritz a454c815f1 Fix long running stress tests (#6374) 2019-12-05 18:29:41 -08:00
Philipp Moritz dd27bfbb75 Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00
Eric Liang 4c6739476b [rllib] Raise an error if GPUs are enabled but not tf.test.is_gpu_available() (#6365) 2019-12-05 10:13:54 -08:00
Simon Mo 31113aeded Use rayproject repo (#6353) 2019-12-03 22:36:40 -08:00
Eric Liang e5863d7914 Force tune tests to run in direct call mode (#6301)
* force tune direct mode

* force tune

* fix

* Update run_multi_node_tests.sh
2019-11-27 19:58:33 -08:00
Simon Mo dd80c6e6d4 Hotfix make docker images building optional (#6309)
* Make docker build optional

* Fix syntax error
2019-11-27 20:52:21 -06:00
Simon Mo 22b305223a Build Docker Containers for Linux Wheels (#6233) 2019-11-27 17:05:36 -08:00
Edward Oakes 141d667cee Fix bash syntax error in test-wheels.sh (#6290) 2019-11-26 13:15:54 -06:00
Edward Oakes 7f8de61441 [hotfix] Remove python/ray/tests/__init__.py (#6279)
* Remove python/ray/tests/__init__.py for bazel

* Comment out checks
2019-11-25 17:04:20 -08:00
Eric Liang 64a3a7239e Set RAY_FORCE_DIRECT=1 for run_rllib_tests, test_basic (#6171) 2019-11-25 14:12:11 -08:00
Eric Liang 7917bbef78 Set progress report interval for bazel explicitly (#6262)
* set progress interval

* add keep alive

* add keepalive

* remove cat

* smaller time

* squash error

* reduce log spam
2019-11-24 22:37:59 -08:00
Eric Liang 53641f1f74 Move more unit tests to bazel (#6250)
* move more unit tests to bazel

* move to avoid conflict

* fix lint

* fix deps

* separate

* fix failing tests

* show tests

* ignore mismatch

* try combining bazel runs

* build lint

* remove tests from install

* fix test utils

* better config

* split up

* exclusive

* fix verbosity

* fix tests class

* cleanup

* remove flaky

* fix metrics test

* Update .travis.yml

* no retry flaky

* split up actor

* split basic test

* split up trial runner test

* split stress

* fix basic test

* fix tests

* switch to pytest runner for main

* make microbench not fail

* move load code to py3

* test is no longer package

* bazel to end
2019-11-24 11:43:34 -08:00
Simon Mo 9f0d005ce6 Use jobs 50 (#6255) 2019-11-24 00:32:38 -08:00
Simon Mo f53f576120 Quiet Wget (#6244) 2019-11-22 14:32:14 -08:00
Simon Mo c4132b501b [CI] Add Remote Caching (#6210) 2019-11-21 11:36:36 -08:00
Eric Liang f3f86385d6 Minimal implementation of direct task calls (#6075) 2019-11-12 11:45:28 -08:00
Philipp Moritz ccbcc4bafa Use gRPC and Bazel 1.0 (#6002) 2019-11-08 15:58:28 -08:00
daiyaanarfeen 8f6d73a93a [sgd] Extend distributed pytorch functionality (#5675)
* raysgd

* apply fn

* double quotes

* removed duplicate TimerStat

* removed duplicate find_free_port

* imports in pytorch_trainer

* init doc

* ray.experimental

* remove resize example

* resnet example

* cifar

* Fix up after kwargs

* data_dir and dataloader_workers args

* formatting

* loss

* init

* update code

* lint

* smoketest

* better_configs

* fix

* fix

* fix

* train_loader

* fixdocs

* ok

* ok

* fix

* fix_update

* fix

* fix

* done

* fix

* fix

* fix

* small

* lint

* fix

* fix

* fix_test

* fix

* validate

* fix

* fi
2019-11-05 11:16:46 -08:00
Richard Liaw e94bebb1de [tune] Fix Jenkins tests (#6028) 2019-11-01 16:42:04 -07:00
Simon Mo c8d7065bf3 [CI] Use rerunfailures instead of flaky (#6061)
* Use rerunfailures instead of flaky

* Lint
2019-11-01 13:59:03 -07:00
Philipp Moritz f7455839bf Expose raylet info to dashboard (#6045) 2019-10-31 17:36:59 -07:00
Simon Mo 4c4342c165 Bring back pytest-sugar (#6038)
* Add cloudpickle as doc requirements

* Bring back pytest-sugar

* Revert "Add cloudpickle as doc requirements"

This reverts commit 2206e9e62ee20d93638e115f07a3fc933cbad9a3.
2019-10-28 20:24:28 -07:00
Stephanie Wang eb41c945a1 Add gRPC endpoint to raylet to expose metrics (#6005) 2019-10-26 16:37:39 -07:00
Richard Liaw 48ba484640 [tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931) 2019-10-18 13:50:42 -07:00