48 Commits

Author SHA1 Message Date
Rohit Rawat c1f8d22a50 Fix examples Dockerfile (#10908)
Co-authored-by: Rohit Rawat <rohit.rawat@hpe.com>
2020-09-19 11:05:30 -07:00
Ian Rodney 5d4d67c47d [docker] Mirror Functionality of CI scripts & Fix docs (#10349)
* first-pass

* add back build examples

* remove unnecessary test

* add gcc and more formatting

* doc fixing

* small fixes
2020-08-31 10:57:17 -07:00
Simon Mo f1ede1099f [Hotfix] Pin opencv-python-headless==4.3.0.36 (#10049) 2020-08-11 15:58:18 -07:00
Simon Mo c218f2eff6 [docker] Build docker in Travis PR & Fix image build failing (#9787)
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
2020-08-03 16:37:15 -07:00
Richard Liaw 9f3e9e7e9f [tune] Add more intensive tests (#7667)
* make_heavier_tests

* help
2020-04-20 11:14:44 -07:00
Servon 5c274fe631 [Tune] Add ZOOpt search algorithm (#7960)
* add zoopt

* add zoopt search algo

* add zoopt

* fix zoopt

* add zoopt requirements

* fix zoopt

* remove generated guides

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-04-15 21:13:29 -07:00
Simon Mo 59867dad75 Move Jenkins test to Github action (#7342) 2020-04-09 10:27:19 -07:00
Anthony Yu 89ec4adb72 [tune] Dragonfly Optimizer (#5955)
* Add sample example

* Copy relevant lines of ask from inherited Optimizer

* Ignore strategy

* Additional changes

* Add DragonflySearch for tune connector for Dragonfly

* Add example and fix small errors

* lint

* Remove skopt references

* Update example based off of Dragonfly changes

* Edit example for final Dragonfly edits

* Formatting and documentation edits

* Add documentation and add to test pipeline

* Address PR comments

* Fix Jenkins test

* Adjust Dragonfly to PR#7366

* Lint

* fix_tests

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-10 08:40:36 -07:00
Sven Mika bc637a2546 [Tune Jenkins tests] Add dm_tree to docker. (#7500)
* Fix.

* Rollback.

* Add dm_tree to docker examples and tune_test containers.
2020-03-07 23:16:00 -08:00
Richard Liaw 8a9bd18606 [tune] Remove keras dependency (#6827) 2020-01-18 23:24:42 -08:00
Richard Liaw 62cbc043b4 [tune] tbx logger (#6133)
* tbx

* add_hparams

* fix_hparams

* ok

* ok

* fix

* ok

* fix
2019-11-15 08:45:44 -08:00
daiyaanarfeen 8f6d73a93a [sgd] Extend distributed pytorch functionality (#5675)
* raysgd

* apply fn

* double quotes

* removed duplicate TimerStat

* removed duplicate find_free_port

* imports in pytorch_trainer

* init doc

* ray.experimental

* remove resize example

* resnet example

* cifar

* Fix up after kwargs

* data_dir and dataloader_workers args

* formatting

* loss

* init

* update code

* lint

* smoketest

* better_configs

* fix

* fix

* fix

* train_loader

* fixdocs

* ok

* ok

* fix

* fix_update

* fix

* fix

* done

* fix

* fix

* fix

* small

* lint

* fix

* fix

* fix_test

* fix

* validate

* fix

* fi
2019-11-05 11:16:46 -08:00
Richard Liaw e94bebb1de [tune] Fix Jenkins tests (#6028) 2019-11-01 16:42:04 -07:00
Richard Liaw 48ba484640 [tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931) 2019-10-18 13:50:42 -07:00
Richard Liaw e54c487d18 [hotfix] Docker (#5809)
* configspace

* reorder
2019-09-30 16:39:00 -07:00
Richard Liaw baf85c6665 [tune/sgd] Fix Jenkins (#5765) 2019-09-27 09:59:08 -07:00
Richard Liaw cdc9227f1b [tune] ASHA xgboost and lightgbm examples (#5500) 2019-08-22 10:37:59 -07:00
Richard Liaw d7b309223b [tune] MLFlow Logger (#5438) 2019-08-14 15:58:18 -07:00
Lisa Dunlap b7d0733362 [tune] Implement BOHB (#5382) 2019-08-13 12:32:07 -07:00
Kristian Hartikainen 13fb9fe3db [rllib] Feature/soft actor critic v2 (#5328)
* Add base for Soft Actor-Critic

* Pick changes from old SAC branch

* Update sac.py

* First implementation of sac model

* Remove unnecessary SAC imports

* Prune unnecessary noise and exploration code

* Implement SAC model and use that in SAC policy

* runs but doesn't learn

* clear state

* fix batch size

* Add missing alpha grads and vars

* -200 by 2k timesteps

* doc

* lazy squash

* one file

* ignore tfp

* revert done
2019-08-01 23:37:36 -07:00
Richard Liaw bd8aceb896 [ci] Change Jenkins to py3 (#5022)
* conda3

* integration

* add nevergrad, remotedata

* pytest 0.3.1

* otherdockers

* setup

* tune
2019-06-24 21:50:37 -07:00
Philipp Moritz 2e342ef71f Fix tensorflow-1.14 installation in jenkins (#5007) 2019-06-21 11:04:40 -07:00
Robert Nishihara 4c80177d6f Unpin gym in Python 2 since gym 0.12 was released. (#4291) 2019-03-07 15:59:30 -08:00
Philipp Moritz ba52caff37 Make Bazel the default build system (#3898) 2019-02-23 11:58:59 -08:00
Adi Zimmerman dac1969647 [tune] Add Nevergrad to Tune (#3985) 2019-02-12 11:00:04 -08:00
Adi Zimmerman 9797028a91 [tune] Add scikit-optimize to Tune (#3924) 2019-02-11 17:06:02 -08:00
Robert Nishihara a654152f9c Pin gym version in Python 2 tests. (#3973) 2019-02-06 23:56:14 -08:00
Andrew Tan 8323419a6d [tune] Add SigOpt Integration (#3844) 2019-02-03 18:23:57 -08:00
Peter Schafhalter 62a0a7bdc7 [tune] Add BayesOpt (#3864)
Adds BayesOpt as a Tune suggestion algorithm.
2019-01-31 16:54:17 -08:00
Eric Liang 32473cf22e [rllib] Basic Offline Data IO API (#3473) 2018-12-12 13:57:48 -08:00
Richard Liaw 784a6399b0 [tune] Node Fault Tolerance (#3238)
This PR introduces single-node fault tolerance for Tune.

## Previous behavior:
 - Actors will be restarted without checking if resources are available. This can lead to problems if we lose resources.

## New behavior:
 - RUNNING trials will be resumed on another node on a best-effort basis (that is, they run only if resources are available).
 - If the cluster is saturated, RUNNING trials on the failed node will become PENDING and be queued.
 - During recovery, TrialSchedulers and SearchAlgorithms should receive notification of this (via `trial_runner.stop_trial`) so that they don’t wait/block for a trial that isn’t running.
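
The new behavior above can be sketched in plain Python. This is only an illustrative model under stated assumptions, not Tune's implementation: the `Trial` dataclass, the `recover_from_node_failure` helper, and the per-node CPU bookkeeping are all hypothetical, and the scheduler/search-algorithm notification step is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

RUNNING, PENDING = "RUNNING", "PENDING"


@dataclass
class Trial:
    # Hypothetical stand-in for a Tune trial; not the real class.
    name: str
    cpus: int
    node: Optional[str]
    status: str = RUNNING


def recover_from_node_failure(
    trials: List[Trial], free_cpus: Dict[str, int], failed_node: str
) -> List[str]:
    """Move RUNNING trials off the failed node; return names of queued trials."""
    # Resources on the failed node are gone, so never place trials there.
    free = {node: cpus for node, cpus in free_cpus.items() if node != failed_node}
    queued = []
    for trial in trials:
        if trial.node != failed_node or trial.status != RUNNING:
            continue
        # Best effort: pick the first surviving node with enough free CPUs.
        target = next((n for n, c in free.items() if c >= trial.cpus), None)
        if target is None:
            # Cluster is saturated: the trial becomes PENDING and waits.
            trial.status = PENDING
            trial.node = None
            queued.append(trial.name)
        else:
            free[target] -= trial.cpus
            trial.node = target
    return queued


trials = [Trial("a", 2, "node1"), Trial("b", 2, "node1"), Trial("c", 1, "node2")]
queued = recover_from_node_failure(trials, {"node1": 0, "node2": 3}, "node1")
print(queued)
```

In this example trial `a` fits on `node2`, while trial `b` no longer fits and is queued as PENDING rather than restarted without a resource check.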

## Remaining questions:
 - Should `last_result` be consistent during restore?
   Yes, but not for earlier trials (those that have yet to be checkpointed).

 - Waiting for some PRs to merge first (#3239)

Closes #2851.
2018-11-21 12:38:16 -08:00
Richard Liaw f9b58d7b02 [tune] Tweaks to Trainable and Verbosity (#2889) 2018-10-11 23:42:13 -07:00
Richard Liaw 8e8c733696 [tune] Fix Categorical Space + Add Keras Example (#2401)
Previously did not properly resolve categorical variables for HyperOpt.
2018-07-17 23:52:52 +02:00
Richard Liaw 0048e77093 [rllib] RLlib CLI (#2375) 2018-07-12 19:12:04 +02:00
Alok Singh fd234e3171 [rllib] Fix A3C PyTorch implementation (#2036)
* Use F.softmax instead of a pointless network layer

Stateless functions should not be network layers.

* Use correct pytorch functions

* Rename argument name to out_size

Matches in_size and makes more sense.

* Fix shapes of tensors

Advantages and rewards both should be scalars, and therefore a list of them
should be 1D.

* Fmt

* replace deprecated function

* rm unnecessary Variable wrapper

* rm all use of torch Variables

Torch does this for us now.

* Ensure that values are flat list

* Fix shape error in conv nets

* fmt

* Fix shape errors

Reshaping the action before stepping in the env fixes a few errors.

* Add TODO

* Use correct filter size

Works when `self.config['model']['channel_major'] = True`.

* Add missing channel major

* Revert reshape of action

This should be handled by the agent or at least in a cleaner way that doesn't
break existing envs.

* Squeeze action

* Squeeze actions along first dimension

This should deal with some cases such as cartpole where actions are scalars
while leaving alone cases where actions are arrays (some robotics tasks).

* try adding pytorch tests

* typo

* fixup docker messages

* Fix A3C for some envs

Pendulum doesn't work since it's an edge case (expects singleton arrays, which
`.squeeze()` collapses to scalars).

* fmt

* nit flake

* small lint
2018-05-30 10:48:11 -07:00
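
The squeeze-related notes in the commit above (squeezing along the first dimension, and the Pendulum edge case) can be illustrated with NumPy. This is a sketch of the array behavior only, assuming a batch of one action; the commit itself works on PyTorch tensors, and the `squeeze_first_dim` helper and sample values are hypothetical.

```python
import numpy as np


def squeeze_first_dim(action):
    """Drop only the leading batch axis (assumed here to have length 1)."""
    return np.asarray(action).squeeze(axis=0)


cartpole_batch = np.array([1])       # one scalar action, shape (1,)
pendulum_batch = np.array([[0.5]])   # one singleton-array action, shape (1, 1)

cartpole_action = squeeze_first_dim(cartpole_batch)   # scalar, shape ()
pendulum_action = squeeze_first_dim(pendulum_batch)   # still an array, shape (1,)

# A bare .squeeze() removes every length-1 axis, so the singleton array
# that a Pendulum-style env expects collapses to a scalar.
over_squeezed = pendulum_batch.squeeze()              # shape ()

print(cartpole_action.shape, pendulum_action.shape, over_squeezed.shape)
```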
Robert Nishihara 3c76461b22 Remove smart_open install. (#1943) 2018-04-23 23:18:09 -07:00
Richard Liaw 888e70f1be [tune] HyperOpt Support (v2) (#1763) 2018-04-04 11:08:26 -07:00
butchcom 936bebef99 [rllib] Upgrade to OpenAI Gym 0.10.3 (#1601) 2018-03-06 00:31:02 -08:00
Robert Nishihara 7187f9fe56 Pin gym version to 0.9.5 in tests. (#1490) 2018-01-31 15:50:25 -08:00
Philipp Moritz 26125e1547 Fixing the jenkins tests (#1299)
* trying to fix jenkins tests

* comment out more tests

* remove pytorch stuff

* use non-monotonic clock (monotonic not supported on python 2.7)

* whitespace
2017-12-07 17:03:58 -08:00
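
The clock note in the commit above refers to `time.monotonic`, which exists only on Python 3.3 and later. One way to support both interpreters is a fallback like the sketch below; this is an assumption about how such a shim could look, not what the commit did (its message says it switched to a non-monotonic clock), and the `now` helper name is hypothetical.

```python
import time


def now():
    """Return a timestamp in seconds, preferring a monotonic clock."""
    monotonic = getattr(time, "monotonic", None)
    if monotonic is not None:
        # Python 3.3+: unaffected by system clock adjustments.
        return monotonic()
    # Python 2.7: fall back to wall-clock time, which may jump.
    return time.time()


start = now()
elapsed = now() - start
print(elapsed >= 0)
```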
Richard Liaw afdc87323f [rllib] PyTorch Models for A3C (#1187)
* fixing policy

* Compute Action is singular, fixed weird issue with arrays

* remove vestige

* extraneous ipdb

* Can Drop in Pytorch Model

* lint

* introducing models

* fix base policy

* Missed this from last time

* lint

* removedolds

* getting vision working

* LINT

* trying to fix test dependencies

* requirements

* try

* tryconda

* yes

* shutup

* flake_passes

* changes

* removing weight initializer for lstm for now

* unused

* adam

* clip

* zero

* properscaling

* weight

* try

* fix up pytorch visionnet

* bias correction

* fix model

* same visionnet

* matching_bad_things

* test

* try locking

* fixing_linear

* naming

* lint

* FORJENKINS

* clouds

* lint

* Lint + removed dependencies

* removed dependencies

* format
2017-11-12 00:20:33 -08:00
Robert Nishihara d5eec0c2cd Pin opencv-python version to 3.2.0.8 in dockerfile. (#926) 2017-09-03 23:51:59 -07:00
Robert Nishihara 80e8426b5e Test example applications and rllib in jenkins tests. (#707)
* Test example applications in Jenkins.

* Fix default upload_dir argument for Algorithm class.

* Fix evolution strategies.

* Comment out policy gradient example which doesn't seem to work.

* Set --env-name for evolution strategies.
2017-07-16 18:51:33 +00:00
Johann Schleier-Smith 4f6100b67f fix docker build bug (#207) 2017-01-18 23:23:34 -08:00
Johann Schleier-Smith 8bb87a4f6b updated Docker files (#171)
* updated Docker files

* single Docker RUN for apt-get installs and cleanup

* stylistic cleanup
2016-12-31 17:21:33 -08:00
Robert Nishihara 91f16a3df0 Migrate repositories to ray-project. (#438)
* Migrate repositories to ray-project.

* Update numbuf to the migrated version.
2016-09-17 00:52:05 -07:00
Johann Schleier-Smith 583df08957 Docker builds on Travis (#343)
* attempt to build on travis using docker

* run tests in foreground

* add examples to travis tests

* test from current checkout

* attempt to fix docker version issues

* try build with xenial

* attempt docker upgrade

* avoid hang on configuration files

* matrix osx and linux w/ docker

* restore non-test docker builds

* fix typo

* tuning and cleanup

* add missing file

* comment cleanup
2016-08-02 17:03:28 -07:00
Johann Schleier-Smith 79e4a5a00e Ray with Docker (#324)
* Ray with Docker

* cleanup based on comments

* rename docker user to ray-user

* add examples docker image

* working toward reliable Docker devel image

* adjust ray-user uid for Linux builds on AWS

* update documentation

* reduced dependencies for examples

* updated Docker documentation

* experimental notice on developing with Docker
2016-08-01 16:44:11 -07:00