wassname/ray

mirror of https://github.com/wassname/ray.git synced 2026-06-30 22:20:31 +08:00

Author	SHA1	Message	Date
Robert Nishihara	c9d70f0dda	Remove num_local_schedulers argument from ray.worker._init. (#3704 ) * Remove num_local_schedulers argument from ray.worker._init. * Fix * Fix tests.	2019-01-07 12:44:49 -08:00
Eric Liang	e78562b2e8	[rllib] Misc fixes: set lr for PG, better error message for LSTM/PPO, fix multi-agent/APEX (#3697 ) * fix * update test * better error * compute * eps fix * add get_policy() api * Update agent.py * better err msg * fix * pass in rew	2019-01-06 19:37:35 -08:00
Hao Chen	df0733cafb	Skip test_multiple_recursive (#3683 ) This test often hangs or fails in CI. Skip it for now to unblock other PRs.	2019-01-06 13:24:29 -08:00
Richard Liaw	8934e37a78	[tune] Change log handling for Tune (#3661 ) Also provides a small retry mechanism for a transient error as reported by #3340. Closes #3653.	2019-01-06 13:20:10 -08:00
mattearllongshot	681e8cd3fd	[autoscaler] Add an initial_workers option (#3530 ) This option goes along with `min_workers` and `max_workers`. When the cluster is first brought up (or when it is refreshed with a subsequent `ray up`), this number of nodes will be started. It's a workaround for scaling issues (see related issues) where it can take a long time (or forever, in the case where the head node has `--num-cpus 0`) to scale up a cluster in response to increasing demand. Related issues: workaround for https://github.com/ray-project/ray/issues/3339 and https://github.com/ray-project/ray/issues/2106	2019-01-05 17:58:42 -08:00
Robert Nishihara	067976ad3d	Push a warning to all users when large number of workers have been started. (#3645 ) * Push a warning to all users when large number of workers have been started. * Add test. * Fix bug. * Give warning when worker starts instead of when worker registers. * Fix * Fix tests	2019-01-05 13:27:32 -08:00
Wang Qing	692fdc6bc3	[Java] Allow actor handle to be serialized without forking (#3686 )	2019-01-06 00:29:08 +08:00
Eric Liang	03fe760616	[rllib] Model self loss isn't included in all algorithms (#3679 )	2019-01-04 22:30:35 -08:00
Richard Liaw	960a943503	[tune] Fault Tolerance: handle lost checkpoints by restart (#3657 ) Checks that node failure with lost checkpoints does not crash. Also adds test.	2019-01-04 22:05:27 -08:00
Eric Liang	7db1f3be2a	[tune] resume=False by default but print a tip to set resume="prompt" + jenkins fix (#3681 )	2019-01-04 17:23:19 -08:00
Kristian Hartikainen	747b117929	[tune] Tweak/allow nested pbt mutations (#3455 ) * Fix warning text in pbt logger * Allow nested mutations in pbt by recursing explore function * Add test for nested pbt mutation * Update pbt explore to only call custom explore on top level * fix test	2019-01-04 13:51:11 -08:00
Robert Nishihara	cd80891ddb	Try to figure out the memory limit in a docker container. (#3605 ) * Try to figure out the memory limit in a docker container. * Update comment * Fix * Fix	2019-01-03 23:07:24 -08:00
Robert Nishihara	586a5c9ffa	Limit default redis max memory to 10GB. (#3630 ) * Limit Redis max memory to 10GB/shard by default. * Update stress tests. * Reorganize * Update * Add minimum cap size for object store and redis. * Small test update.	2019-01-03 13:23:54 -08:00
Yuhong Guo	4b23a34c93	Fix multi-thread problem of function manager and Jenkins test (#3648 )	2019-01-03 17:05:13 +08:00
Yuhong Guo	ad2287ebe9	Fix new boost libs failure in cache-lib mode and add test to cover collect_dependent_libs.sh (#3627 ) * Fix building breaks and add lib collection to Travis. * Fix arrow build * Fix version mismatch problem	2019-01-02 23:51:11 -08:00
Eric Liang	ca864faece	[rllib] Documentation for I/O API and multi-agent support / cleanup (#3650 )	2019-01-03 15:15:36 +08:00
opherlieber	2177e2f410	[rllib] Agent: Allow unknown subkeys for custom_resources_per_worker (#3639 ) * RLLib Agent: Allow unknown subkeys for custom_resources_per_worker * Update agent.py	2019-01-03 14:19:59 +08:00
Eric Liang	47d36d7bd6	[rllib] Refactor pytorch custom model support (#3634 )	2019-01-03 13:48:33 +08:00
Robert Nishihara	b6bcd18d65	Split profile table among many keys in the GCS. (#3676 ) * Divide profile table among many keys in GCS. * Fix, and remove --collect-profiling-data arg. * Remove reference in doc.	2019-01-02 21:33:01 -08:00
Yuhong Guo	93e9d2b82c	Improve backend log: env variable setting and format refine. (#3662 ) * Improve backend logging * Address comment * Fix Raul's comment	2019-01-01 21:45:29 -08:00
Eric Liang	b8a9e3f106	[rllib] Remove uses of sgd_stepsize => lr (#3667 ) * lr * Update example-evolution-strategies.rst	2019-01-01 12:01:27 +08:00
Si-Yuan	93d54110f8	Prevent overriding faulthandler settings (#3668 ) This change ensures that Ray sets up fault handlers only if they have not already been enabled by other applications. Otherwise, some applications could face strange issues when using Ray, and some unit tests using XML runners will fail.	2018-12-31 16:36:26 -08:00
Yuhong Guo	c9b8ecca51	Add RayParams to refactor the parameters used by ray python. (#3558 )	2018-12-29 22:04:27 +08:00
Devin Petersohn	eb1e5fa2cf	Fixing Python2 compatibility issues. Adding inline docs (#3656 )	2018-12-28 22:53:28 -08:00
Richard Liaw	aad3c50e2d	[tune] Cluster Fault Tolerance (#3309 ) This PR introduces cluster-level fault tolerance for Tune by checkpointing global state. This occurs with relatively high frequency and allows users to easily resume experiments when the cluster crashes. Note that this PR may affect automated workflows due to auto-prompting, but this is resolvable.	2018-12-29 11:42:25 +08:00
Zhijun Fu	382b138fc7	fix code issues in object manager that are reported by scanning tool (#3649 ) Fix some code issues found by code scanning tool: 1. Macro compares unsigned to 0(NO_EFFECT) CWE570: An unsigned value can never be less than 0 This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "this->create_buffer_state_[object_id].num_seals_remaining >= 0UL". ~/ray/src/ray/object_manager/object_buffer_pool.cc: ray::ObjectBufferPool::SealChunk(const ray::UniqueID &, unsigned long) 2. Inferred misuse of enum(MIXED_ENUMS) CWE398: An integer expression which was inferred to have an enum type is mixed with a different enum type This case, "static_cast(ray::object_manager::protocol::MessageType::PushRequest)", implies the effective type of "message_type" is "ray::object_manager::protocol::MessageType". ~/ray/src/ray/object_manager/object_manager.cc: ray::ObjectManager::ProcessClientMessage(std::shared_ptr> &, long, const unsigned char *)	2018-12-28 14:38:59 -08:00
Zhijun Fu	3df1e1c471	Add missing lock in FreeObjects of object buffer pool (#3647 ) The object manager uses multiple threads to transfer objects between nodes, so the plasma client used in object_buffer_pool_ needs to be protected by a lock. We have seen crashes caused by the missing lock in the FreeObjects() interface; this PR fixes that issue.	2018-12-28 11:47:31 -08:00
Wang Qing	c59b506c6e	[Java] Support calling Ray APIs from multiple threads (#3646 )	2018-12-28 17:44:31 +08:00
Hao Chen	0b682d043e	Fix memory leak in PyRayletClient (#3640 ) 1) When using `PyObject_GetIter`, the caller must call `Py_DECREF` to avoid a memory leak, but with `PyList_GetItem`, `Py_DECREF` isn't needed. 2) The `Py_BuildValue` call in `wait` doesn't need to increment the ref count.	2018-12-27 17:39:02 -08:00
Hao Chen	62af2f25be	Fix test_multiple_actor_reconstruction failure (#3641 ) * Fix test_multiple_actor_reconstruction failure * add comment	2018-12-27 13:57:52 -08:00
Richard Liaw	ac792d70c8	[rllib] Add starcraft multiagent env as example (#3542 )	2018-12-27 10:00:32 +08:00
Tianming Xu	b4f61dfd50	[rllib] Export policy model checkpoint (#3637 ) * Export policy model checkpoint * update comment	2018-12-27 08:43:06 +09:00
Richard Liaw	6e2d7a9ba1	[tune] Support Configuration Merging (#3584 ) * merge configs * deep merge * lint * add resolve * test	2018-12-26 20:07:11 +09:00
Stan Wang	4ce3818be5	Average aggregated gradients before put in plasma store (#3631 )	2018-12-26 20:03:11 +09:00
Wang Qing	4cde971916	[Java] Print the log message slowly. (#3633 )	2018-12-26 16:33:21 +08:00
Yuhong Guo	1b98fb8238	Fix Jenkins test failures and function descriptor bug. (#3569 ) 1. Fix the Jenkins test failure by adding the driver ID to the Actor GCS key. 2. Move `object_manager_test.py` from Jenkins to Travis.	2018-12-25 23:31:44 -08:00
Wang Qing	a971b73bbe	[Java] Fix the issue when waiting an empty list or a null pointer (#3632 )	2018-12-26 11:29:29 +08:00
Hao Chen	f4011754d6	Fix: ServerConnection should be closed before being removed (#3626 ) Otherwise, in the event of a remote raylet crashing, the connection might be held by boost asio forever, and the pending callbacks will never get invoked. See also #3586.	2018-12-25 11:01:53 -08:00
Robert Nishihara	5426234cd8	Update documentation to reflect 0.6.1 release. (#3622 )	2018-12-24 11:10:04 -08:00
Robert Nishihara	1e8cdb5421	Update release documentation. (#3587 ) * Update release instructions. * Add note about wheels. * Fix * Update * update example * Update RELEASE_PROCESS.rst	2018-12-24 11:09:09 -08:00
nam-cern	3d8f56409b	Ensure numpy is at least 1.10.4 in setup.py (#2462 ) In the build script, numpy is specifically set at 1.10.4. We should also ensure that it is indeed the case in `setup.py`.	2018-12-24 11:01:25 -08:00
Eric Liang	9f63119a83	[rllib] Allow development without needing to compile Ray (#3623 ) * wip * lint * wip * wip * rename * wip * Cleaner handling of cli prompt	2018-12-24 18:08:23 +09:00
Devin Petersohn	c13b2685f5	[modin] Append to path to avoid namespace collision on development branches (#3621 )	2018-12-23 23:58:56 -08:00
Si-Yuan	a1995ff3b0	Resize logo in README. (#3619 )	2018-12-23 22:59:23 -08:00
Alexey Tumanov	9b8d7573fe	bump version from 0.6.0 to 0.6.1 (#3610 ) ray-0.6.1	2018-12-23 17:03:42 -08:00
Robert Nishihara	bb7ca3bae7	Upgrade flatbuffers version to 1.10.0. (#3559 ) * Upgrade flatbuffers version to 1.10.0. * Temporarily change ray.utils.decode for backwards compatibility.	2018-12-23 14:56:34 -08:00
Robert Nishihara	ddd4c842f1	Initialize some variables in constructor instead of header file. (#3617 ) * Initialize some variables in constructor instead of header file	2018-12-23 02:44:23 -08:00
Alexey Tumanov	bada42c334	object store notification mgr: fix using uninitialized variables (#3592 ) Initialize private class variables to avoid valgrind errors. They are used before initialization.	2018-12-22 19:51:22 -08:00
Philipp Moritz	e578a38116	Fix TensorFlow and PyTorch compatibility (#3574 ) * remove tensorflow workaround * update docker * add boost threads * add date_time, too * change link order * cosmetics	2018-12-22 13:25:48 -08:00
Tianming Xu	deb26b954e	[rllib] Export tensorflow model of policy graph (#3585 ) * Export tensorflow model of policy graph * Add tests,examples,pydocs and infer extra signatures from existing methods * Add example usage in export_policy_model comment * Fix lint error * Fix lint error * Fix lint error	2018-12-22 17:35:25 +09:00


6812 Commits