diff --git a/doc/source/serve/advanced.rst b/doc/source/serve/advanced.rst
index 8b107b8b3..64177b7a7 100644
--- a/doc/source/serve/advanced.rst
+++ b/doc/source/serve/advanced.rst
@@ -16,7 +16,7 @@ the properties of a particular backend.
 Scaling Out
 ===========
 
-To scale out a backend to multiple workers, simplify configure the number of replicas.
+To scale out a backend to multiple workers, simply configure the number of replicas.
 
 .. code-block:: python
 
@@ -32,7 +32,7 @@ This will scale up or down the number of workers that can accept requests.
 Using Resources (CPUs, GPUs)
 ============================
 
-To assign hardware resource per worker, you can pass resource requirements to
+To assign hardware resources per worker, you can pass resource requirements to
 ``ray_actor_options``. To learn about options to pass in, take a look at
 :ref:`Resources with Actor` guide.
 
@@ -173,7 +173,7 @@ Session Affinity
 ----------------
 
 Splitting traffic randomly among backends for each request is is general and simple, but it can be an issue when you want to ensure that a given user or client is served by the same backend repeatedly.
-To address this, Serve offers a "shard key" can be specified for each request that will deterministically map to a backend.
+To address this, a "shard key" can be specified for each request that will deterministically map to a backend.
 In practice, this should be something that uniquely identifies the entity that you want to consistently map, like a client ID or session ID.
 The shard key can either be specified via the X-SERVE-SHARD-KEY HTTP header or :mod:`handle.options(shard_key="key") `.
 
diff --git a/doc/source/serve/architecture.rst b/doc/source/serve/architecture.rst
index ac078fe63..ac98d5c68 100644
--- a/doc/source/serve/architecture.rst
+++ b/doc/source/serve/architecture.rst
@@ -52,7 +52,7 @@ FAQ
 How does Serve handle fault tolerance?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Application errors like exceptions in your model evaluation code is catched and
+Application errors like exceptions in your model evaluation code are caught and
 wrapped. A 500 status code will be returned with the traceback information. The
 worker replica will be able to continue to handle requests.
 
diff --git a/doc/source/serve/deployment.rst b/doc/source/serve/deployment.rst
index 8c0cf672e..da8068213 100644
--- a/doc/source/serve/deployment.rst
+++ b/doc/source/serve/deployment.rst
@@ -110,7 +110,6 @@ these two models. While this is a simple operation, you may want to see
 :ref:`serve-split-traffic` for more information. One thing you may want to
 consider as well is :ref:`session-affinity` which gives you the ability to
 ensure that queries from users/clients always get mapped to the same backend.
-versions.
 
 Now that we're up and running serving two models in production, let's query
 our results several times to see some results. You'll notice that we're now splitting
diff --git a/doc/source/serve/tutorials/batch.rst b/doc/source/serve/tutorials/batch.rst
index d8fc13a3d..1183a22c3 100644
--- a/doc/source/serve/tutorials/batch.rst
+++ b/doc/source/serve/tutorials/batch.rst
@@ -4,7 +4,7 @@ Batching Tutorial
 =================
 
 In this guide, we will deploy a simple vectorized adder that takes
-a batch of queries and add them at once. In particular, we show:
+a batch of queries and adds them at once. In particular, we show:
 
 - How to implement and deploy Ray Serve model that accepts batches.
 - How to configure the batch size.
@@ -60,7 +60,7 @@ the input value, convert them into an array, and use NumPy to add 1 to each elem
 
 Let's deploy it. Note that in the ``config`` section of ``create_backend``, we
 are specifying the maximum batch size via ``config={"max_batch_size": 4}``. This
-configuration option limits the maximum possible batch size send to the backend.
+configuration option limits the maximum possible batch size sent to the backend.
 
 .. note::
     Ray Serve performs *opportunistic batching*. When a worker is free to evaluate
diff --git a/doc/source/serve/tutorials/pytorch.rst b/doc/source/serve/tutorials/pytorch.rst
index 214637557..62c7b2f2a 100644
--- a/doc/source/serve/tutorials/pytorch.rst
+++ b/doc/source/serve/tutorials/pytorch.rst
@@ -12,7 +12,7 @@ In particular, we show:
 Please see the :doc:`../key-concepts` to learn more general information about Ray Serve.
 
 This tutorial requires Pytorch and Torchvision installed in your system. Ray Serve
-is framework agnostic and work with any version of PyTorch.
+is framework agnostic and works with any version of PyTorch.
 
 .. code-block:: bash
 
diff --git a/doc/source/serve/tutorials/tensorflow.rst b/doc/source/serve/tutorials/tensorflow.rst
index 73bc577dc..4ce9a2278 100644
--- a/doc/source/serve/tutorials/tensorflow.rst
+++ b/doc/source/serve/tutorials/tensorflow.rst
@@ -11,7 +11,7 @@ In particular, we show:
 
 Please see the :doc:`../key-concepts` to learn more general information about Ray Serve.
 
-Ray Serve is framework agnostic you can use any version of Tensorflow.
+Ray Serve is framework agnostic -- you can use any version of Tensorflow.
 However, for this tutorial, we use Tensorflow 2 and Keras. Please make sure
 you have Tensorflow 2 installed.
 
diff --git a/python/ray/serve/api.py b/python/ray/serve/api.py
index 6bf5140e3..f469e6b69 100644
--- a/python/ray/serve/api.py
+++ b/python/ray/serve/api.py
@@ -358,7 +358,7 @@ def start(detached: bool = False,
             this to "0.0.0.0". One HTTP server will be started on each
             node in the Ray cluster.
         http_port (int): Port for HTTP server. Defaults to 8000.
-        http_middleswares (list): A list of Starlette middlewares that will be
+        http_middlewares (list): A list of Starlette middlewares that will be
             applied to the HTTP servers in the cluster.
     """
     # Initialize ray if needed.
diff --git a/python/ray/serve/examples/doc/quickstart_class.py b/python/ray/serve/examples/doc/quickstart_class.py
index 890aed094..7a8c3d7ff 100644
--- a/python/ray/serve/examples/doc/quickstart_class.py
+++ b/python/ray/serve/examples/doc/quickstart_class.py
@@ -18,4 +18,4 @@ client.create_backend("counter", Counter)
 client.create_endpoint("counter", backend="counter", route="/counter")
 
 requests.get("http://127.0.0.1:8000/counter").json()
-# > {"current_counter": self.count}
+# > {"current_counter": 0}