From b4ab1a5375066390d45e1ddd16d5b1aeaf9866b5 Mon Sep 17 00:00:00 2001 From: Victor Grau Serrat Date: Tue, 21 Nov 2017 22:53:50 -0700 Subject: [PATCH] DOC: remake of beginner tutorial --- docs/source/beginner-tutorial.rst | 250 ++++++++++++++++++++---------- docs/source/install.rst | 14 +- 2 files changed, 173 insertions(+), 91 deletions(-) diff --git a/docs/source/beginner-tutorial.rst b/docs/source/beginner-tutorial.rst index 6848a806..f79c2c7e 100644 --- a/docs/source/beginner-tutorial.rst +++ b/docs/source/beginner-tutorial.rst @@ -5,9 +5,8 @@ Basics ~~~~~~ Catalyst is an open-source algorithmic trading simulator for crypto -assets written in Python. - -The source can be found at: https://github.com/enigmampc/catalyst +assets written in Python. The source code can be found at: +https://github.com/enigmampc/catalyst Some benefits include: @@ -25,8 +24,7 @@ Some benefits include: build profitable, data-driven investment strategies. This tutorial assumes that you have Catalyst correctly installed, see the -:doc:`installation instructions ` if you haven't set up -Catalyst yet. +:doc:`Install` section if you haven't set up Catalyst yet. Every ``catalyst`` algorithm consists of at least two functions you have to define: @@ -40,10 +38,12 @@ Before the start of the algorithm, ``catalyst`` calls the need to access from one algorithm iteration to the next. After the algorithm has been initialized, ``catalyst`` calls the -``handle_data()`` function once for each event. At every call, it passes -the same ``context`` variable and an event-frame called ``data`` -containing the current trading bar with open, high, low, and close -(OHLC) prices as well as volume for each crypto asset in your universe. +``handle_data()`` function on each iteration, that is, once per day (daily) or +once every minute (minute), depending on the frequency we choose to run our +simulation. 
On every iteration, ``handle_data()`` receives the same ``context`` +variable and an event-frame called ``data`` containing the current trading bar +with open, high, low, and close (OHLC) prices as well as volume for each +crypto asset in your universe. .. For more information on these functions, see the `relevant part of the .. Quantopian docs `. @@ -51,8 +51,8 @@ containing the current trading bar with open, high, low, and close My first algorithm ~~~~~~~~~~~~~~~~~~ -Lets take a look at a very simple algorithm from the ``examples`` -directory: `buy_btc_simple.py `_: +Let's take a look at a very simple algorithm from the ``examples`` directory: +`buy_btc_simple.py `_: .. code-block:: python @@ -70,9 +70,9 @@ directory: `buy_btc_simple.py `__. -Running the algorithm -~~~~~~~~~~~~~~~~~~~~~ - -To can now test this algorithm on crypto data, ``catalyst`` provides three -interfaces: - -- A command-line interface, -- ``IPython Notebook`` magic, -- and :func:`~catalyst.run_algorithm`. - Ingesting data -^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~ -In previous versions of Catalyst you needed to manually ingest data before running -your algorithm to make it available at runtime. Starting with version 0.3, the -algorithm will automagically ingest the data it needs the first time that encounters -a data request for data that it doesn't have. +Before you can backtest your algorithm, you first need to load the historical +pricing data that Catalyst needs to run your simulation through a process called +``ingestion``. When you ingest data, Catalyst downloads that data in compressed +form from the Enigma servers (which eventually will migrate to the Enigma Data +Marketplace), and stores it locally to make it available at runtime. -Still, we believe it is important for you to have a high-level understanding -of how data is managed: +In order to ingest data, you need to run a command like the following: + +.. 
code-block:: bash + + catalyst ingest-exchange -x bitfinex -i btc_usd + +This instructs Catalyst to download pricing data from the ``Bitfinex`` exchange +for the ``btc_usd`` currency pair. This follows from the simple algorithm +presented above, which trades ``btc_usd``, and from our choice to test it +against historical pricing data from the Bitfinex exchange. By +default, Catalyst assumes that you want data with ``daily`` frequency (one candle +bar per day). If you want instead ``minute`` frequency (one candle bar for every +minute), you would need to specify it as follows: + +.. code-block:: bash + + catalyst ingest-exchange -x bitfinex -i btc_usd -f minute + +.. parsed-literal:: + + Ingesting exchange bundle bitfinex... + [====================================] Ingesting daily price data on bitfinex: 100% + +We believe it is important for you to have a high-level understanding of how +data is managed, hence the following overview: - Pricing data is split and packaged into ``bundles``: chunks of data organized as time series that are kept up to date daily on Enigma's servers. Catalyst - downloads the bundles that needs at any given time, and reconstructs the whole - dataset in your hard drive. + downloads the requested bundles and reconstructs the full dataset on your + hard drive. -- Pricing data is provided in ``daily`` and ``minute`` resolution. Those are different - bundle datasets, and are managed separately. +- Pricing data is provided in ``daily`` and ``minute`` resolution. Those are + different bundle datasets, and are managed separately. -- Bundles are exchange-specific, as the pricing data is specific to the trades that - happen in each exchange. You can optionally specify which exchange you want pricing - data from. +- Bundles are exchange-specific, as the pricing data is specific to the trades + that happen in each exchange. 
As a result, you must specify which + exchange you want pricing data from when ingesting data. -- Catalyst keeps track of all the downloaded bundles, so that it only has to download - them once, and will do incremental updates as needed. +- Catalyst keeps track of all the downloaded bundles, so that it only has to + download them once, and will do incremental updates as needed. -- When running in ``live trading`` mode, Catalyst will first look for historical - pricing data in the locally stored bundles. If there is anything missing, Catalyst will - hit the exchange for the most recent data, and merge it with the local bundle to make - it available for future iterations. +- When running in ``live trading`` mode, Catalyst will first look for + historical pricing data in the locally stored bundles. If there is anything + missing, Catalyst will hit the exchange for the most recent data, and merge + it with the local bundle to optimize the number of requests it needs to make + to the exchange. -If you want to learn more, check out the :ref:`ingesting data ` section -for more detail. +The ``ingest-exchange`` command in catalyst offers additional parameters to +further tweak the data ingestion process. You can learn more by running the +following from the command line: + +.. code-block:: bash + + catalyst ingest-exchange --help + +Running the algorithm +~~~~~~~~~~~~~~~~~~~~~ + +You can now test your algorithm using cryptoassets' historical pricing data. +``catalyst`` provides three interfaces: + +- A command-line interface (CLI), +- the ``IPython Notebook`` magic, +- and a :func:`~catalyst.run_algorithm` function that you can call from other + Python scripts. + +We'll start with the CLI, and introduce the ``IPython Notebook`` below. Some of +the :doc:`example algorithms ` provide instructions on how to run +them both from the CLI, and using the :func:`~catalyst.run_algorithm` function. 
Command line interface ^^^^^^^^^^^^^^^^^^^^^^ -After you installed Catalyst you should be able to execute the following -from your command line (e.g. ``cmd.exe`` on Windows, or the Terminal app -on OSX). Displaying here a simplified output for eductional purposes: +After you installed Catalyst, you should be able to execute the following +from your command line (e.g. ``cmd.exe`` or the ``Anaconda Prompt`` on Windows, +or the Terminal application on MacOS). .. code-block:: bash $ catalyst --help +This is the resulting output, simplified for educational purposes: + .. parsed-literal:: Usage: catalyst [OPTIONS] COMMAND [ARGS]... @@ -158,10 +195,11 @@ on OSX). Displaying here a simplified output for eductional purposes: live Trade live with the given algorithm. run Run a backtest for the given algorithm. -There are three main modes you can run on Catalyst. The first being ``ingest-exchange`` -for data ingestion, which we have summarized in the previous section. The second -is ``live`` to use your algorithm to trade live against a given exchange, and the -third mode ``run`` is to backtest your algorithm before trading live with it. +There are three main modes you can run on Catalyst. The first is +``ingest-exchange``, for data ingestion, which we have covered in the previous +section. The second is ``live`` to use your algorithm to trade live against a +given exchange, and the third mode ``run`` is to backtest your algorithm before +trading live with it. Let's start with backtesting, so run this other command to learn more about the available options: @@ -210,22 +248,24 @@ the available options: As you can see there are a couple of flags that specify where to find your -algorithm (``-f``) as well as a parameter to specify which exchange to use. -There are also arguments for the date range to run the algorithm over -(``--start`` and ``--end``). Finally, you'll want to save the performance -metrics of your algorithm so that you can analyze how it performed. 
This is -done via the ``--output`` flag and will cause it to write the performance -``DataFrame`` in the pickle Python file format. Note that you can also define -a configuration file with these parameters that you can then conveniently pass -to the ``-c`` option so that you don't have to supply the command line args -all the time (see the .conf files in the examples directory). +algorithm (``-f``) as well as the ``-x`` flag to specify which exchange to +use. There are also arguments for the date range to run the algorithm over +(``--start`` and ``--end``). You also need to set the base currency for your +algorithm through the ``-c`` flag, and the ``--capital-base``. All the +aforementioned parameters are required. Optionally, you will want to save the +performance metrics of your algorithm so that you can analyze how it performed. +This is done via the ``--output`` flag and will cause it to write the +performance ``DataFrame`` in the pickle Python file format. Note that you can +also define a configuration file with these parameters that you can then +conveniently pass to the ``-c`` option so that you don't have to supply the +command line args all the time. Thus, to execute our algorithm from above and save the results to ``buy_btc_simple_out.pickle`` we would call ``catalyst run`` as follows: .. code-block:: python - catalyst run -f buy_btc_simple.py -x bitfinex --start 2016-1-1 --end 2017-9-30 -o buy_btc_simple_out.pickle + catalyst run -f buy_btc_simple.py -x bitfinex --start 2016-1-1 --end 2017-9-30 -c usd --capital-base 100000 -o buy_btc_simple_out.pickle .. parsed-literal:: @@ -253,17 +293,25 @@ slippage model that ``catalyst`` uses). .. see the `Quantopian docs `__ .. for more information). -Let's take a quick look at the performance ``DataFrame``. For this, we -use ``pandas`` from inside the IPython Notebook and print the first ten -rows. 
Note that ``catalyst`` makes heavy usage of -`pandas `_, especially for data input and -outputting so it's worth spending some time to learn it. + +Let's take a quick look at the performance ``DataFrame``. For this, we write a +different Python script--let's call it ``print_results.py``--and we make use of +the fantastic ``pandas`` library to print the first ten rows. Note that +``catalyst`` makes heavy usage of `pandas `_, +especially for data analysis and outputting so it's worth spending some time to +learn it. .. code-block:: python import pandas as pd perf = pd.read_pickle('buy_btc_simple_out.pickle') # read in perf DataFrame - perf.head() + print(perf.head()) + +Which we execute by running: + +.. code-block:: bash + + $ python print_results.py .. raw:: html @@ -429,30 +477,48 @@ and allows us to plot the price of bitcoin. For example, we could easily examine now how our portfolio value changed over time compared to the bitcoin price. -.. code-block:: python - - %load_ext catalyst +Now we will run the simulation again, but this time we extend our original +algorithm with the addition of the ``analyze()`` function. Analogously +to how ``initialize()`` gets called once before the start of the algorithm, +``analyze()`` gets called once at the end of the algorithm, and receives two +variables: ``context``, which we discussed at the very beginning, and ``perf``, +which is the pandas dataframe containing the performance data for our algorithm +that we reviewed above. Inside the ``analyze()`` function is where we can +analyze and visualize the results of our strategy. Here's the revised simple +algorithm (note the addition of Line 1, and Lines 11-18): .. 
code-block:: python - %pylab inline - figsize(12, 12) import matplotlib.pyplot as plt + from catalyst.api import order, record, symbol - ax1 = plt.subplot(211) - perf.portfolio_value.plot(ax=ax1) - ax1.set_ylabel('portfolio value') - ax2 = plt.subplot(212, sharex=ax1) - perf.btc.plot(ax=ax2) - ax2.set_ylabel('bitcoin price') + def initialize(context): + context.asset = symbol('btc_usd') -.. parsed-literal:: + def handle_data(context, data): + order(context.asset, 1) + record(btc = data.current(context.asset, 'price')) - Populating the interactive namespace from numpy and matplotlib + def analyze(context, perf): + ax1 = plt.subplot(211) + perf.portfolio_value.plot(ax=ax1) + ax1.set_ylabel('portfolio value') + ax2 = plt.subplot(212, sharex=ax1) + perf.btc.plot(ax=ax2) + ax2.set_ylabel('bitcoin price') + plt.show() -.. parsed-literal:: +Here we make use of the external visualization library called +`matplotlib `_, which you might recall we installed +alongside enigma-catalyst (with the exception of the ``Conda`` install, where it +was included by default inside the conda environment we created). If for any +reason you don't have it installed, you can add it by running: - +.. code-block:: bash + + (catalyst)$ pip install matplotlib + +If everything works well, you'll see the following chart: .. image:: https://s3.amazonaws.com/enigmaco-docs/github.io/buy_btc_simple_graph.png @@ -460,6 +526,22 @@ Our algorithm performance as assessed by the ``portfolio_value`` closely matches that of the bitcoin price. This is not surprising as our algorithm only bought bitcoin every chance it got. + If you get an error when invoking matplotlib to visualize the performance + results, refer to `MacOS + Matplotlib `_. + Alternatively, some users have reported the following error when running an algo + in a Linux environment: + + .. 
parsed-literal:: + + ImportError: No module named _tkinter, please install the python-tk package + + Which can easily be solved by running (in Ubuntu/Debian-based systems): + + .. code-block:: bash + + sudo apt install python-tk + + Access to previous prices using ``history`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/install.rst b/docs/source/install.rst index 53ccbf29..e95eea65 100644 --- a/docs/source/install.rst +++ b/docs/source/install.rst @@ -80,7 +80,7 @@ Once either Conda or MiniConda has been set up you can install Catalyst: 4. Activate the environment (which you need to do every time you start a new session to run Catalyst): - **Linux or OSX:** + **Linux or MacOS:** .. code-block:: bash @@ -125,7 +125,7 @@ with the following steps: 3. Activate the environment: - **Linux or OSX:** + **Linux or MacOS:** .. code-block:: bash @@ -358,11 +358,11 @@ beginning of this page. MacOS Requirements ------------------ -The version of Python shipped with OSX by default is generally out of date, +The version of Python shipped with MacOS by default is generally out of date, and has a number of quirks because it's used directly by the operating system. For these reasons, many developers choose to install and use a separate Python installation. The `Hitchhiker's Guide to Python`_ provides an excellent guide -to `Installing Python on OSX `_, +to `Installing Python on MacOS `_, which explains how to install Python with the `Homebrew`_ manager. Assuming you've installed Python with Homebrew, you'll also likely need the @@ -372,17 +372,17 @@ following brew packages: $ brew install freetype pkg-config gcc openssl -OSX + virtualenv + matplotlib +MacOS + virtualenv + matplotlib ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A note about using matplotlib in virtual enviroments on OSX: it may be +A note about using matplotlib in virtual environments on MacOS: it may be necessary to run .. 
code-block:: bash echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc -in order to override the default ``macosx`` backend for your system, which +in order to override the default ``macosx`` backend for your system, which may not be accessible from inside the virtual environment. This will allow Catalyst to open matplotlib charts from within a virtual environment, which is useful for displaying the performance of your backtests. To learn more