Merge remote-tracking branch 'origin/develop' into develop

This commit is contained in:
fredfortier
2017-11-22 16:03:29 -05:00
2 changed files with 173 additions and 91 deletions
+166 -84
@@ -5,9 +5,8 @@ Basics
~~~~~~
Catalyst is an open-source algorithmic trading simulator for crypto
assets written in Python.
The source can be found at: https://github.com/enigmampc/catalyst
assets written in Python. The source code can be found at:
https://github.com/enigmampc/catalyst
Some benefits include:
@@ -25,8 +24,7 @@ Some benefits include:
build profitable, data-driven investment strategies.
This tutorial assumes that you have Catalyst correctly installed, see the
:doc:`installation instructions <install>` if you haven't set up
Catalyst yet.
:doc:`Install<install>` section if you haven't set up Catalyst yet.
Every ``catalyst`` algorithm consists of at least two functions you have to
define:
@@ -40,10 +38,12 @@ Before the start of the algorithm, ``catalyst`` calls the
need to access from one algorithm iteration to the next.
After the algorithm has been initialized, ``catalyst`` calls the
``handle_data()`` function once for each event. At every call, it passes
the same ``context`` variable and an event-frame called ``data``
containing the current trading bar with open, high, low, and close
(OHLC) prices as well as volume for each crypto asset in your universe.
``handle_data()`` function on each iteration, that is, once per day (daily) or
once per minute (minute), depending on the frequency we choose for our
simulation. On every iteration, ``handle_data()`` receives the same ``context``
variable and an event-frame called ``data`` containing the current trading bar
with open, high, low, and close (OHLC) prices as well as volume for each
crypto asset in your universe.
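The calling sequence described above can be illustrated with a toy driver loop. This is only a sketch of the order of calls, not Catalyst's actual implementation: ``Context``, ``run_toy_backtest`` and the hard-coded bars are hypothetical names and made-up data, and ``data`` is simplified to a plain dict (in Catalyst it is an object queried through methods such as ``data.current()``).

```python
class Context(object):
    """Hypothetical stand-in for ``context``: a plain attribute bag."""
    pass


def initialize(context):
    # Called once, before the first bar: set up state kept across iterations.
    context.bars_seen = 0
    context.last_close = None


def handle_data(context, data):
    # Called once per bar (one bar per day or per minute, depending on
    # the chosen frequency). The same context object is passed every time.
    context.bars_seen += 1
    context.last_close = data['close']


def run_toy_backtest(bars):
    """Drive the two functions in the order the text describes (toy only)."""
    context = Context()
    initialize(context)            # exactly one call
    for bar in bars:               # one call per trading bar
        handle_data(context, bar)
    return context


# Three made-up daily OHLCV bars (not real market data).
bars = [
    {'open': 100.0, 'high': 110.0, 'low': 95.0, 'close': 105.0, 'volume': 12.0},
    {'open': 105.0, 'high': 108.0, 'low': 101.0, 'close': 102.0, 'volume': 9.5},
    {'open': 102.0, 'high': 120.0, 'low': 100.0, 'close': 118.0, 'volume': 20.1},
]

context = run_toy_backtest(bars)
print(context.bars_seen, context.last_close)
```

With ``minute`` frequency the loop would simply receive one bar per minute instead of one per day; the two functions themselves would not change.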
.. For more information on these functions, see the `relevant part of the
.. Quantopian docs <https://www.quantopian.com/help#api-toplevel>`.
@@ -51,8 +51,8 @@ containing the current trading bar with open, high, low, and close
My first algorithm
~~~~~~~~~~~~~~~~~~
Let's take a look at a very simple algorithm from the ``examples``
directory: `buy_btc_simple.py <https://github.com/enigmampc/catalyst/blob/master/catalyst/examples/buy_btc_simple.py>`_:
Let's take a look at a very simple algorithm from the ``examples`` directory:
`buy_btc_simple.py <https://github.com/enigmampc/catalyst/blob/master/catalyst/examples/buy_btc_simple.py>`_:
.. code-block:: python
@@ -70,9 +70,9 @@ directory: `buy_btc_simple.py <https://github.com/enigmampc/catalyst/blob/master
As you can see, we first have to import some functions we would like to
use. All functions commonly used in your algorithm can be found in
``catalyst.api``. Here we are using :func:`~catalyst.api.order()` which takes two
arguments: a cryptoasset object, and a number specifying how many assets you would
like to order (if negative, :func:`~catalyst.api.order()` will sell/short
``catalyst.api``. Here we are using :func:`~catalyst.api.order()` which takes
two arguments: a cryptoasset object, and a number specifying how many assets you
would like to order (if negative, :func:`~catalyst.api.order()` will sell/short
assets). In this case we want to order 1 bitcoin at each iteration.
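The sign convention for the order amount can be sketched with a toy position tracker. ``apply_order`` is a hypothetical helper written for illustration, not part of ``catalyst.api``, and it assumes every order is filled in full, ignoring slippage and commissions.

```python
def apply_order(position, amount):
    """Return the position after ordering ``amount`` units.

    Positive amounts buy; negative amounts sell (or short).
    Toy model: assumes the order is always filled in full.
    """
    return position + amount


position = 0
for _ in range(5):                     # five iterations, buying 1 unit each
    position = apply_order(position, 1)
after_buys = position

position = apply_order(position, -2)   # a negative amount sells 2 units
after_sell = position

print(after_buys, after_sell)
```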
.. For more documentation on ``order()``, see the `Quantopian docs
@@ -88,61 +88,98 @@ a bitcoin in the ``data`` event frame.
.. (for more information see `here <https://www.quantopian.com/help#api-event-properties>`__.
Running the algorithm
~~~~~~~~~~~~~~~~~~~~~
You can now test this algorithm on crypto data. ``catalyst`` provides three
interfaces:
- A command-line interface,
- ``IPython Notebook`` magic,
- and :func:`~catalyst.run_algorithm`.
Ingesting data
^^^^^^^^^^^^^^
~~~~~~~~~~~~~~
In previous versions of Catalyst you needed to manually ingest data before running
your algorithm to make it available at runtime. Starting with version 0.3, the
algorithm will automagically ingest the data it needs the first time it encounters
a data request for data that it doesn't have.
Before you can backtest your algorithm, you first need to load the historical
pricing data that Catalyst needs to run your simulation through a process called
``ingestion``. When you ingest data, Catalyst downloads that data in compressed
form from the Enigma servers (which eventually will migrate to the Enigma Data
Marketplace), and stores it locally to make it available at runtime.
Still, we believe it is important for you to have a high-level understanding
of how data is managed:
In order to ingest data, you need to run a command like the following:
.. code-block:: bash
catalyst ingest-exchange -x bitfinex -i btc_usd
This instructs Catalyst to download pricing data from the ``Bitfinex`` exchange
for the ``btc_usd`` currency pair. This follows from the simple algorithm
presented above, where we trade ``btc_usd``, and from our choice to test it
against historical pricing data from the Bitfinex exchange. By
default, Catalyst assumes that you want data with ``daily`` frequency (one candle
bar per day). If you instead want ``minute`` frequency (one candle bar for every
minute), you need to specify it as follows:
.. code-block:: bash
catalyst ingest-exchange -x bitfinex -i btc_usd -f minute
.. parsed-literal::
Ingesting exchange bundle bitfinex...
[====================================] Ingesting daily price data on bitfinex: 100%
We believe it is important for you to have a high-level understanding of how
data is managed, hence the following overview:
- Pricing data is split and packaged into ``bundles``: chunks of data organized
as time series that are kept up to date daily on Enigma's servers. Catalyst
downloads the bundles that it needs at any given time, and reconstructs the whole
dataset on your hard drive.
downloads the requested bundles and reconstructs the full dataset on your
hard drive.
- Pricing data is provided in ``daily`` and ``minute`` resolution. Those are different
bundle datasets, and are managed separately.
- Pricing data is provided in ``daily`` and ``minute`` resolution. Those are
different bundle datasets, and are managed separately.
- Bundles are exchange-specific, as the pricing data is specific to the trades that
happen in each exchange. You can optionally specify which exchange you want pricing
data from.
- Bundles are exchange-specific, as the pricing data is specific to the trades
that happen in each exchange. As a result, you must specify which
exchange you want pricing data from when ingesting data.
- Catalyst keeps track of all the downloaded bundles, so that it only has to download
them once, and will do incremental updates as needed.
- Catalyst keeps track of all the downloaded bundles, so that it only has to
download them once, and will do incremental updates as needed.
- When running in ``live trading`` mode, Catalyst will first look for historical
pricing data in the locally stored bundles. If there is anything missing, Catalyst will
hit the exchange for the most recent data, and merge it with the local bundle to make
it available for future iterations.
- When running in ``live trading`` mode, Catalyst will first look for
historical pricing data in the locally stored bundles. If there is anything
missing, Catalyst will hit the exchange for the most recent data, and merge
it with the local bundle to optimize the number of requests it needs to make
to the exchange.
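The "download once, then update incrementally" behaviour described in the list above can be sketched as a small cache keyed by exchange, pair, frequency and period. This is a toy model under stated assumptions, not Catalyst's bundle code: ``ToyBundleCache`` and ``fake_download`` are hypothetical names, and splitting the data into monthly chunks is an assumption made purely for illustration.

```python
class ToyBundleCache(object):
    """Hypothetical local store that fetches each chunk at most once."""

    def __init__(self, download):
        self._download = download   # callable that fetches one chunk
        self._chunks = {}           # chunks already stored locally
        self.download_count = 0

    def get(self, exchange, pair, frequency, period):
        # Daily and minute data use different keys, so they are
        # managed as separate datasets.
        key = (exchange, pair, frequency, period)
        if key not in self._chunks:
            self._chunks[key] = self._download(*key)
            self.download_count += 1
        return self._chunks[key]


def fake_download(exchange, pair, frequency, period):
    # Stand-in for a network request; returns a label instead of prices.
    return '{}/{}/{}/{}'.format(exchange, pair, frequency, period)


cache = ToyBundleCache(fake_download)
cache.get('bitfinex', 'btc_usd', 'daily', '2017-10')   # downloaded
cache.get('bitfinex', 'btc_usd', 'daily', '2017-10')   # served locally
cache.get('bitfinex', 'btc_usd', 'daily', '2017-11')   # incremental update
cache.get('bitfinex', 'btc_usd', 'minute', '2017-11')  # separate dataset
print(cache.download_count)
```

Only three of the four requests reach ``fake_download``; the repeated request is answered from the local store.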
If you want to learn more, check out the :ref:`ingesting data <ingesting-data>` section
for more detail.
The ``ingest-exchange`` command in catalyst offers additional parameters to
further tweak the data ingestion process. You can learn more by running the
following from the command line:
.. code-block:: bash
catalyst ingest-exchange --help
Running the algorithm
~~~~~~~~~~~~~~~~~~~~~
To test your algorithm using cryptoassets' historical pricing data,
``catalyst`` provides three interfaces:
- A command-line interface (CLI),
- the ``IPython Notebook`` magic,
- and the :func:`~catalyst.run_algorithm` function that you can call from other
  Python scripts.
We'll start with the CLI, and introduce the ``IPython Notebook`` below. Some of
the :doc:`example algorithms <example-algos>` provide instructions on how to run
them both from the CLI, and using the :func:`~catalyst.run_algorithm` function.
Command line interface
^^^^^^^^^^^^^^^^^^^^^^
After you installed Catalyst you should be able to execute the following
from your command line (e.g. ``cmd.exe`` on Windows, or the Terminal app
on OSX). Displaying here a simplified output for educational purposes:
After you installed Catalyst, you should be able to execute the following
from your command line (e.g. ``cmd.exe`` or the ``Anaconda Prompt`` on Windows,
or the Terminal application on MacOS).
.. code-block:: bash
$ catalyst --help
This is the resulting output, simplified for educational purposes:
.. parsed-literal::
Usage: catalyst [OPTIONS] COMMAND [ARGS]...
@@ -158,10 +195,11 @@ on OSX). Displaying here a simplified output for eductional purposes:
live Trade live with the given algorithm.
run Run a backtest for the given algorithm.
There are three main modes you can run on Catalyst. The first being ``ingest-exchange``
for data ingestion, which we have summarized in the previous section. The second
is ``live`` to use your algorithm to trade live against a given exchange, and the
third mode ``run`` is to backtest your algorithm before trading live with it.
There are three main modes you can run on Catalyst. The first is
``ingest-exchange`` for data ingestion, which we covered in the previous
section. The second is ``live``, which uses your algorithm to trade live against a
given exchange, and the third, ``run``, backtests your algorithm before
trading live with it.
Let's start with backtesting, so run this other command to learn more about
the available options:
@@ -210,22 +248,24 @@ the available options:
As you can see there are a couple of flags that specify where to find your
algorithm (``-f``) as well as a parameter to specify which exchange to use.
There are also arguments for the date range to run the algorithm over
(``--start`` and ``--end``). Finally, you'll want to save the performance
metrics of your algorithm so that you can analyze how it performed. This is
done via the ``--output`` flag and will cause it to write the performance
``DataFrame`` in the pickle Python file format. Note that you can also define
a configuration file with these parameters that you can then conveniently pass
to the ``-c`` option so that you don't have to supply the command line args
all the time (see the .conf files in the examples directory).
algorithm (``-f``) as well as the ``-x`` flag to specify which exchange to
use. There are also arguments for the date range to run the algorithm over
(``--start`` and ``--end``). You also need to set the base currency for your
algorithm through the ``-c`` flag, and its starting capital through ``--capital-base``. All the
aforementioned parameters are required. Optionally, you will want to save the
performance metrics of your algorithm so that you can analyze how it performed.
This is done via the ``--output`` flag and will cause it to write the
performance ``DataFrame`` in the pickle Python file format. Note that you can
also define a configuration file with these parameters that you can then
conveniently pass to the ``-c`` option so that you don't have to supply the
command line args all the time.
Thus, to execute our algorithm from above and save the results to
``buy_btc_simple_out.pickle`` we would call ``catalyst run`` as follows:
.. code-block:: bash
catalyst run -f buy_btc_simple.py -x bitfinex --start 2016-1-1 --end 2017-9-30 -o buy_btc_simple_out.pickle
catalyst run -f buy_btc_simple.py -x bitfinex --start 2016-1-1 --end 2017-9-30 -c usd --capital-base 100000 -o buy_btc_simple_out.pickle
.. parsed-literal::
@@ -253,17 +293,25 @@ slippage model that ``catalyst`` uses).
.. see the `Quantopian docs <https://www.quantopian.com/help#ide-slippage>`__
.. for more information).
Let's take a quick look at the performance ``DataFrame``. For this, we
use ``pandas`` from inside the IPython Notebook and print the first ten
rows. Note that ``catalyst`` makes heavy usage of
`pandas <http://pandas.pydata.org/>`_, especially for data input and
outputting so it's worth spending some time to learn it.
Let's take a quick look at the performance ``DataFrame``. For this, we write
a different Python script--let's call it ``print_results.py``--and we make use of
the fantastic ``pandas`` library to print the first ten rows. Note that
``catalyst`` makes heavy usage of `pandas <http://pandas.pydata.org/>`_,
especially for data analysis and outputting so it's worth spending some time to
learn it.
.. code-block:: python
import pandas as pd
perf = pd.read_pickle('buy_btc_simple_out.pickle') # read in perf DataFrame
perf.head()
print(perf.head())
Which we execute by running:
.. code-block:: bash
$ python print_results.py
.. raw:: html
@@ -429,30 +477,48 @@ and allows us to plot the price of bitcoin. For example, we could easily
examine now how our portfolio value changed over time compared to the
bitcoin price.
.. code-block:: python
%load_ext catalyst
Now we will run the simulation again, but this time we extend our original
algorithm with the addition of the ``analyze()`` function. Just as
``initialize()`` gets called once before the start of the algorithm,
``analyze()`` gets called once at the end of the algorithm, and receives two
variables: ``context``, which we discussed at the very beginning, and ``perf``,
which is the pandas DataFrame containing the performance data for our algorithm
that we reviewed above. The ``analyze()`` function is where we can
analyze and visualize the results of our strategy. Here's the revised simple
algorithm (note the addition of Line 1, and Lines 11-18):
.. code-block:: python
%pylab inline
figsize(12, 12)
import matplotlib.pyplot as plt
from catalyst.api import order, record, symbol
ax1 = plt.subplot(211)
perf.portfolio_value.plot(ax=ax1)
ax1.set_ylabel('portfolio value')
ax2 = plt.subplot(212, sharex=ax1)
perf.btc.plot(ax=ax2)
ax2.set_ylabel('bitcoin price')
def initialize(context):
context.asset = symbol('btc_usd')
.. parsed-literal::
def handle_data(context, data):
order(context.asset, 1)
record(btc = data.current(context.asset, 'price'))
Populating the interactive namespace from numpy and matplotlib
def analyze(context, perf):
ax1 = plt.subplot(211)
perf.portfolio_value.plot(ax=ax1)
ax1.set_ylabel('portfolio value')
ax2 = plt.subplot(212, sharex=ax1)
perf.btc.plot(ax=ax2)
ax2.set_ylabel('bitcoin price')
plt.show()
.. parsed-literal::
Here we make use of the external visualization library called
`matplotlib <https://matplotlib.org/>`_, which you might recall we installed
alongside enigma-catalyst (with the exception of the ``Conda`` install, where it
was included by default inside the conda environment we created). If for any
reason you don't have it installed, you can add it by running:
<matplotlib.text.Text at 0x10eaeadd0>
.. code-block:: bash
(catalyst)$ pip install matplotlib
If everything works well, you'll see the following chart:
.. image:: https://s3.amazonaws.com/enigmaco-docs/github.io/buy_btc_simple_graph.png
@@ -460,6 +526,22 @@ Our algorithm performance as assessed by the ``portfolio_value`` closely
matches that of the bitcoin price. This is not surprising as our algorithm
only bought bitcoin every chance it got.
If you get an error when invoking matplotlib to visualize the performance
results, refer to `MacOS + Matplotlib <install.html#macos-virtualenv-matplotlib>`_.
Alternatively, some users have reported the following error when running an algo
in a Linux environment:
.. parsed-literal::
ImportError: No module named _tkinter, please install the python-tk package
This can easily be solved by running (on Ubuntu/Debian-based systems):
.. code-block:: bash
sudo apt install python-tk
Access to previous prices using ``history``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+7 -7
@@ -80,7 +80,7 @@ Once either Conda or MiniConda has been set up you can install Catalyst:
4. Activate the environment (which you need to do every time you start a new
session to run Catalyst):
**Linux or OSX:**
**Linux or MacOS:**
.. code-block:: bash
@@ -125,7 +125,7 @@ with the following steps:
3. Activate the environment:
**Linux or OSX:**
**Linux or MacOS:**
.. code-block:: bash
@@ -358,11 +358,11 @@ beginning of this page.
MacOS Requirements
------------------
The version of Python shipped with OSX by default is generally out of date,
The version of Python shipped with MacOS by default is generally out of date,
and has a number of quirks because it's used directly by the operating system.
For these reasons, many developers choose to install and use a separate Python
installation. The `Hitchhiker's Guide to Python`_ provides an excellent guide
to `Installing Python on OSX <http://docs.python-guide.org/en/latest/>`_,
to `Installing Python on MacOS <http://docs.python-guide.org/en/latest/>`_,
which explains how to install Python with the `Homebrew`_ manager.
Assuming you've installed Python with Homebrew, you'll also likely need the
@@ -372,17 +372,17 @@ following brew packages:
$ brew install freetype pkg-config gcc openssl
OSX + virtualenv + matplotlib
MacOS + virtualenv + matplotlib
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A note about using matplotlib in virtual environments on OSX: it may be
A note about using matplotlib in virtual environments on MacOS: it may be
necessary to run
.. code-block:: bash
echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc
in order to override the default ``macosx`` backend for your system, which
in order to override the default ``macosx`` backend for your system, which
may not be accessible from inside the virtual environment. This will allow
Catalyst to open matplotlib charts from within a virtual environment, which
is useful for displaying the performance of your backtests. To learn more