Catalyst Beginner Tutorial

Basics

Catalyst is an open-source algorithmic trading simulator for crypto assets written in Python.

The source can be found at: https://github.com/enigmampc/catalyst

Some benefits include:

  • Support for several of the top crypto-exchanges by trading volume.
  • Realistic: slippage, transaction costs, order delays.
  • Stream-based: processes each event individually, which avoids look-ahead bias.
  • Batteries included: Common transforms (moving average) as well as common risk calculations (Sharpe).
  • Developed and continuously updated by Enigma MPC, which is building the Enigma data marketplace protocol as well as Catalyst, the first application that will run on that protocol. Powered by Enigma’s financial data marketplace, Catalyst empowers users to share and curate data and build profitable, data-driven investment strategies.

This tutorial assumes that you have Catalyst correctly installed. See the installation instructions if you haven’t set up Catalyst yet.

Every catalyst algorithm consists of at least two functions you have to define:

  • initialize(context)
  • handle_data(context, data)

Before the start of the algorithm, catalyst calls the initialize() function and passes in a context variable. context is a persistent namespace for you to store variables you need to access from one algorithm iteration to the next.

After the algorithm has been initialized, catalyst calls the handle_data() function once for each event. At every call, it passes the same context variable and an event-frame called data containing the current trading bar with open, high, low, and close (OHLC) prices as well as volume for each crypto asset in your universe.
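
The calling sequence described above can be pictured with a toy driver. This is only a sketch in plain Python, not Catalyst's implementation: Context, run_toy_backtest, and the bar dictionaries are hypothetical stand-ins, and in Catalyst itself data is an object you query (for example with data.current()) rather than a dictionary.

```python
class Context(object):
    """Hypothetical stand-in for the persistent namespace Catalyst passes in."""


def initialize(context):
    # Called once, before the first bar.
    context.bars_seen = 0
    context.last_close = None


def handle_data(context, data):
    # Called once per bar; state stored on context survives between calls.
    context.bars_seen += 1
    context.last_close = data['close']


def run_toy_backtest(bars):
    """Toy driver (an assumption, not Catalyst code): init once, then one call per bar."""
    context = Context()
    initialize(context)
    for bar in bars:
        handle_data(context, bar)
    return context


# Made-up OHLCV bars for a single asset.
bars = [
    {'open': 430.0, 'high': 436.0, 'low': 427.0, 'close': 434.0, 'volume': 1200.0},
    {'open': 434.0, 'high': 437.5, 'low': 431.0, 'close': 432.7, 'volume': 950.0},
    {'open': 432.7, 'high': 433.0, 'low': 424.0, 'close': 428.4, 'volume': 1430.0},
]
context = run_toy_backtest(bars)
print(context.bars_seen, context.last_close)
```

The same context object is handed to every call, which is why the counter set in initialize() keeps growing across bars.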

My first algorithm

Let’s take a look at a very simple algorithm from the examples directory, buy_btc_simple.py:

from catalyst.api import order, record, symbol


def initialize(context):
    context.asset = symbol('btc_usd')


def handle_data(context, data):
    order(context.asset, 1)
    record(btc = data.current(context.asset, 'price'))

As you can see, we first have to import some functions we would like to use. All functions commonly used in your algorithm can be found in catalyst.api. Here we are using order(), which takes two arguments: a cryptoasset object, and a number specifying how many units of that asset you would like to order (if negative, order() will sell/short the asset). In this case we want to order 1 bitcoin at each iteration.

Finally, the record() function allows you to save the value of a variable at each iteration. You provide it with a name for the variable together with the variable itself: varname=var. After the algorithm has finished running, you will have access to each variable value you tracked with record() under the name you provided (we will see this further below). You can also see how we access the current price of bitcoin in the data event frame.

Running the algorithm

To test this algorithm on crypto data, catalyst provides three interfaces:

  • A command-line interface,
  • IPython Notebook magic,
  • and run_algorithm().
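
The third interface, run_algorithm(), lets you start a backtest from a regular Python script. The sketch below shows roughly how the buy_btc_simple algorithm might be launched that way. The keyword arguments follow the pattern used by the scripts in Catalyst's examples directory, but treat them as assumptions and check them against your installed version; backtest_settings and run_backtest are hypothetical helper names, and the capital base is an arbitrary value.

```python
def backtest_settings():
    """Collect the backtest parameters in one place (hypothetical helper)."""
    return {
        'capital_base': 10000,          # arbitrary starting capital
        'data_frequency': 'daily',
        'exchange_name': 'bitfinex',
        'algo_namespace': 'buy_btc_simple',
        'base_currency': 'usd',
        'start': '2016-01-01',
        'end': '2017-09-30',
    }


def run_backtest(initialize, handle_data):
    """Run the backtest; requires Catalyst and pandas to be installed."""
    # Imported lazily so the settings above can be inspected without Catalyst.
    import pandas as pd
    from catalyst import run_algorithm

    settings = backtest_settings()
    # Dates are passed as timezone-aware (UTC) timestamps.
    settings['start'] = pd.to_datetime(settings['start'], utc=True)
    settings['end'] = pd.to_datetime(settings['end'], utc=True)
    return run_algorithm(initialize=initialize, handle_data=handle_data, **settings)
```

Calling run_backtest(initialize, handle_data) with the two functions from buy_btc_simple.py would be expected to return the performance DataFrame for the run.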

Ingesting data

In previous versions of Catalyst you needed to manually ingest data before running your algorithm to make it available at runtime. Starting with version 0.3, the algorithm will automagically ingest the data it needs the first time it encounters a request for data that it doesn’t have.

Still, we believe it is important for you to have a high-level understanding of how data is managed:

  • Pricing data is split and packaged into bundles: chunks of data organized as time series that are kept up to date daily on Enigma’s servers. Catalyst downloads the bundles that it needs at any given time, and reconstructs the whole dataset on your hard drive.
  • Pricing data is provided in daily and minute resolution. Those are different bundle datasets, and are managed separately.
  • Bundles are exchange-specific, as the pricing data is specific to the trades that happen in each exchange. You can optionally specify which exchange you want pricing data from.
  • Catalyst keeps track of all the downloaded bundles, so that it only has to download them once, and will do incremental updates as needed.
  • When running in live trading mode, Catalyst will first look for historical pricing data in the locally stored bundles. If there is anything missing, Catalyst will hit the exchange for the most recent data, and merge it with the local bundle to make it available for future iterations.

If you want to learn more, check out the ingesting data section.

Command line interface

After you have installed Catalyst, you should be able to execute the following from your command line (e.g. cmd.exe on Windows, or the Terminal app on OSX). The output shown here is simplified for educational purposes:

$ catalyst --help
Usage: catalyst [OPTIONS] COMMAND [ARGS]...

  Top level catalyst entry point.

Options:
  --version               Show the version and exit.
  --help                  Show this message and exit.

Commands:
  ingest-exchange  Ingest data for the given exchange.
  live             Trade live with the given algorithm.
  run              Run a backtest for the given algorithm.

Catalyst has three main commands. The first, ingest-exchange, handles data ingestion, which we summarized in the previous section. The second, live, uses your algorithm to trade live against a given exchange. The third, run, backtests your algorithm before you trade live with it.

Let’s start with backtesting. Run the following command to learn more about the available options:

$ catalyst run --help
Usage: catalyst run [OPTIONS]

  Run a backtest for the given algorithm.

Options:
  -f, --algofile FILENAME         The file that contains the algorithm to run.
  -t, --algotext TEXT             The algorithm script to run.
  -D, --define TEXT               Define a name to be bound in the namespace
                                  before executing the algotext. For example
                                  '-Dname=value'. The value may be any python
                                  expression. These are evaluated in order so
                                  they may refer to previously defined names.
  --data-frequency [daily|minute]
                                  The data frequency of the simulation.
                                  [default: daily]
  --capital-base FLOAT            The starting capital for the simulation.
                                  [default: 10000000.0]
  -b, --bundle BUNDLE-NAME        The data bundle to use for the simulation.
                                  [default: poloniex]
  --bundle-timestamp TIMESTAMP    The date to lookup data on or before.
                                  [default: <current-time>]
  -s, --start DATE                The start date of the simulation.
  -e, --end DATE                  The end date of the simulation.
  -o, --output FILENAME           The location to write the perf data. If this
                                  is '-' the perf will be written to stdout.
                                  [default: -]
  --print-algo / --no-print-algo  Print the algorithm to stdout.
  -x, --exchange-name [poloniex|bitfinex|bittrex]
                                  The name of the targeted exchange
                                  (supported: bitfinex, bittrex, poloniex).
  -n, --algo-namespace TEXT       A label assigned to the algorithm for data
                                  storage purposes.
  -c, --base-currency TEXT        The base currency used to calculate
                                  statistics (e.g. usd, btc, eth).
  --help                          Show this message and exit.

As you can see, there are flags that specify where to find your algorithm (-f) and which exchange to use (-x). There are also arguments for the date range to run the algorithm over (--start and --end). Finally, you’ll want to save the performance metrics of your algorithm so that you can analyze how it performed. This is done via the --output flag, which writes the performance DataFrame in the pickle Python file format. Note that the -c option sets the base currency used to calculate statistics (e.g. usd, btc, eth).

Thus, to execute our algorithm from above and save the results to buy_btc_simple_out.pickle we would call catalyst run as follows:

$ catalyst run -f buy_btc_simple.py -x bitfinex --start 2016-1-1 --end 2017-9-30 -o buy_btc_simple_out.pickle
INFO: run_algo: running algo in backtest mode
INFO: exchange_algorithm: initialized trading algorithm in backtest mode
INFO: Performance: Simulated 639 trading days out of 639.
INFO: Performance: first open: 2016-01-01 00:00:00+00:00
INFO: Performance: last close: 2017-09-30 23:59:00+00:00

run first calls the initialize() function, and then streams the historical asset prices day by day through handle_data(). In each call to handle_data(), we instruct catalyst to order 1 bitcoin. After the call to order(), catalyst enters the ordered asset and amount in the order book. After handle_data() has finished, catalyst looks for any open orders and tries to fill them. If the trading volume is high enough for this asset, the order is executed after adding the commission and applying the slippage model, which models the influence of your order on the asset price, so your algorithm will be charged more than just the asset price. (Note that you can also change the commission and slippage models that catalyst uses.)
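
To make the cost of a fill concrete, here is a small worked sketch. It is not Catalyst's slippage or commission code: cash_change is a hypothetical helper, and the 2% slippage and 0.2% commission rates are assumptions, chosen because they reproduce the capital_used values in the performance output shown later in this section.

```python
def cash_change(price, amount, slippage_rate, commission_rate):
    """Return the change in cash caused by filling an order (hypothetical model).

    amount > 0 is a buy, amount < 0 is a sell.
    """
    direction = 1 if amount > 0 else -1
    # Slippage moves the fill price against you: higher for buys, lower for sells.
    fill_price = price * (1 + direction * slippage_rate)
    traded_value = fill_price * amount
    # The commission is charged on the absolute traded value.
    commission = abs(traded_value) * commission_rate
    return -traded_value - commission


# Buying 1 bitcoin at a bar price of 432.70 with the assumed rates.
change = cash_change(432.70, 1, slippage_rate=0.02, commission_rate=0.002)
print(round(change, 6))  # -442.236708
```

Under these assumptions, buying one bitcoin priced at 432.70 costs about 442.24, roughly 2.2% more than the quoted price.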

Let’s take a quick look at the performance DataFrame. For this, we use pandas from inside the IPython Notebook and print the first five rows. Note that catalyst makes heavy use of pandas, especially for data input and output, so it’s worth spending some time to learn it.

import pandas as pd
perf = pd.read_pickle('buy_btc_simple_out.pickle') # read in perf DataFrame
perf.head()

                           algo_volatility  algorithm_period_return      alpha  benchmark_period_return  benchmark_volatility
2016-01-01 23:59:00+00:00              NaN             0.000000e+00        NaN                -0.010937                   NaN
2016-01-02 23:59:00+00:00         0.000011            -9.536708e-07  -0.000170                -0.006480              0.173338
2016-01-03 23:59:00+00:00         0.000011            -2.328842e-06  -0.000176                -0.026512              0.197857
2016-01-04 23:59:00+00:00         0.000011            -2.380954e-06  -0.000139                -0.008640              0.269790
2016-01-05 23:59:00+00:00         0.000011            -3.650729e-06  -0.000158                -0.021426              0.245989

                                beta         btc  capital_used   ending_cash  ending_exposure  ...
2016-01-01 23:59:00+00:00        NaN  433.979999      0.000000  1.000000e+07             0.00  ...
2016-01-02 23:59:00+00:00  -0.000062  432.700000   -442.236708  9.999558e+06           432.70  ...
2016-01-03 23:59:00+00:00   0.000009  428.390000   -437.831716  9.999120e+06           856.78  ...
2016-01-04 23:59:00+00:00   0.000020  432.900000   -442.441116  9.998677e+06          1298.70  ...
2016-01-05 23:59:00+00:00   0.000024  431.840000   -441.357754  9.998236e+06          1727.36  ...

                           short_exposure  short_value  shorts_count     sortino  starting_cash
2016-01-01 23:59:00+00:00               0            0             0         NaN   1.000000e+07
2016-01-02 23:59:00+00:00               0            0             0  -11.224972   1.000000e+07
2016-01-03 23:59:00+00:00               0            0             0  -12.754262   9.999558e+06
2016-01-04 23:59:00+00:00               0            0             0  -11.287205   9.999120e+06
2016-01-05 23:59:00+00:00               0            0             0  -12.333847   9.998677e+06

                           starting_exposure  starting_value  trading_days                                       transactions  treasury_period_return
2016-01-01 23:59:00+00:00               0.00            0.00             1                                                 []                  0.0227
2016-01-02 23:59:00+00:00               0.00            0.00             2  [{u'order_id': u'7869f7828fa140328eb40477bb7de...                  0.0227
2016-01-03 23:59:00+00:00             432.70          432.70             3  [{u'order_id': u'be62ff77760c4599abaac43be9cc9...                  0.0227
2016-01-04 23:59:00+00:00             856.78          856.78             4  [{u'order_id': u'd6dca79513214346a646079213526...                  0.0224
2016-01-05 23:59:00+00:00            1298.70         1298.70             5  [{u'order_id': u'505275d6646a41f3856b22b16678d...                  0.0225

There is a row for each trading day, starting on the first day of our simulation, Jan 1st, 2016. In the columns you can find various information about the state of your algorithm. The column btc was placed there by the record() function mentioned earlier and allows us to plot the price of bitcoin. For example, we can now easily examine how our portfolio value changed over time compared to the bitcoin price.

%load_ext catalyst
%pylab inline
figsize(12, 12)
import matplotlib.pyplot as plt

ax1 = plt.subplot(211)
perf.portfolio_value.plot(ax=ax1)
ax1.set_ylabel('portfolio value')
ax2 = plt.subplot(212, sharex=ax1)
perf.btc.plot(ax=ax2)
ax2.set_ylabel('bitcoin price')

Figure (portfolio value in the top panel, bitcoin price in the bottom panel): https://s3.amazonaws.com/enigmaco-docs/github.io/buy_btc_simple_graph.png

Our algorithm performance as assessed by the portfolio_value closely matches that of the bitcoin price. This is not surprising as our algorithm only bought bitcoin every chance it got.

Access to previous prices using history

Working example: Dual Moving Average Cross-Over

The Dual Moving Average (DMA) is a classic momentum strategy. It’s probably not used by any serious trader anymore but is still very instructive. The basic idea is that we compute two rolling or moving averages (mavg): one with a longer window that is supposed to capture long-term trends, and one with a shorter window that is supposed to capture short-term trends. Once the short-mavg crosses the long-mavg from below, we assume that the asset price has upward momentum and go long the asset. If the short-mavg crosses from above, we exit the position as we assume the price will go down further.
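
The crossover rule can be sketched in plain Python before bringing Catalyst into it. The helper names (mean_of_last, dma_signal) and the price series are made up for illustration, and the windows are shortened to 3 and 5 bars so the example stays readable.

```python
def mean_of_last(prices, window):
    """Average of the most recent `window` prices."""
    recent = prices[-window:]
    return sum(recent) / float(len(recent))


def dma_signal(prices, short_window, long_window):
    """Return 'long', 'exit', or None when there is no signal.

    None covers both "not enough history yet" and "the averages are equal".
    """
    if len(prices) < long_window:
        return None
    short_mavg = mean_of_last(prices, short_window)
    long_mavg = mean_of_last(prices, long_window)
    if short_mavg > long_mavg:
        return 'long'   # short-term average above long-term: upward momentum
    if short_mavg < long_mavg:
        return 'exit'   # short-term average below long-term: leave the position
    return None


# Made-up prices: flat, then rising, then falling.
prices = [10.0, 10.0, 10.0, 10.0, 10.0, 11.0, 12.0, 13.0, 11.0, 8.0, 6.0]
# Evaluate the rule bar by bar, using only the prices known at each point.
signals = [dma_signal(prices[:i + 1], 3, 5) for i in range(len(prices))]
print(signals)
```

Each signal is computed only from prices up to and including the current bar, so the sketch does not look ahead.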

As we need to have access to previous prices to implement this strategy, we need a new concept: history.

data.history() is a convenience function that keeps a rolling window of data for you. As called in the example below, it takes the asset, the field to retrieve ('price'), the number of bars you want to collect (bar_count), and the bar frequency (either '1d' or '1m'; note that you need minute-level data to use '1m'). We use it inside handle_data():

%%catalyst --start 2016-4-1 --end 2017-9-30 -x bitfinex

from catalyst.api import order, record, symbol, order_target

def initialize(context):
   context.i = 0
   context.asset = symbol('btc_usd')

def handle_data(context, data):
   # Skip first 150 days to get full windows
   context.i += 1
   if context.i < 150:
       return

   # Compute averages
   # With a single asset and a single field, data.history()
   # returns a pandas Series, so .mean() yields a single number.
   short_mavg = data.history(context.asset, 'price', bar_count=50, frequency="1d").mean()
   long_mavg = data.history(context.asset, 'price', bar_count=150, frequency="1d").mean()

   # Trading logic
   if short_mavg > long_mavg:
       # order_target orders as many units as needed to
       # reach the desired position size.
       order_target(context.asset, 100)
   elif short_mavg < long_mavg:
       order_target(context.asset, 0)

   # Save values for later inspection
   record(btc=data.current(context.asset, 'price'),
          short_mavg=short_mavg,
          long_mavg=long_mavg)

def analyze(context, perf):
   import matplotlib.pyplot as plt
   fig = plt.figure(figsize=(12,12))
   ax1 = fig.add_subplot(211)
   perf.portfolio_value.plot(ax=ax1)
   ax1.set_ylabel('portfolio value in $')

   ax2 = fig.add_subplot(212)
   perf['btc'].plot(ax=ax2)
   perf[['short_mavg', 'long_mavg']].plot(ax=ax2)

   # Keep only the days with at least one transaction
   # (.loc replaces the deprecated .ix indexer)
   perf_trans = perf.loc[[t != [] for t in perf.transactions]]
   buys = perf_trans.loc[[t[0]['amount'] > 0 for t in perf_trans.transactions]]
   sells = perf_trans.loc[
       [t[0]['amount'] < 0 for t in perf_trans.transactions]]
   ax2.plot(buys.index, perf.short_mavg.loc[buys.index],
            '^', markersize=10, color='m')
   ax2.plot(sells.index, perf.short_mavg.loc[sells.index],
            'v', markersize=10, color='k')
   ax2.set_ylabel('price in $')
   plt.legend(loc=0)
   plt.show()

Here we are explicitly defining an analyze() function that gets automatically called once the backtest is done.

Although it might not be directly apparent, the power of history() (pun intended) should not be underestimated, as most algorithms make use of prior market developments in one form or another. You could easily devise a strategy that trains a classifier with scikit-learn that tries to predict future market movements based on past prices (note that most of the scikit-learn functions require numpy.ndarrays rather than pandas.DataFrames, so you can simply pass the underlying ndarray of a DataFrame via .values).
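
As a rough illustration of that last point, the sketch below turns a short, made-up price series into a feature matrix and a label vector and extracts the underlying arrays with .values. The column names and the choice of features (the two previous daily returns) are assumptions for illustration only; no classifier is trained here.

```python
import numpy as np
import pandas as pd

# Made-up daily closing prices.
prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5, 103.0], name='price')
returns = prices.pct_change()

# Features: the two previous daily returns. Label: 1 if today's return is positive.
frame = pd.DataFrame({
    'lag1': returns.shift(1),
    'lag2': returns.shift(2),
    'up': (returns > 0).astype(int),
}).dropna()

X = frame[['lag1', 'lag2']].values  # numpy.ndarray of shape (n_samples, 2)
y = frame['up'].values              # numpy.ndarray of shape (n_samples,)
print(type(X), X.shape, y)
```

Arrays shaped like X and y are what a scikit-learn classifier's fit(X, y) method expects.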

We also used the order_target() function above. This and other functions like it can make order management and portfolio rebalancing much easier.
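
Conceptually, a target order boils down to ordering the difference between the position you want and the position you already hold. The helper below is a hypothetical illustration of that arithmetic, not Catalyst's implementation, which also has to deal with things this sketch ignores, such as orders that are still open.

```python
def amount_to_order(current_amount, target_amount):
    """Units to buy (positive) or sell (negative) to reach the target position."""
    return target_amount - current_amount


print(amount_to_order(0, 100))    # 100: open the full position
print(amount_to_order(100, 100))  # 0: already at the target, nothing to do
print(amount_to_order(100, 0))    # -100: sell everything
```

This is why the strategy above can call order_target(context.asset, 100) on every bar while the short average stays above the long one without the position growing beyond 100 units.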

Conclusions

We hope that this tutorial has given you a little insight into the architecture, API, and features of catalyst. For next steps, check out some of the examples. The natural next step would be to look into the buy_and_hodl example, which is a more elaborate and realistic version of the buy_btc_simple example presented in this tutorial.

Feel free to ask questions on the #catalyst_dev channel of our Discord group and report problems on our GitHub issue tracker.