Catalyst Beginner Tutorial
Basics
Catalyst is an open-source algorithmic trading simulator for crypto assets written in Python. The source code can be found at: https://github.com/enigmampc/catalyst
Some benefits include:
- Support for several of the top crypto-exchanges by trading volume.
- Realistic: slippage, transaction costs, order delays.
- Stream-based: processes each event individually, which avoids look-ahead bias.
- Batteries included: Common transforms (moving average) as well as common risk calculations (Sharpe).
- Developed and continuously updated by Enigma MPC which is building the Enigma data marketplace protocol as well as Catalyst, the first application that will run on our protocol. Powered by our financial data marketplace, Catalyst empowers users to share and curate data and build profitable, data-driven investment strategies.
This tutorial assumes that you have Catalyst correctly installed; see the Install section if you haven’t set up Catalyst yet.
Every catalyst algorithm consists of at least two functions you have to
define:
- initialize(context)
- handle_data(context, data)
Before the start of the algorithm, catalyst calls the
initialize() function and passes in a context variable.
context is a persistent namespace for you to store variables you
need to access from one algorithm iteration to the next.
After the algorithm has been initialized, catalyst calls the
handle_data() function on each iteration: once per day (daily) or
once per minute (minute), depending on the frequency we choose for our
simulation. On every iteration, handle_data() receives the same context
variable and an event frame called data containing the current trading bar
with open, high, low, and close (OHLC) prices as well as volume for each
crypto asset in your universe.
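The calling sequence described above can be pictured with a small stand-in loop. This is only a sketch of the idea, not Catalyst's implementation: run_toy_backtest is a hypothetical helper, each bar is a plain dict rather than Catalyst's data object, and the bar values are invented for illustration.

```python
from types import SimpleNamespace


def run_toy_backtest(initialize, handle_data, bars):
    """Call initialize() once, then handle_data() once per bar (toy stand-in)."""
    context = SimpleNamespace()      # persistent namespace shared by all iterations
    initialize(context)
    for bar in bars:                 # one iteration per bar (a day or a minute)
        handle_data(context, bar)
    return context


def initialize(context):
    context.bars_seen = 0
    context.last_close = None


def handle_data(context, data):
    # Anything stored on context survives until the next iteration.
    context.bars_seen += 1
    context.last_close = data["close"]


# Invented OHLCV bars, for illustration only.
bars = [
    {"open": 430.0, "high": 436.0, "low": 427.0, "close": 434.0, "volume": 1200.0},
    {"open": 434.0, "high": 437.0, "low": 431.0, "close": 432.7, "volume": 950.0},
    {"open": 432.7, "high": 433.5, "low": 424.0, "close": 428.4, "volume": 1100.0},
]
final_context = run_toy_backtest(initialize, handle_data, bars)
print(final_context.bars_seen, final_context.last_close)
```

The point of the sketch is that context is created once and handed to every call, which is why it can carry state between iterations.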
My first algorithm
Let’s take a look at a very simple algorithm from the examples directory:
buy_btc_simple.py:
from catalyst.api import order, record, symbol

def initialize(context):
    context.asset = symbol('btc_usd')

def handle_data(context, data):
    order(context.asset, 1)
    record(btc=data.current(context.asset, 'price'))
As you can see, we first have to import some functions we would like to
use. All functions commonly used in your algorithm can be found in
catalyst.api. Here we are using order(), which takes
two arguments: a cryptoasset object, and a number specifying how many units you
would like to order (if negative, order() will sell/short
that many units). In this case we want to order 1 bitcoin at each iteration.
Finally, the record() function allows you to save the value
of a variable at each iteration. You provide it with a name for the variable
together with the variable itself: varname=var. After the algorithm
has finished running, you will have access to each variable value you tracked
with record() under the name you provided (we will see this
further below). You can also see how we access the current price of
a bitcoin through the data event frame.
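To make the varname=var convention concrete, here is a minimal stand-in for what record() conceptually does when it is called once per iteration. ToyRecorder is a hypothetical class written for this illustration and is not part of Catalyst; the prices are rounded from the btc column of the results table shown later in this tutorial.

```python
class ToyRecorder:
    """Hypothetical stand-in for record(): one row of named values per call."""

    def __init__(self):
        self.rows = []

    def record(self, **named_values):
        # The keyword name becomes the column name, the value becomes the cell.
        self.rows.append(dict(named_values))

    def column(self, name):
        return [row[name] for row in self.rows]


recorder = ToyRecorder()
for price in [433.98, 432.70, 428.39]:   # one call per iteration
    recorder.record(btc=price)

btc_series = recorder.column("btc")
print(btc_series)
```

In the real library the recorded values end up as a column of the performance DataFrame, named after the keyword you used.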
Ingesting data
Before you can backtest your algorithm, you first need to load the historical
pricing data that Catalyst needs to run your simulation through a process called
ingestion. When you ingest data, Catalyst downloads that data in compressed
form from the Enigma servers (which eventually will migrate to the Enigma Data
Marketplace), and stores it locally to make it available at runtime.
In order to ingest data, you need to run a command like the following:
catalyst ingest-exchange -x bitfinex -i btc_usd
This instructs Catalyst to download pricing data from the Bitfinex exchange
for the btc_usd currency pair. This follows from the simple algorithm
presented above, where we want to trade btc_usd, and from our choice to test
it against historical pricing data from Bitfinex. By
default, Catalyst assumes that you want data with daily frequency (one candle
bar per day). If you instead want minute frequency (one candle bar for every
minute), you need to specify it as follows:
catalyst ingest-exchange -x bitfinex -i btc_usd -f minute
Ingesting exchange bundle bitfinex...
[====================================] Ingesting daily price data on bitfinex: 100%
We believe it is important for you to have a high-level understanding of how data is managed, hence the following overview:
- Pricing data is split and packaged into bundles: chunks of data organized as time series that are kept up to date daily on Enigma’s servers. Catalyst downloads the requested bundles and reconstructs the full dataset on your hard drive.
- Pricing data is provided in daily and minute resolution. Those are different bundle datasets, and are managed separately.
- Bundles are exchange-specific, as the pricing data is specific to the trades that happen in each exchange. As a result, you must specify which exchange you want pricing data from when ingesting data.
- Catalyst keeps track of all the downloaded bundles, so that it only has to download them once, and will do incremental updates as needed.
- When running in live trading mode, Catalyst will first look for historical pricing data in the locally stored bundles. If there is anything missing, Catalyst will hit the exchange for the most recent data, and merge it with the local bundle to optimize the number of requests it needs to make to the exchange.
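The "local bundle first, exchange only for what is missing" behaviour in the last bullet can be sketched as follows. This illustrates the caching idea only and says nothing about how Catalyst stores bundles internally: load_daily_bars and fake_exchange are hypothetical names, and the prices are invented.

```python
from datetime import date, timedelta


def load_daily_bars(local_bundle, fetch_from_exchange, start, end):
    """Return ({day: close}, missing_days), fetching only days not held locally."""
    wanted = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing = [day for day in wanted if day not in local_bundle]
    # A single request covers every missing day; nothing is requested twice.
    fetched = fetch_from_exchange(missing) if missing else {}
    merged = {**local_bundle, **fetched}
    return {day: merged[day] for day in wanted if day in merged}, missing


# Invented local data and a fake exchange that logs each request it receives.
local_bundle = {date(2017, 9, 27): 4200.0, date(2017, 9, 28): 4190.0}
requests_made = []


def fake_exchange(days):
    requests_made.append(list(days))
    return {day: 4300.0 for day in days}


bars, missing = load_daily_bars(
    local_bundle, fake_exchange, date(2017, 9, 27), date(2017, 9, 30)
)
print(len(bars), missing, len(requests_made))
```

Only the two days absent from the local data are requested, and they are requested in one call, which is the kind of saving the bullet describes.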
The ingest-exchange command in catalyst offers additional parameters to
further tweak the data ingestion process. You can learn more by running the
following from the command line:
catalyst ingest-exchange --help
Running the algorithm
You can now test your algorithm using cryptoassets’ historical pricing data.
Catalyst provides three interfaces:
- a command-line interface (CLI),
- the IPython Notebook magic,
- and a run_algorithm() function that you can call from other Python scripts.
We’ll start with the CLI, and introduce the IPython Notebook below. Some of
the example algorithms provide instructions on how to run
them both from the CLI, and using the run_algorithm() function.
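For orientation, calling run_algorithm() from a script might look roughly like the sketch below. Treat it as an assumption-laden outline rather than a reference: the keyword names are assumed to mirror the CLI flags used in this tutorial (for example base_currency for -c) and may differ between Catalyst versions, and backtest_settings and main are hypothetical helper names. Catalyst and pandas are imported inside main() so that the file can be read and imported without them.

```python
def backtest_settings():
    """Settings mirroring the CLI flags used in this tutorial (assumed keyword names)."""
    return {
        "capital_base": 100000,
        "data_frequency": "daily",
        "exchange_name": "bitfinex",
        "algo_namespace": "buy_btc_simple",
        "base_currency": "usd",
        "start": "2016-01-01",
        "end": "2017-09-30",
    }


def main():
    # Imported lazily: both packages must be installed to actually run a backtest.
    import pandas as pd
    from catalyst import run_algorithm
    from catalyst.api import order, record, symbol

    def initialize(context):
        context.asset = symbol('btc_usd')

    def handle_data(context, data):
        order(context.asset, 1)
        record(btc=data.current(context.asset, 'price'))

    settings = backtest_settings()
    # Dates are passed as timezone-aware timestamps.
    settings["start"] = pd.to_datetime(settings["start"], utc=True)
    settings["end"] = pd.to_datetime(settings["end"], utc=True)
    return run_algorithm(initialize=initialize, handle_data=handle_data, **settings)
```

Calling main() from a Python session with Catalyst installed and the data ingested would start the backtest; the example algorithms shipped with Catalyst are the authoritative source for the exact arguments.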
Command line interface
Once you have installed Catalyst, you should be able to execute the following
from your command line (e.g. cmd.exe or the Anaconda Prompt on Windows,
or the Terminal application on MacOS).
$ catalyst --help
This is the resulting output, simplified for educational purposes:
Usage: catalyst [OPTIONS] COMMAND [ARGS]...

  Top level catalyst entry point.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  ingest-exchange  Ingest data for the given exchange.
  live             Trade live with the given algorithm.
  run              Run a backtest for the given algorithm.
There are three main modes you can run on Catalyst. The first is
ingest-exchange for data ingestion, which we covered in the previous
section. The second is live, which uses your algorithm to trade live against a
given exchange. The third is run, which backtests your algorithm before
you trade live with it.
Let’s start with backtesting, so run this other command to learn more about the available options:
$ catalyst run --help
Usage: catalyst run [OPTIONS]

  Run a backtest for the given algorithm.

Options:
  -f, --algofile FILENAME         The file that contains the algorithm to run.
  -t, --algotext TEXT             The algorithm script to run.
  -D, --define TEXT               Define a name to be bound in the namespace
                                  before executing the algotext. For example
                                  '-Dname=value'. The value may be any python
                                  expression. These are evaluated in order so
                                  they may refer to previously defined names.
  --data-frequency [daily|minute]
                                  The data frequency of the simulation.
                                  [default: daily]
  --capital-base FLOAT            The starting capital for the simulation.
                                  [default: 10000000.0]
  -b, --bundle BUNDLE-NAME        The data bundle to use for the simulation.
                                  [default: poloniex]
  --bundle-timestamp TIMESTAMP    The date to lookup data on or before.
                                  [default: <current-time>]
  -s, --start DATE                The start date of the simulation.
  -e, --end DATE                  The end date of the simulation.
  -o, --output FILENAME           The location to write the perf data. If this
                                  is '-' the perf will be written to stdout.
                                  [default: -]
  --print-algo / --no-print-algo  Print the algorithm to stdout.
  -x, --exchange-name [poloniex|bitfinex|bittrex]
                                  The name of the targeted exchange
                                  (supported: bitfinex, bittrex, poloniex).
  -n, --algo-namespace TEXT       A label assigned to the algorithm for data
                                  storage purposes.
  -c, --base-currency TEXT        The base currency used to calculate
                                  statistics (e.g. usd, btc, eth).
  --help                          Show this message and exit.
As you can see, there are a couple of flags that specify where to find your
algorithm (-f) as well as the -x flag to specify which exchange to
use. There are also arguments for the date range to run the algorithm over
(--start and --end). You also need to set the base currency for your
algorithm through the -c flag, and the starting capital through
--capital-base. All the
aforementioned parameters are required. Optionally, you will want to save the
performance metrics of your algorithm so that you can analyze how it performed.
This is done via the --output flag, which writes the
performance DataFrame in the pickle Python file format.
Thus, to execute our algorithm from above and save the results to
buy_btc_simple_out.pickle we would call catalyst run as follows:
catalyst run -f buy_btc_simple.py -x bitfinex --start 2016-1-1 --end 2017-9-30 -c usd --capital-base 100000 -o buy_btc_simple_out.pickle
INFO: run_algo: running algo in backtest mode
INFO: exchange_algorithm: initialized trading algorithm in backtest mode
INFO: Performance: Simulated 639 trading days out of 639.
INFO: Performance: first open: 2016-01-01 00:00:00+00:00
INFO: Performance: last close: 2017-09-30 23:59:00+00:00
run first calls the initialize() function, and then
streams the historical asset price day-by-day through handle_data().
On each call to handle_data() we instruct catalyst to order 1
bitcoin. After the call of the order() function, catalyst
enters the ordered asset and amount in the order book. After the
handle_data() function has finished, catalyst looks for any open
orders and tries to fill them. If the trading volume is high enough for
this asset, the order is executed after adding the commission and
applying the slippage model, which models the influence of your order on
the asset price, so your algorithm will be charged more than just the
asset price. (Note that you can also change the commission and
slippage model that catalyst uses.)
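The effect of commission and slippage on a buy order can be illustrated with a deliberately simple proportional model. This is a toy sketch, not Catalyst's actual commission or slippage model: the function names, the rates, and the volume-share rule are all assumptions chosen for the example.

```python
def fill_cost(price, amount, slippage_rate, commission_rate):
    """Cash needed to buy `amount` units under a toy proportional model."""
    fill_price = price * (1 + slippage_rate)   # buying nudges the price upwards
    gross = fill_price * amount
    commission = gross * commission_rate       # fee charged on the traded value
    return gross + commission


def try_fill(price, amount, bar_volume, max_volume_share,
             slippage_rate, commission_rate):
    """Fill a buy order only if it is a small enough share of the bar's volume."""
    if amount > bar_volume * max_volume_share:
        return None                            # order stays open for a later bar
    return fill_cost(price, amount, slippage_rate, commission_rate)


# Hypothetical rates: 1% slippage, 0.2% commission, at most 25% of bar volume.
cost = try_fill(100.0, 1, bar_volume=50.0, max_volume_share=0.25,
                slippage_rate=0.01, commission_rate=0.002)
unfilled = try_fill(100.0, 1, bar_volume=2.0, max_volume_share=0.25,
                    slippage_rate=0.01, commission_rate=0.002)
print(cost, unfilled)
```

With these made-up rates, one unit priced at 100 costs a little over 101, and the same order is left unfilled when the bar's volume is too thin.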
Let’s take a quick look at the performance DataFrame. For this, we write
a different Python script (let’s call it print_results.py) and we make use of
the fantastic pandas library to print the first five rows. Note that
catalyst makes heavy use of pandas,
especially for data analysis and output, so it’s worth spending some time to
learn it.
import pandas as pd
perf = pd.read_pickle('buy_btc_simple_out.pickle') # read in perf DataFrame
print(perf.head())
Which we execute by running:
$ python print_results.py
| | algo_volatility | algorithm_period_return | alpha | benchmark_period_return | benchmark_volatility | beta | btc | capital_used | ending_cash | ending_exposure | ... | short_exposure | short_value | shorts_count | sortino | starting_cash | starting_exposure | starting_value | trading_days | transactions | treasury_period_return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-01-01 23:59:00+00:00 | NaN | 0.000000e+00 | NaN | -0.010937 | NaN | NaN | 433.979999 | 0.000000 | 1.000000e+07 | 0.00 | ... | 0 | 0 | 0 | NaN | 1.000000e+07 | 0.00 | 0.00 | 1 | [] | 0.0227 |
| 2016-01-02 23:59:00+00:00 | 0.000011 | -9.536708e-07 | -0.000170 | -0.006480 | 0.173338 | -0.000062 | 432.700000 | -442.236708 | 9.999558e+06 | 432.70 | ... | 0 | 0 | 0 | -11.224972 | 1.000000e+07 | 0.00 | 0.00 | 2 | [{u'order_id': u'7869f7828fa140328eb40477bb7de... | 0.0227 |
| 2016-01-03 23:59:00+00:00 | 0.000011 | -2.328842e-06 | -0.000176 | -0.026512 | 0.197857 | 0.000009 | 428.390000 | -437.831716 | 9.999120e+06 | 856.78 | ... | 0 | 0 | 0 | -12.754262 | 9.999558e+06 | 432.70 | 432.70 | 3 | [{u'order_id': u'be62ff77760c4599abaac43be9cc9... | 0.0227 |
| 2016-01-04 23:59:00+00:00 | 0.000011 | -2.380954e-06 | -0.000139 | -0.008640 | 0.269790 | 0.000020 | 432.900000 | -442.441116 | 9.998677e+06 | 1298.70 | ... | 0 | 0 | 0 | -11.287205 | 9.999120e+06 | 856.78 | 856.78 | 4 | [{u'order_id': u'd6dca79513214346a646079213526... | 0.0224 |
| 2016-01-05 23:59:00+00:00 | 0.000011 | -3.650729e-06 | -0.000158 | -0.021426 | 0.245989 | 0.000024 | 431.840000 | -441.357754 | 9.998236e+06 | 1727.36 | ... | 0 | 0 | 0 | -12.333847 | 9.998677e+06 | 1298.70 | 1298.70 | 5 | [{u'order_id': u'505275d6646a41f3856b22b16678d... | 0.0225 |
There is a row for each trading day, starting on the first day of our
simulation, Jan 1st, 2016. In the columns you can find various
information about the state of your algorithm. The column
btc was placed there by the record() function mentioned earlier
and allows us to plot the price of bitcoin. For example, we could now easily
examine how our portfolio value changed over time compared to the
bitcoin price.
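A quick way to get a feel for these columns is to check how they relate to each other. The sketch below copies four rows from the table above (at the precision displayed there) and verifies two relationships that we assume hold: cash changes by capital_used, and exposure equals the number of bitcoins held times the recorded price. The Jan 1st row is left out because no order had been filled yet.

```python
# Values copied from the printed DataFrame above (displayed precision only).
rows = [
    # (day, btc, capital_used, starting_cash, ending_cash, ending_exposure)
    ("2016-01-02", 432.70, -442.236708, 1.000000e+07, 9.999558e+06, 432.70),
    ("2016-01-03", 428.39, -437.831716, 9.999558e+06, 9.999120e+06, 856.78),
    ("2016-01-04", 432.90, -442.441116, 9.999120e+06, 9.998677e+06, 1298.70),
    ("2016-01-05", 431.84, -441.357754, 9.998677e+06, 9.998236e+06, 1727.36),
]

overheads = []
for units_held, row in enumerate(rows, start=1):
    day, btc, capital_used, starting_cash, ending_cash, ending_exposure = row
    # Cash moves by what the fill cost (within the table's display rounding).
    assert abs(starting_cash + capital_used - ending_cash) < 1.0
    # Exposure is the number of bitcoins held times the current price.
    assert abs(units_held * btc - ending_exposure) < 0.01
    # Ratio of cash paid for one bitcoin to its recorded price.
    overheads.append(-capital_used / btc)

print([round(ratio, 5) for ratio in overheads])
```

On each of these days the cash paid is roughly 2.2% higher than the recorded price, which is the combined effect of the commission and slippage described earlier.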
Now we will run the simulation again, but this time we extend our original
algorithm with the addition of the analyze() function. Much as
initialize() gets called once before the start of the algorithm,
analyze() gets called once at the end of the algorithm, and receives two
variables: context, which we discussed at the very beginning, and perf,
which is the pandas DataFrame containing the performance data for our algorithm
that we reviewed above. The analyze() function is where we can
analyze and visualize the results of our strategy. Here’s the revised simple
algorithm (note the added matplotlib import on the first line and the new
analyze() function at the end):
import matplotlib.pyplot as plt
from catalyst.api import order, record, symbol

def initialize(context):
    context.asset = symbol('btc_usd')

def handle_data(context, data):
    order(context.asset, 1)
    record(btc = data.current(context.asset, 'price'))

def analyze(context, perf):
    ax1 = plt.subplot(211)
    perf.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('portfolio value')
    ax2 = plt.subplot(212, sharex=ax1)
    perf.btc.plot(ax=ax2)
    ax2.set_ylabel('bitcoin price')
    plt.show()
Here we make use of the external visualization library called
matplotlib, which you might recall we installed
alongside enigma-catalyst (with the exception of the Conda install, where it
was included by default inside the conda environment we created). If for any
reason you don’t have it installed, you can add it by running:
(catalyst)$ pip install matplotlib
If everything works well, you’ll see the following chart:
Our algorithm performance as assessed by the portfolio_value closely
matches that of the bitcoin price. This is not surprising as our algorithm
only bought bitcoin every chance it got.
If you get an error when invoking matplotlib to visualize the performance results, refer to MacOS + Matplotlib. Alternatively, some users have reported the following error when running an algo in a Linux environment:
ImportError: No module named _tkinter, please install the python-tk package
This can easily be solved by running (on Ubuntu/Debian-based systems):
sudo apt install python-tk
Access to previous prices using history
Working example: Dual Moving Average Cross-Over
The Dual Moving Average (DMA) is a classic momentum strategy. It’s probably not used by any serious trader anymore but is still very instructive. The basic idea is that we compute two rolling or moving averages (mavg): one with a longer window that is supposed to capture long-term trends and one with a shorter window that is supposed to capture short-term trends. Once the short-mavg crosses the long-mavg from below, we assume that the asset price has upwards momentum and go long the asset. If the short-mavg crosses from above, we exit the position as we assume the asset price will go down further.
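The crossing logic can be tried out on a plain list of prices before bringing Catalyst into the picture. The sketch below is independent of Catalyst: moving_average and crossover_signals are hypothetical helpers, the prices are invented, and the windows are kept tiny so the result can be checked by hand.

```python
def moving_average(prices, window):
    """Average of the last `window` prices."""
    return sum(prices[-window:]) / window


def crossover_signals(prices, short_window, long_window):
    """Return (index, 'buy' | 'sell') each time the short average crosses the long one."""
    signals = []
    previous_diff = None
    for i in range(long_window, len(prices) + 1):
        seen = prices[:i]                      # only prices known so far
        diff = (moving_average(seen, short_window)
                - moving_average(seen, long_window))
        if previous_diff is not None:
            if previous_diff <= 0 < diff:      # short crosses long from below
                signals.append((i - 1, "buy"))
            elif previous_diff >= 0 > diff:    # short crosses long from above
                signals.append((i - 1, "sell"))
        previous_diff = diff
    return signals


# Invented prices: a rise followed by a fall.
prices = [10, 10, 10, 10, 11, 12, 13, 12, 10, 8, 7, 7]
signals = crossover_signals(prices, short_window=2, long_window=4)
print(signals)
```

The rise produces one buy signal and the subsequent fall one sell signal. Each average is computed only from prices seen up to that point, so the sketch does not peek into the future.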
As we need to have access to previous prices to implement this strategy, we need a new concept: history.
data.history() is a convenience function that keeps a rolling window of
data for you. You pass it the asset, the field you want (here 'price'), the
number of bars to collect (bar_count), and the bar frequency (either '1d' or '1m',
but note that you need to have minute-level data to use '1m'). This is
a function we use in the handle_data() section:
%load_ext catalyst

%%catalyst --start 2016-4-1 --end 2017-9-30 -x bitfinex
from catalyst.api import order, record, symbol, order_target

def initialize(context):
    context.i = 0
    context.asset = symbol('btc_usd')

def handle_data(context, data):
    # Skip the first 150 bars so that the long window is full
    context.i += 1
    if context.i < 150:
        return

    # Compute averages
    # data.history() returns the requested window of prices as a
    # pandas object; .mean() reduces it to a single average.
    short_mavg = data.history(context.asset, 'price', bar_count=50, frequency="1d").mean()
    long_mavg = data.history(context.asset, 'price', bar_count=150, frequency="1d").mean()

    # Trading logic
    if short_mavg > long_mavg:
        # order_target orders as many units as needed to
        # reach the desired position size.
        order_target(context.asset, 100)
    elif short_mavg < long_mavg:
        order_target(context.asset, 0)

    # Save values for later inspection
    record(btc=data.current(context.asset, 'price'),
           short_mavg=short_mavg,
           long_mavg=long_mavg)

def analyze(context, perf):
    import matplotlib.pyplot as plt
    fig = plt.figure(figsize=(12, 12))
    ax1 = fig.add_subplot(211)
    perf.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('portfolio value in $')

    ax2 = fig.add_subplot(212)
    perf['btc'].plot(ax=ax2)
    perf[['short_mavg', 'long_mavg']].plot(ax=ax2)

    # Keep only the days on which at least one transaction happened
    # (.loc replaces the deprecated .ix indexer)
    perf_trans = perf.loc[[t != [] for t in perf.transactions]]
    buys = perf_trans.loc[[t[0]['amount'] > 0 for t in perf_trans.transactions]]
    sells = perf_trans.loc[
        [t[0]['amount'] < 0 for t in perf_trans.transactions]]
    ax2.plot(buys.index, perf.short_mavg.loc[buys.index],
             '^', markersize=10, color='m')
    ax2.plot(sells.index, perf.short_mavg.loc[sells.index],
             'v', markersize=10, color='k')
    ax2.set_ylabel('price in $')
    plt.legend(loc=0)
    plt.show()
Here we are explicitly defining an analyze() function that gets
automatically called once the backtest is done.
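The rolling window that data.history() maintains for you, and the reason the algorithm above waits for 150 bars before trading, can be pictured with a fixed-length queue. This is a conceptual sketch only: RollingWindow is a hypothetical class that makes no claim about how Catalyst stores history, and the prices are invented.

```python
from collections import deque


class RollingWindow:
    """Keeps only the most recent `bar_count` values (toy history window)."""

    def __init__(self, bar_count):
        self.values = deque(maxlen=bar_count)  # oldest value drops out automatically

    def push(self, value):
        self.values.append(value)

    def is_full(self):
        return len(self.values) == self.values.maxlen

    def mean(self):
        return sum(self.values) / len(self.values)


window = RollingWindow(bar_count=3)
means = []
for price in [10.0, 11.0, 12.0, 13.0, 14.0]:
    window.push(price)
    if window.is_full():       # same idea as skipping the first bars in handle_data()
        means.append(window.mean())
print(means)
```

Until the window is full its mean would be based on fewer bars than intended, which is why the warm-up check matters.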
Although it might not be directly apparent, the power of history()
(pun intended) should not be underestimated, as most algorithms make use of
prior market developments in one form or another. You could easily
devise a strategy that trains a classifier with
scikit-learn which tries to
predict future market movements based on past prices (note that most of
the scikit-learn functions require numpy.ndarrays rather than
pandas.DataFrames, so you can simply pass the underlying
ndarray of a DataFrame via .values).
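As a rough idea of what "based on past prices" could mean in practice, the sketch below turns a price list into training pairs: the previous few returns as features, and whether the next bar rose as the label. It is one possible framing among many and uses plain lists instead of scikit-learn or numpy to stay self-contained; make_training_set is a hypothetical helper and the prices are invented.

```python
def make_training_set(prices, lookback):
    """Build (features, labels): the last `lookback` returns -> did the next bar rise?"""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    features, labels = [], []
    for i in range(lookback, len(returns)):
        features.append(returns[i - lookback:i])   # strictly past information
        labels.append(1 if returns[i] > 0 else 0)  # outcome to be predicted
    return features, labels


# Invented prices, for illustration only.
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 102.0]
features, labels = make_training_set(prices, lookback=2)
print(len(features), labels)
```

Lists shaped like these (one row of features per sample, one label per row) are the kind of input a classifier's fitting routine typically expects once converted to arrays.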
We also used the order_target() function above. This and other
functions like it can make order management and portfolio rebalancing
much easier.
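The convenience of a target-based order is that you state the position you want and the order size is worked out for you. The sketch below shows that arithmetic in isolation. It assumes every order fills immediately, which a real backtest does not guarantee, and order_target_amount is a hypothetical helper rather than Catalyst's function.

```python
def order_target_amount(current_position, target):
    """Order size needed to move from the current position to the target."""
    return target - current_position


position = 0
orders = []
# The same target repeated on consecutive bars produces no further orders.
for target in [100, 100, 0, 0, 100]:
    amount = order_target_amount(position, target)
    if amount != 0:
        orders.append(amount)
        position += amount     # simplifying assumption: immediate fill
print(orders, position)
```

This is why the moving-average algorithm above can call order_target() on every bar without piling up a larger and larger position.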
Conclusions
We hope that this tutorial gave you a little insight into the
architecture, API, and features of catalyst. For next steps, check
out some of the examples.
The natural next step would be to look into the buy_and_hodl
example, which is a more elaborate and realistic version of the buy_btc_simple example presented in this tutorial.
Feel free to ask questions on the #catalyst_dev channel of our
Discord group and report
problems on our GitHub issue tracker.