mirror of https://github.com/wassname/catalyst.git synced 2026-07-03 13:26:01 +08:00


Scott Sanderson 8c38278783 ENH: Rewrite treasury loader using pandas.

Replaces our custom XML parsing with a single call to `pd.read_csv`
against the federal reserve's API.  This produces nearly identical
results as compared to the old loader, but it's dramatically simpler and
roughly 10x faster on my machine.

The average difference in magnitude between new and old is approximately
10e-7, and only one entry is different to a degree greater than the
number of significant figures provided by treasury.gov.

Additionally, the new loader correctly ignores Columbus Day of 2010, for
which the old loader erroneously produced an all-NaN row.

This also changes the interface that treasury modules are
required to implement. Modules must now supply a `get_treasury_data`
function that returns a `DataFrame` with a daily `DatetimeIndex` and a
column for each supported treasury duration.
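
A minimal sketch of a module satisfying this interface might look like the following. It is not the loader from this commit: the `csv_text` parameter, the in-memory `SAMPLE_CSV`, the `ND` missing-data marker, and the percent-to-fraction conversion are all assumptions made for illustration. The single sample row reuses the 1999-10-01 values quoted later in this message.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded CSV response. Values are in
# percent; "ND" is assumed here to mark missing data.
SAMPLE_CSV = """Time Period,1month,3month,6month,1year
1999-10-01,ND,4.98,5.01,5.30
"""


def get_treasury_data(csv_text=SAMPLE_CSV):
    """Return a DataFrame with a UTC DatetimeIndex and one column per duration.

    The `csv_text` argument is hypothetical; it exists only so that the
    sketch runs without network access.
    """
    frame = pd.read_csv(
        io.StringIO(csv_text),
        index_col="Time Period",
        parse_dates=["Time Period"],
        na_values=["ND"],
    )
    # Localize the index to UTC and convert percentages to fractions.
    return frame.tz_localize("UTC") / 100.0


data = get_treasury_data()
```

Here `data` has one row indexed by a UTC timestamp, with `NaN` in the `1month` column because that duration has no value on that date.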

Detailed comparison between results from new and old loader::

    import pandas as pd
    from zipline.data.treasuries import get_treasury_data

    new = get_treasury_data()  # New implementation
    old = pd.read_csv(  # Previously cached data
        '/home/ssanderson/.zipline/data/treasury_curves.csv',
        parse_dates=[0],
        index_col=0,
    )
    # These columns were unused.
    del old['tid']; del old['date']
    old = old.tz_localize('UTC')
    # old data erroneously contained an all-NaN entry for Columbus Day
    # in 2010.  Remove before comparing.
    old = old.dropna(how='all')

    In [25]: len(new) == len(old)
    Out[25]: True

    In [26]: abs(old - new).max()
    Out[26]:
    10year    2.000000e-04
    1month    6.938894e-18
    1year     1.000000e-04
    20year    1.000000e-04
    2year     2.000000e-04
    30year    1.000000e-04
    3month    1.000000e-03
    3year     1.000000e-04
    5year     1.387779e-17
    6month    1.000000e-04
    7year     1.000000e-04
    dtype: float64

    In [27]: abs(old - new).mean()
    Out[27]:
    10year    3.097414e-08
    1month    4.396534e-19
    1year     1.548707e-08
    20year    3.624502e-08
    2year     4.646120e-08
    30year    1.830496e-08
    3month    1.549427e-07
    3year     1.548707e-08
    5year     1.702619e-18
    6month    1.548707e-08
    7year     1.548707e-08
    dtype: float64

Since www.treasury.gov only reports values up to three significant
digits, we should only care about differences of 1e-3 or more.

There is exactly one such difference: the entry for the three-month bond
on 1999-10-01::

    In [60]: new[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[60]:
    Time Period  1999-10-01 00:00:00+00:00
    1month                             NaN
    3month                          0.0498
    6month                          0.0501
    1year                           0.0530
    2year                           0.0573
    3year                           0.0583
    5year                           0.0590
    7year                           0.0622
    10year                          0.0600
    20year                          0.0657
    30year                          0.0615

    In [61]: old[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[61]:
            1999-10-01 00:00:00+00:00
    10year                     0.0600
    1month                        NaN
    1year                      0.0530
    20year                     0.0657
    2year                      0.0573
    30year                     0.0615
    3month                     0.0488
    3year                      0.0583
    5year                      0.0590
    6month                     0.0501
    7year                      0.0622

The US Treasury website (our old source) provides a value of 0.0488 here,
whereas the Federal Reserve site (our new source) provides a value of
0.0498.
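
The row-selection step can be sketched on a small pair of frames. The helper name `rows_differing` and its `decimals` parameter are hypothetical. Rounding the differences to four decimals before comparing is an assumption added here so that floating-point noise cannot push a difference of exactly 1e-3 below the threshold. The first row reuses the 1999-10-01 values quoted above; the second row is a placeholder that is identical in both frames.

```python
import pandas as pd


def rows_differing(old, new, threshold=1e-3, decimals=4):
    """Return the rows of `new` and `old` where any column differs by
    at least `threshold` (hypothetical helper, for illustration only)."""
    diff = (new - old).abs().round(decimals)
    mask = (diff >= threshold).any(axis=1)
    return new[mask], old[mask]


index = pd.to_datetime(["1999-10-01", "1999-10-04"]).tz_localize("UTC")
columns = ["3month", "6month", "1year"]

# Row 1: values quoted in this message. Row 2: placeholder values,
# identical in both frames, so it should not be selected.
new = pd.DataFrame(
    [[0.0498, 0.0501, 0.0530], [0.0500, 0.0500, 0.0500]],
    index=index, columns=columns,
)
old = pd.DataFrame(
    [[0.0488, 0.0501, 0.0530], [0.0500, 0.0500, 0.0500]],
    index=index, columns=columns,
)

new_rows, old_rows = rows_differing(old, new)
```

Only the 1999-10-01 row is selected, since the `3month` column is the only entry that differs between the two frames.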

2015-10-25 16:37:59 -04:00

| Name | Last commit | Date |
| --- | --- | --- |
| conda | BLD Add cython as a conda build dependency. | 2015-03-06 14:53:25 +01:00 |
| docs | ENH Add command line option for printing algo on stdout, default is false. | 2015-08-05 10:29:56 +02:00 |
| etc | BLD: blaze ecosystem commits | 2015-10-19 16:35:03 -04:00 |
| scripts | ENH Add command line option for printing algo on stdout, default is false. | 2015-08-05 10:29:56 +02:00 |
| tests | TST: test history() in before_trading_start() | 2015-10-23 13:32:36 -04:00 |
| zipline | ENH: Rewrite treasury loader using pandas. | 2015-10-25 16:37:59 -04:00 |
| .bumpversion.cfg | Bump version: 0.8.0 → 0.8.0rc1 | 2015-02-13 17:53:02 +01:00 |
| .coveragerc | BLD: Add coverage integration. | 2014-06-18 19:59:06 +02:00 |
| .dir-locals.el | STY: Normalize styles across installations via .dir-locals.el | 2014-06-04 00:06:58 -04:00 |
| .gitignore | MAINT: Remove msgpack as a dependency. | 2013-10-01 14:28:11 -04:00 |
| .travis.yml | BLD: show versions on travis | 2015-10-19 16:35:02 -04:00 |
| AUTHORS | DOC Add authors file as well as script to create it. | 2015-02-13 14:13:22 +01:00 |
| LICENSE | Adds Apache License, Version 2.0 | 2012-10-08 17:32:40 -04:00 |
| MANIFEST.in | adding data files to the egg for pip distribution. | 2015-02-05 13:57:40 -05:00 |
| mkdocs.yml | DOC Fix release-notes filename in mkdocs.yml. | 2015-02-13 13:54:48 +01:00 |
| README.md | Update README.md | 2015-10-09 21:59:05 -04:00 |
| setup.cfg | BLD Remove setup.py rst conversion. Add README.md as pypi description. | 2015-02-13 13:36:42 +01:00 |
| setup.py | BLD: Can't use setup.py with git reqs unless you want to do a lot of work | 2015-10-19 16:35:02 -04:00 |
| vagrant_init.sh | BLD: Add "--exists-action w" to pip invocations | 2015-07-23 15:58:13 -04:00 |
| Vagrantfile | Add a VirtualBox-based Vagrant config file. | 2013-07-02 10:53:04 -04:00 |

README.md

Zipline


Zipline is a Pythonic algorithmic trading library. The system is fundamentally event-driven and a close approximation of how live-trading systems operate.

Zipline is currently used in production as the backtesting engine powering Quantopian Inc. -- a free, community-centered platform that allows development and real-time backtesting of trading algorithms in the web browser.

Join our community!

Want to contribute? See our open requests and our general guidelines below.

Features

Ease of use: Zipline tries to get out of your way so that you can focus on algorithm development. See below for a code example.
Zipline comes "batteries included": many common statistics, such as moving averages and linear regression, are readily accessible from within a user-written algorithm.
Input of historical data and output of performance statistics are based on Pandas DataFrames to integrate nicely into the existing Python ecosystem.
Statistics and machine learning libraries like matplotlib, scipy, statsmodels, and sklearn support development, analysis, and visualization of state-of-the-art trading systems.

Installation

The easiest way to install Zipline is via conda, which comes as part of Anaconda or can be installed via pip install conda.

Once set up, you can install Zipline from our Quantopian channel:

conda install -c Quantopian zipline

Currently supported platforms include:

Windows 32-bit (can be 64-bit Windows but has to be 32-bit Anaconda)
OSX 64-bit
Linux 64-bit

PIP

Alternatively, you can install Zipline via the more traditional pip command. Since Zipline is almost entirely Python code (with a small Cython component), it should be easy to install and set up:

pip install numpy   # Pre-install numpy to handle dependency chain quirk
pip install zipline

If there are problems installing the dependencies or Zipline itself, we recommend installing these packages via some other means. For Windows, the Enthought Python Distribution includes most of the necessary dependencies. On OSX, the Scipy Superpack works very well.

Dependencies

Python (2.7 or 3.3)
numpy (>= 1.6.0)
pandas (>= 0.9.0)
pytz
Logbook
requests
python-dateutil (>= 2.1)
ta-lib

Quickstart

See our getting started tutorial.

The following code implements a simple dual moving average algorithm.

from zipline.api import order_target, record, symbol, history, add_history


def initialize(context):
    # Register 2 histories that track daily prices,
    # one with a 100-day window and one with a 300-day window
    add_history(100, '1d', 'price')
    add_history(300, '1d', 'price')

    context.i = 0


def handle_data(context, data):
    # Skip first 300 days to get full windows
    context.i += 1
    if context.i < 300:
        return

    # Compute averages
    # history() has to be called with the same params
    # from above and returns a pandas dataframe.
    short_mavg = history(100, '1d', 'price').mean()
    long_mavg = history(300, '1d', 'price').mean()

    sym = symbol('AAPL')

    # Trading logic
    if short_mavg[sym] > long_mavg[sym]:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(sym, 100)
    elif short_mavg[sym] < long_mavg[sym]:
        order_target(sym, 0)

    # Save values for later inspection
    record(AAPL=data[sym].price,
           short_mavg=short_mavg[sym],
           long_mavg=long_mavg[sym])

You can then run this algorithm using the Zipline CLI. From the command line, run:

python run_algo.py -f dual_moving_average.py --symbols AAPL --start 2011-1-1 --end 2012-1-1 -o dma.pickle

This downloads the AAPL price data from Yahoo! Finance for the specified time range, streams it through the algorithm, and saves the resulting performance DataFrame to dma.pickle, which you can then load and analyze from within Python.
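
As a rough sketch of that last step, the pickle can be read back with `pandas.read_pickle`. The helper names below are hypothetical, and the example assumes that the values passed to `record()` (`AAPL`, `short_mavg`, `long_mavg`) appear as columns of the performance DataFrame. Because no real `dma.pickle` is available here, a small stand-in frame with placeholder numbers is written to a temporary file first.

```python
import os
import tempfile

import pandas as pd


def load_performance(path):
    """Load a pickled performance DataFrame (hypothetical helper)."""
    return pd.read_pickle(path)


def mavg_spread(perf):
    """Short moving average minus long moving average, per day."""
    return perf["short_mavg"] - perf["long_mavg"]


# Stand-in for dma.pickle: placeholder numbers, not real backtest output.
stand_in = pd.DataFrame(
    {
        "AAPL": [10.0, 11.0, 12.0],
        "short_mavg": [10.0, 10.5, 11.0],
        "long_mavg": [10.2, 10.4, 10.6],
    },
    index=pd.date_range("2011-01-03", periods=3, tz="UTC"),
)
path = os.path.join(tempfile.mkdtemp(), "dma_stand_in.pickle")
stand_in.to_pickle(path)

perf = load_performance(path)
spread = mavg_spread(perf)
```

With a real run, passing the path to dma.pickle to `load_performance` should give a frame that can be inspected in the same way, for example by plotting `perf[["AAPL", "short_mavg", "long_mavg"]]`.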

You can find other examples in the zipline/examples directory.
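
The crossover rule from the Quickstart algorithm can also be checked outside Zipline on a plain price series. The sketch below is not part of Zipline: `target_shares` is a hypothetical helper that uses pandas rolling means in place of `history()`, and it treats days with incomplete windows or equal averages as "no order", carrying the previous target forward.

```python
import pandas as pd


def target_shares(prices, short_window, long_window, shares=100):
    """Target position per day under a dual moving average rule.

    Hypothetical helper for illustration; window lengths are parameters
    so the rule can be tried on short series.
    """
    short_mavg = prices.rolling(short_window).mean()
    long_mavg = prices.rolling(long_window).mean()

    # NaN marks days on which the rule places no order.
    target = pd.Series(float("nan"), index=prices.index)
    target[short_mavg > long_mavg] = shares
    target[short_mavg < long_mavg] = 0

    # Carry the last target forward; start flat before the first signal.
    return target.ffill().fillna(0).astype(int)


prices = pd.Series([1, 2, 3, 4, 3, 2, 1], dtype=float)
positions = target_shares(prices, short_window=2, long_window=3)
```

For this toy series, the position is 100 shares while the 2-day average is above the 3-day average and 0 otherwise.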

Contributions

If you would like to contribute, please see our Contribution Requests: https://github.com/quantopian/zipline/wiki/Contribution-Requests

Languages: Python 91.2%, Jupyter Notebook 5.1%, Cython 3.2%, Shell 0.2%, Batchfile 0.2%