Merge pull request #21 from snth/docs-snth

High level documentation organisation
Thomas Wiecki
2012-11-14 06:07:22 -08:00
15 changed files with 631 additions and 107 deletions
+40
@@ -0,0 +1,40 @@
***************************
Contributing to the project
***************************
Style Guide
===========
To ensure that changes and patches are focused on behavior changes,
the zipline codebase adheres to PEP-8,
`<http://www.python.org/dev/peps/pep-0008/>`_.
The maintainers check the code using the flake8 script,
`<https://github.com/jcrocholl/pep8/>`_, which is included in the
requirements_dev.txt.
Before submitting patches or pull requests, please ensure that your
changes pass
::
flake8 --ignore=E124,E125,E126 zipline tests
Discussion and Help
===================
Discussion of the project is held at the Google Group,
`<zipline@googlegroups.com>`_,
`<https://groups.google.com/forum/#!forum/zipline>`_.
Source
======
The source for Zipline is hosted at
`<https://github.com/quantopian/zipline>`_.
Contact
=======
For other questions, please contact `<opensource@quantopian.com>`_.
+4
@@ -1,3 +1,7 @@
**********
Extensions
**********
.. highlight:: cython
Philosophy
+49 -47
@@ -3,56 +3,60 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Contents:
.. module:: zipline
****************************************************
Zipline: Financial Backtester for Trading Algorithms
****************************************************
Python is quickly becoming the glue language which holds together data science
and related fields like quantitative finance. Zipline is a new, BSD-licensed
quantitative trading system which allows easy backtesting of investment
algorithms on historical data. The system is fundamentally event-driven and a
close approximation of how live-trading systems operate. Moreover, Zipline
comes "batteries included" as many common statistics like
moving average and linear regression can be readily accessed from within a
user-written algorithm. Input of historical data and output of performance
statistics are based on Pandas DataFrames to integrate nicely into the existing
Python eco-system. Furthermore, statistical and machine learning libraries like
matplotlib, scipy, statsmodels, and sklearn support development, analysis and
visualization of state-of-the-art trading systems.
Zipline is currently used in production as the backtesting engine
powering `quantopian.com <https://app.quantopian.com>`_ -- a free, community-centered
platform that allows development and real-time backtesting of trading
algorithms in the web browser.
Features
========
* Ease of use: Zipline tries to get out of your way so that you can focus on
algorithm development. See below for a code example.
* Zipline comes "batteries included" as many common statistics like moving
average and linear regression can be readily accessed from within a
user-written algorithm.
* Input of historical data and output of performance statistics are based on
Pandas DataFrames to integrate nicely into the existing Python eco-system.
* Statistical and machine learning libraries like matplotlib, scipy, statsmodels,
and sklearn support development, analysis and visualization of
state-of-the-art trading systems.
Contents
========
.. toctree::
:maxdepth: 4
notes.rst
manifesto.rst
installation.rst
quickstart.rst
contributing.rst
overview.rst
modules.rst
messaging.rst
Zipline
=======
Zipline runs backtests using asynchronous components and zeromq messaging for communication and coordination.
Simulator is the heart of Zipline, and the primary access point for creating, launching, and tracking simulations. You can find it in :py:class:`~zipline.core.Simulator`.
Simulator Sub-Components
========================
Each simulation contains numerous subcomponents, each operating asynchronously from all others, and communicating
via zeromq.
DataSources
--------------------
A DataSource represents a historical event record, which will be played back during simulation. A simulation may have one or more DataSources, which will be combined in DataFeed. Generally, datasources read records from a persistent store (db, csv file, remote service), format the messages for downstream simulation components, and send them to a PUSH socket. See the base class for all datasources :py:class:`~zipline.messaging.DataSource` and the module holding all datasources :py:mod:`zipline.sources`.
DataFeed
--------------------
All simulations start with a collection of :py:class:`~zipline.messaging.DataSource`, which need to be fed to an algorithm. Each :py:class:`~zipline.sources.DataSource` can contain events of differing content (trades, quotes, corporate events) and frequency (quarterly, intraday). To simplify the process of managing the data sources, :py:class:`~zipline.core.DataFeed` can receive events from multiple :py:class:`DataSources <zipline.sources.DataSource>` and combine them into a serial chronological stream.
Transforms
--------------------
Often, an algorithm will require a running calculation on top of a :py:class:`~zipline.messaging.DataSource`, or on the consolidated feed. A simple example is a technical indicator or a moving average. Transforms can be described in :py:class:`~zipline.core.Simulator`'s configuration. Subclass :py:class:`~zipline.transforms.core.Transform` to add your own Transform. Transforms must hold their own state between events, and serialize their current values into messages.
Data Alignment
--------------------
Like Datasources, Transforms have differing frequencies and results. Simulator manages the results of parallel transforms and **aligns** transform results with the raw DataFeed. Client algorithms simply receive a map of data, which includes the current event and all the transformed values.
Time Compression
--------------------
Review the unit test coverage_.
extensions.rst
Indices and tables
==================
@@ -60,5 +64,3 @@ Indices and tables
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. _coverage: cover/index.html
+45 -22
@@ -1,20 +1,32 @@
System Setup
==============
You need to have zeromq installed - http://www.zeromq.org/intro:get-the-software.
************
Installation
************
Running
-------
Since zipline is pure-python code it should be very easy to install
and set up with pip:
Initial `virtualenv` setup::
::
$ mkvirtualenv zipline
$ workon zipline
#go get coffee, this will compile a heap of C/C++ code
$ ./etc/ordered_pip.sh requirements_sci.txt
$ ./etc/ordered_pip.sh requirements.txt
#optionally
$ ./etc/ordered_pip.sh requirements_dev.txt
pip install zipline
If there are problems installing the dependencies or zipline, we
recommend installing these packages via some other means. For Windows,
the `Enthought Python Distribution
<http://www.enthought.com/products/epd.php>`_
includes most of the necessary dependencies. On OS X, the `Scipy Superpack
<http://fonnesbeck.github.com/ScipySuperpack/>`_ works very well.
Dependencies
------------
* Python (>= 2.7.2)
* numpy (>= 1.6.0)
* pandas (>= 0.9.0)
* pytz
* msgpack-python
* iso8601
* Logbook
* blist
Develop
@@ -51,7 +63,7 @@ For building distributable egg::
Tooling hints
================
QBT relies heavily on scientific python components (numpy, scikit, pandas, matplotlib, ipython, etc). Tooling up can be a pain, and it often involves managing a configuration including your OS, c/c++/fortran compilers, python version, and versions of numerous modules. I've found the following tools absolutely indispensable:
:mod:`zipline` relies heavily on scientific python components (numpy, scikit, pandas, matplotlib, ipython, etc). Tooling up can be a pain, and it often involves managing a configuration including your OS, c/c++/fortran compilers, python version, and versions of numerous modules. I've found the following tools absolutely indispensable:
- some kind of package manager for your platform. package managers generally give you a way to search, install, uninstall, and check currently installed packages. They also do a great job of managing dependencies.
- linux: yum/apt-get
@@ -78,11 +90,22 @@ Scientific python on the Mac can be a bit confusing because of the many independ
Data Sources
=============
The Backtest can handle multiple concurrent data sources. QBT will start a subprocess to run each datasource, and merge all events from all sources into a single serial feed, ordered by date.
Data sources have events with very different frequencies. For example, liquid stocks will trade many times per minute, while illiquid stocks may trade just once a day. In order to serialize events from all sources into a single feed, qbt loads events from all sources into memory, then sorts. The communication happens like this:
1. QBT requests the next event from each data source, ignoring date (i.e. just next in sequence for all)
2. Using the earliest date from all the events from all sources, QBT then asks for "next after <date>" from all sources.
3. All datasources send all events in their history from before <date>, moving their internal pointer forward to the next unsent event.
4. QBT merges all events in memory
5. goto 1!
The Backtest can handle multiple concurrent data sources. QBT will start a
subprocess to run each datasource, and merge all events from all sources into a
single serial feed, ordered by date.
Data sources have events with very different frequencies. For example, liquid
stocks will trade many times per minute, while illiquid stocks may trade just
once a day. In order to serialize events from all sources into a single feed,
QBT loads events from all sources into memory, then sorts. The communication
happens like this:
1. QBT requests the next event from each data source, ignoring date (i.e.
just next in sequence for all)
2. Using the earliest date from all the events from all sources, QBT then
asks for "next after <date>" from all sources.
3. All datasources send all events in their history from before <date>,
moving their internal pointer forward to the next unsent event.
4. QBT merges all events in memory
5. goto 1!
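The merge step can be illustrated with a small, self-contained sketch. It is not the QBT implementation: it assumes each source is an in-process list already sorted by date (rather than a subprocess answering "next after <date>" requests), and the event layout and the ``merge_sources`` helper are hypothetical names chosen for the example.

```python
import heapq
from datetime import datetime


def merge_sources(*sources):
    """Lazily merge date-sorted event iterables into one chronological stream.

    Hypothetical helper: each source must already be sorted by its "dt" field.
    """
    return heapq.merge(*sources, key=lambda event: event["dt"])


# A liquid stock trades every minute; an illiquid one trades once.
liquid = [
    {"dt": datetime(2012, 11, 14, 9, 30), "sid": "AAA", "price": 10.0},
    {"dt": datetime(2012, 11, 14, 9, 31), "sid": "AAA", "price": 10.1},
    {"dt": datetime(2012, 11, 14, 9, 32), "sid": "AAA", "price": 10.2},
]
illiquid = [
    {"dt": datetime(2012, 11, 14, 9, 31, 30), "sid": "BBB", "price": 55.0},
]

feed = list(merge_sources(liquid, illiquid))
order = [event["sid"] for event in feed]
```

Because ``heapq.merge`` only looks at the head of each source, it never needs to load and sort every event at once, which is one possible alternative to the load-then-sort approach described above.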
+140
@@ -0,0 +1,140 @@
********************
Quantopian Manifesto
********************
Wall Street's culture was born in an age of information scarcity.
Hoarding information and keeping secrets were the norm. The world has changed.
Today's world is defined by information that wants to be free. The new
scarcity is people: people with the talent and drive to wring insight from all
of that data.
Quantopian's mission is to attract the world's algorithmic and
financial talent. We want to attract today's quants, and we want to
attract talent that hasn't yet had the opportunity to be a quant. We
want to bring this talent together, provide them with the tools that they
require, and help them build a community. First and foremost, our community is
rooted in openness and sharing. Members share code, know-how, and data.
Quantopian sets the tone by providing open-sourced code, discussing our
techniques, and supplying the historical data needed for algorithmic investing.
By educating more people about statistical arbitrage and data mining for
finance, we aim to dispense with the secrecy and raise the state of the art.
Rather than hoard data, we relentlessly push data to our community. We want to
diversify the data that can be mined, and permit our members to explore as much
as they like. Our members' success in analyzing and investing will help
us draw more data and more members to our community. Every individual's
success will also help other Quantopians.
The Evolution of Algorithmic Finance
====================================
Charting
--------
Algorithmic finance originated as chart reading. Chartists would look for
certain patterns in price history charts. The patterns were always graced with
artfully chosen names like 'head and shoulders', 'spinning top', or 'morning
star'. Chart reading looks a lot like palm reading, and for the skeptics among
us the similarities don't end with appearances. Still, chart reading is an
attempt to infer the balance of buying and selling appetites in the markets
from a stock's history. Viewed that way, chart reading pursues the noble goal
of prediction. Charting is so common that certain events can trigger market
responses, possibly because so many participants infer the same meaning from a
stock's price chart.
Technical Analysis
------------------
Analysis grew more sophisticated as chartists gave way to
computer scientists writing algorithms. These algorithms have more scientific
sounding names like Moving Averages, Volume Weighted Moving Averages, Bollinger
Bands, Relative Strength Indicators, and Pearson's Correlation
Coefficient. Building technical analysis algorithms looks a lot like modern
statistics, and the optimists among us would say the similarities run deep.
Technical analysts take algorithmic approaches to the same concept: inferring
future behavior from trailing data. In addition to greater sophistication,
technical analysts can also test their algorithms over historic data. Imperfect
to be sure, but a giant leap from staring at a chart.
Reasonable people can disagree about the 'correctness' of
inferring future events from past behavior. Rather than dwell on that question,
we choose to point out a different limitation of both charting and modern
technical analysis: **both interpret the movement of a single stock in
isolation**. This limitation is both a blessing and a curse.
On the one hand, there is little room for sophisticated statistics or machine
learning when you have just a single time series for both your signal and your
prediction target.
On the other, technical analysis can still be intuitive, which makes it easier
to get acquainted with the idea of automated trading. Often there is a mental
leap for people to make from understanding the interpretation of a price series
to issuing orders. Because the signals are easy to understand, technical
analysis makes for a good initial learning experience to explore risk and
performance evaluation as well as order management: the price going above its
30 day moving average is something you can visualize. So, you can focus your
attention on the financial and trading aspects of the problem.
Statistical Arbitrage
---------------------
Statistical Arbitrage is the grandchild of chart reading.
Like technical analysis it relies on algorithms and statistics, but it departs
in one very significant way: 'stat arb' looks for relationships
among many stocks. The challenge with stat arb is twofold:
* visualizing the relationships can be quite difficult, since the relationships
can have high dimensionality
* the data processing load is quite high - a simple linear regression for all
stocks results in 32 million individual regressions. Assuming a 10-day
window, that can be 320 million individual calculations. To prepare,
backtest, and trade a stat arb strategy requires familiarity with the
mechanics of trading, knowledge of statistics, and a strong computer science
background.
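The text does not say how the 32 million figure is reached. One plausible reading, offered here purely as an assumption, is one regression per unordered pair of stocks in a universe of roughly 8,000 stocks. The short calculation below checks that arithmetic; ``n_stocks`` is the assumed value, while the 10-day window comes from the text.

```python
from math import comb

n_stocks = 8000   # assumed universe size; not stated in the text
window_days = 10  # window length quoted in the text

# One regression per unordered pair of distinct stocks.
pair_regressions = comb(n_stocks, 2)

# One calculation per pair per day in the window.
calculations = pair_regressions * window_days
```

Under this assumption the pair count is just under 32 million and the windowed count just under 320 million, which is consistent with the figures quoted above.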
As stat arb matured, the competition to find stat arb strategies that work
became a two-part race:
1. execute the trades faster
2. find new ways to identify relationships within
market data
We think the pursuit of faster trades reached diminishing returns when the
market hit sub-millisecond trade execution. We think that the resulting high
level of liquidity is a good thing, but we agree with Thomas Peterffy that
`pursuing even faster trades
<http://www.npr.org/blogs/money/2012/08/27/159992076/a-father-of-high-speed-trading-thinks-we-should-slow-down>`_
"has absolutely no social value".
Finding new relationships in the market data is possible and more important now
than ever. In the summer of 2007, there was a sudden meltdown in quantitative
trading firms. Subsequent analysis points to quants crowding into the same
arbitrage bets, and an unforeseen fund liquidation driving all the quants to
unwind those bets concurrently. We believe finding new relationships should
permit investments with lower correlation and lower risks.
Algorithmic Investing and the Future
====================================
A revolution in market understanding happens next. We want Quantopian to enable
more quants than all of Wall Street combined. We want quants, new and old, to
explore and share new ways to view the market. We want to clear away the
obstacles that have so far kept all but a few from doing algorithmic investing
by:
* simulating with clean, high-quality market data for free
* offering easy access to markets through trusted brokers
* providing a robust, flexible open-source backtester to permit evaluation and
iteration of algorithms
* supporting a community that fosters the exchange of knowledge, ideas, code
solutions, and data sources
The community will find new ways to identify market opportunities. It may take
the form of new, non-market data sources, like news feeds or Twitter. It may be
new algorithmic techniques. Most likely, it will be something we
haven't heard of yet: your idea. The one you keep coming back to. The
idea you couldn't test without data. The idea that needs backtesting,
and iteration, and encouragement from other quants.
Do you want to unleash your idea? This is your chance. `Come hack Wall Street
<http://www.quantopian.com>`_.
-26
@@ -1,26 +0,0 @@
qbt runs backtests using multiple processes and zeromq messaging for communication and coordination.
Backtest is the primary process. It maintains both server and client sockets:
zmq sockets for internal processing::
- data sink, ZMQ.REQ. Port = port_start + 1
- backtest will connect to socket, and then spawn one process per datasource, passing the data sink url as a startup arg. Each
datasource process will bind to the socket, and start processing
- backtest is responsible for merging the data events from all sources into a serialized stream and relaying it to the
aggregators, merging agg results, and transmitting consolidated stream to event feed.
- agg source, ZMQ.PUSH. Port = port_start + 2
- agg sink, ZMQ.PULL. Port = port_start + 3
- control source, ZMQ.PUB. Port = port_start + 4
- all child processes must subscribe to this socket. Control commands:
- START -- begin processing
- TIME -- current simulated time in backtest
- KILL -- exit immediately
zmq sockets for backtest clients:
=================================
- orders sink, ZMQ.RESP. Port = port_start + 5
- backtest will connect (can you bind?) to this socket and await orders from the client. Order data will be processed against the streaming datafeed.
- event feed, ZMQ.RESP. Port = port_start + 6
- backtest will bind to this socket and respond to requests from client for more data. Response data will be the queue of events that
transpired since the last request.
+3 -2
@@ -1,5 +1,6 @@
zipline
====
***********************
Packages and Modules
***********************
.. toctree::
:maxdepth: 4
+77
@@ -0,0 +1,77 @@
*******************************************
Overview
*******************************************
Simulations
===========
:mod:`zipline` runs backtests using asynchronous components and zeromq messaging for
communication and coordination.
:class:`.algorithm.TradingAlgorithm` is the heart of :mod:`zipline`, and the primary access point for creating,
launching, and tracking simulations. You can find it in
:py:class:`~zipline.algorithm.TradingAlgorithm`.
Simulator Sub-Components
========================
Each simulation contains numerous subcomponents, each operating asynchronously
from all others, and communicating via zeromq.
DataSources
--------------------
A DataSource represents a historical event record, which will be played back
during simulation. A simulation may have one or more DataSources, which will be
combined in DataFeed. Generally, datasources read records from a persistent
store (db, csv file, remote service), format the messages for downstream
simulation components, and send them to a PUSH socket. See the base class for
all datasources :py:class:`~zipline.messaging.DataSource` and the module
holding all datasources :py:mod:`zipline.sources`.
DataFeed
--------------------
All simulations start with a collection of
:py:class:`~zipline.messaging.DataSource`, which need to be fed to an
algorithm. Each :py:class:`~zipline.sources.DataSource` can contain events of
differing content (trades, quotes, corporate events) and frequency (quarterly,
intraday). To simplify the process of managing the data sources,
:py:class:`~zipline.core.DataFeed` can receive events from multiple
:py:class:`DataSources <zipline.sources.DataSource>` and combine them into a
serial chronological stream.
Transforms
--------------------
Often, an algorithm will require a running calculation on top of a
:py:class:`~zipline.messaging.DataSource`, or on the consolidated feed. A
simple example is a technical indicator or a moving average. Transforms can be
described in :py:class:`~zipline.core.Simulator`'s configuration. Subclass
:py:class:`~zipline.transforms.core.Transform` to add your own Transform.
Transforms must hold their own state between events, and serialize their
current values into messages.
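As an illustration of a transform holding its own state between events, here is a minimal sketch of a running mean over the last few prices. It deliberately does not subclass the zipline ``Transform`` base class, whose interface is not shown in this document; the ``RunningMean`` class, its ``update`` method, and the event layout are hypothetical.

```python
from collections import deque


class RunningMean:
    """Hypothetical stateful transform: mean of the last `window` prices."""

    def __init__(self, window):
        # The deque is the state kept between events; old prices fall off
        # automatically once `window` prices have been seen.
        self.prices = deque(maxlen=window)

    def update(self, event):
        """Consume one event and return the current value as a plain dict."""
        self.prices.append(event["price"])
        return {"mavg": sum(self.prices) / len(self.prices)}


transform = RunningMean(window=3)
outputs = [
    transform.update({"price": price})["mavg"]
    for price in (10.0, 20.0, 30.0, 40.0)
]
```

Returning a plain dict stands in for "serializing the current value into a message"; a real transform would use whatever message format the simulator expects.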
Data Alignment
--------------------
Like DataSources, Transforms have differing frequencies and results. Simulator
manages the results of parallel transforms and **aligns** transform results
with the raw DataFeed. Client algorithms simply receive a map of data, which
includes the current event and all the transformed values.
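The alignment idea can be sketched as follows, assuming that each transform reports values at its own pace and that the algorithm should always see the most recent value of each. The ``Aligner`` class and its method names are hypothetical and are not part of zipline.

```python
class Aligner:
    """Hypothetical helper that pairs each raw event with the latest
    value reported by every transform."""

    def __init__(self):
        self.latest = {}  # transform name -> most recent value

    def on_transform(self, name, value):
        """Record a new result from one transform."""
        self.latest[name] = value

    def on_event(self, event):
        """Return one map holding the raw event plus all transform values."""
        aligned = dict(event)
        aligned.update(self.latest)
        return aligned


aligner = Aligner()
aligner.on_transform("short_mavg", 10.1)
aligner.on_transform("long_mavg", 9.8)
data = aligner.on_event({"sid": "AAA", "price": 10.2})

# A slower transform may not have reported again; its last value is reused.
aligner.on_transform("short_mavg", 10.3)
later = aligner.on_event({"sid": "AAA", "price": 10.5})
```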
Time Compression
--------------------
According to `this post
<https://www.quantopian.com/posts/help-with-runtime-error>`_ on the Quantopian
forums, time periods during which none of the selected SIDs were traded are
skipped.
Review the unit test coverage_.
.. _coverage: cover/index.html
+55
@@ -0,0 +1,55 @@
**********
Quickstart
**********
Dual-Moving Average Example
===========================
The following code implements a simple dual moving average algorithm
and tests it on data extracted from yahoo finance.
.. code:: python

    from zipline.algorithm import TradingAlgorithm
    from zipline.transforms import MovingAverage
    from zipline.utils.factory import load_from_yahoo

    class DualMovingAverage(TradingAlgorithm):
        """Dual Moving Average algorithm.
        """
        def initialize(self, short_window=200, long_window=400):
            # Add 2 mavg transforms, one with a long window, one
            # with a short window.
            self.add_transform(MovingAverage, 'short_mavg', ['price'],
                               market_aware=True,
                               days=short_window)
            self.add_transform(MovingAverage, 'long_mavg', ['price'],
                               market_aware=True,
                               days=long_window)

            # To keep track of whether we invested in the stock or not
            self.invested = False

            self.short_mavg = []
            self.long_mavg = []

        def handle_data(self, data):
            short_mavg = data['AAPL'].short_mavg['price']
            long_mavg = data['AAPL'].long_mavg['price']

            # Buy when the short average crosses above the long one,
            # sell when it crosses back below.
            if short_mavg > long_mavg and not self.invested:
                self.order('AAPL', 100)
                self.invested = True
            elif short_mavg < long_mavg and self.invested:
                self.order('AAPL', -100)
                self.invested = False

            # Save mavgs for later analysis.
            self.short_mavg.append(short_mavg)
            self.long_mavg.append(long_mavg)

    data = load_from_yahoo()
    dma = DualMovingAverage()
    results = dma.run(data)
You can find other examples in the zipline/examples directory.
+40
@@ -0,0 +1,40 @@
:mod:`zipline.data` subpackage
===============================
.. automodule:: zipline.data.__init__
:members:
:undoc-members:
:show-inheritance:
:mod:`benchmarks` Module
-------------------------
.. automodule:: zipline.data.benchmarks
:members:
:undoc-members:
:show-inheritance:
:mod:`loader` Module
--------------------
.. automodule:: zipline.data.loader
:members:
:undoc-members:
:show-inheritance:
:mod:`loader_utils` Module
--------------------------
.. automodule:: zipline.data.loader_utils
:members:
:undoc-members:
:show-inheritance:
:mod:`treasuries` Module
------------------------
.. automodule:: zipline.data.treasuries
:members:
:undoc-members:
:show-inheritance:
+23 -2
@@ -1,5 +1,18 @@
finance Package
===============
:mod:`zipline.finance` subpackage
==================================
.. automodule:: zipline.finance.__init__
:members:
:undoc-members:
:show-inheritance:
:mod:`commission` Module
-------------------------
.. automodule:: zipline.finance.commission
:members:
:undoc-members:
:show-inheritance:
:mod:`performance` Module
-------------------------
@@ -17,6 +30,14 @@ finance Package
:undoc-members:
:show-inheritance:
:mod:`slippage` Module
-------------------------
.. automodule:: zipline.finance.slippage
:members:
:undoc-members:
:show-inheritance:
:mod:`trading` Module
---------------------
+40
@@ -0,0 +1,40 @@
:mod:`zipline.gens` subpackage
==============================
.. automodule:: zipline.gens.__init__
:members:
:undoc-members:
:show-inheritance:
:mod:`composites` Module
-------------------------
.. automodule:: zipline.gens.composites
:members:
:undoc-members:
:show-inheritance:
:mod:`sort` Module
------------------
.. automodule:: zipline.gens.sort
:members:
:undoc-members:
:show-inheritance:
:mod:`tradesimulation` Module
------------------------------
.. automodule:: zipline.gens.tradesimulation
:members:
:undoc-members:
:show-inheritance:
:mod:`utils` Module
---------------------
.. automodule:: zipline.gens.utils
:members:
:undoc-members:
:show-inheritance:
+12 -8
@@ -1,18 +1,23 @@
zipline Package
===============
:mod:`zipline` Package
----------------------
=======================
.. automodule:: zipline.__init__
:members:
:undoc-members:
:show-inheritance:
:mod:`protocol` Module
:mod:`algorithm` Module
-------------------------
.. automodule:: zipline.algorithm
:members:
:undoc-members:
:show-inheritance:
:mod:`sources` Module
----------------------
.. automodule:: zipline.protocol
.. automodule:: zipline.sources
:members:
:undoc-members:
:show-inheritance:
@@ -38,10 +43,9 @@ Subpackages
.. toctree::
zipline.core
zipline.data
zipline.finance
zipline.gens
zipline.optimize
zipline.transforms
zipline.utils
+47
@@ -0,0 +1,47 @@
:mod:`zipline.transforms` subpackage
=====================================
.. automodule:: zipline.transforms.__init__
:members:
:undoc-members:
:show-inheritance:
:mod:`mavg` Module
-------------------------
.. automodule:: zipline.transforms.mavg
:members:
:undoc-members:
:show-inheritance:
:mod:`returns` Module
-------------------------
.. automodule:: zipline.transforms.returns
:members:
:undoc-members:
:show-inheritance:
:mod:`stddev` Module
-------------------------
.. automodule:: zipline.transforms.stddev
:members:
:undoc-members:
:show-inheritance:
:mod:`utils` Module
-------------------------
.. automodule:: zipline.transforms.utils
:members:
:undoc-members:
:show-inheritance:
:mod:`vwap` Module
-------------------------
.. automodule:: zipline.transforms.vwap
:members:
:undoc-members:
:show-inheritance:
+56
@@ -0,0 +1,56 @@
:mod:`zipline.utils` subpackage
===============================
.. automodule:: zipline.utils.__init__
:members:
:undoc-members:
:show-inheritance:
:mod:`date_utils` Module
--------------------------
.. automodule:: zipline.utils.date_utils
:members:
:undoc-members:
:show-inheritance:
:mod:`factory` Module
---------------------
.. automodule:: zipline.utils.factory
:members:
:undoc-members:
:show-inheritance:
:mod:`protocol_utils` Module
----------------------------
.. automodule:: zipline.utils.protocol_utils
:members:
:undoc-members:
:show-inheritance:
:mod:`simfactory` Module
--------------------------
.. automodule:: zipline.utils.simfactory
:members:
:undoc-members:
:show-inheritance:
:mod:`test_utils` Module
------------------------
.. automodule:: zipline.utils.test_utils
:members:
:undoc-members:
:show-inheritance:
:mod:`tradingcalendar` Module
------------------------------
.. automodule:: zipline.utils.tradingcalendar
:members:
:undoc-members:
:show-inheritance: