* Updated cython build scripts
* Updated setup.py to to install catalyst package
* Updated momentum example to use catalyst package
* catalyst executable now supports loading pipelines from multiple bundles
The minute to session sampling reading was creating two DataFrame
objects, the first to hold the minute data, and then a second returned
by the `DataFrame.groupby` to sample down to sessions.
Instead use the arrays returned by the minute readers `load_raw_arrays`
and implement sampling logic which takes advantage that the minutes
being passed start with the first minute of the first session and end
with the last minute of the last session.
On my machine this takes the tests in `test/test_continuous_futures`
from ~4.0 to about ~0.1 seconds.
Add the ability for an algorithm to request the current contract for a
future chain via `data.current`.
e.g.:
```
data.current(ContinuousFuture('CL', offset=0, roll='calendar'),
'contract')
```
Instead of having separate ExchangeCalendar and TradingSchedule objects, we
now just have TradingCalendar. The TradingCalendar keeps track of each
session (defined as a contiguous set of minutes between an open and a close).
It's also responsible for handling the grouping logic of any given minute
to its containing session, or the next/previous session if it's not a market
minute for the given calendar.
as well as tooling and docs to generate this for each release
Also moved Cython files to package_data, so that we install them,
instead of just packaging them.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
LabelArray is conceptually similar to pandas.Categorical, in that it
stores data with many duplicate values as indices into an array of
unique values. For string data with many duplicates (e.g. time-series
of tickers or or industry classifications), this provides multiple
orders of magnitude of improvement when doing string operations,
especially string comparison/matching operations.
- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
and a corresponding ObjectOverwrite adjustment.
- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
This method is called on the final result of any pipeline expression
after screen filtering has occurred. The default implementation of
``postprocess`` is identity, but Classifier overrides it to coerce
string columns into pandas.Categoricals before presenting them to the
user.
Adds the data bundle concept which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This also provides a
`yahoo_equities` function for creating an ingestion function that will
load a static set of assets from yahoo.
The cli is now structured as a couple of subcommands and has been
changed to `python -m zipline`. The old behavior of `run_algo.py` has
been moved to the `run` subcommand. This is almost entirely the same
except that it now takes the name of the data bundle to use, defaulting
to `quandl`.
The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.
There is also a `clean` subcommand which deletes the data that was
written with `ingest`.
Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of python files to run at
the start of the process. These can be used to configure aspects of
zipline. Right now the only thing that is supported in an extension file
is the registration of a new data bundle.
EarningsCalendar loader.
- Moves most of AdjustedArray back into Python. The window iterator is
the only part that's performance-intensive.
- Adds a bootleg templating system for creating specialized versions of
AdjustedArrayWindow for each concrete type we care about.
- Adds support for differently dtyped terms in pipeline. This allows us
to use datetime64s which are needed in the EarningsCalendar.
- Adds EarningsCalendar dataset for the next and previous earnings
announcements in pipeline.
- Adds in memory loader for EarningsCalendar.
- Adds blaze loader for EarningsCalendar.
versioneer let's us track the version using git tags. This prevents
issues like the 0.8.2->0.8.3 push. This also puts the number of commits
from the release and the commit you are on in the version.
with explicit requirements files
Need to build extension modules explicitly now that we're not
installing zipline
Adds support for upper bounds, since we thought newer bcolz didn't
work. It just needed newer setuptools.
Put the logic for reading and writing the equity price and adjustment
data into a module located in data, making it distinct from the pipeline
loader usage of the formats.
This prepares for both incoming changes of how adjustments are written,
(which includes using the bcolz daily reader as an input), as well as
eventually providing the readers to a DataPortal object.
- Fixes an error where Modeling API data known as of the close of `day
N` would be shown to algorithms during `before_trading_start` as of
the close of the same day. Algorithms should now only receive data
during `before_trading_start/handle_data` that was known as of the
simulation time at which the function would be called.
- All Term instances now have a `mask` attribute that must be a `Filter`
or an instance of `AssetExists()`. `mask` can be used to specify that
a Factor should be computed in a manner that ignores the values that
were not `True` in the mask.
- Changed the interface for `FFCLoader.load_adjusted_array` and
`Term._compute` from `(columns, mask)`, with mask as a DataFrame, to
`(columns, dates, assets, mask)`, where mask is a numpy array. This
is primarily to avoid having to reconstruct extra DataFrames when
using masks produced by non `AssetExists` filters.
- Adds `BoundColumn.latest`, which gives the most-recently-known value
of a column.
- Add an `ascending=True` keyword to `rank()`.
- Add `top(N)` and `bottom(N)` methods to Factor. These return Filters
that pass the top and bottom N elements each day.
- Add a slightly faster path for rank(method='ordinal'). I had
originally thought the fast path was 2-3x faster because I had my
benchmark data axes flipped. The actual speedup is only 5-10%, which
means it probably wasn't worth the effort to Cythonize...but we have a
slightly faster version now so we might as well use it.
- Refactor test_filter and test_factor to make it easier to implement
and test transformations on factors. These tests now subclass
BaseFFCTestCase, which provides facilities for passing a dict of terms
and an "initial_workspace", the values for which are used by
SimpleFFCEngine rather than needing to manually manage the inputs and
outputs of each term.