Previously, whenever we tried to access a missing value on the Positions
dict, we returned a default Position and saved it to the dict. Instead,
just return the default Position without saving it.
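A minimal sketch of the behavior described above, assuming the Positions container is a `dict` subclass. The `Position` fields shown here are hypothetical stand-ins, not the actual class definition.

```python
class Position(object):
    """Hypothetical stand-in for a position record."""

    def __init__(self, sid, amount=0, cost_basis=0.0):
        self.sid = sid
        self.amount = amount
        self.cost_basis = cost_basis


class Positions(dict):
    """Dict of sid -> Position that never grows on a failed lookup."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ when the key is absent.
        # Return a fresh default without storing it in the dict.
        return Position(key)


positions = Positions()
default = positions[24]  # missing sid: a default Position comes back
```

Because nothing is stored, `24 in positions` stays false and `len(positions)` is unchanged after the lookup.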
EarningsCalendar loader.
- Moves most of AdjustedArray back into Python. The window iterator is
the only part that's performance-intensive.
- Adds a bootleg templating system for creating specialized versions of
AdjustedArrayWindow for each concrete type we care about.
- Adds support for differently dtyped terms in pipeline. This allows us
to use datetime64s which are needed in the EarningsCalendar.
- Adds EarningsCalendar dataset for the next and previous earnings
announcements in pipeline.
- Adds in memory loader for EarningsCalendar.
- Adds blaze loader for EarningsCalendar.
- Adds `zipline.pipeline.Pipeline`, a new user-facing class for managing
pipelines of Modeling API expressions.
- Adds `attach_pipeline` and `drain_pipeline` as API methods.
- Removes `add_factor` and `add_filter` as API methods. These have been
replaced by two new methods on `Pipeline`: `add` and `apply_screen`.
- Adding a `Filter` as a column no longer implicitly truncates rows from
the Modeling API output. It simply causes a new column of dtype
`bool` to appear in the output. Removal of rows is now handled by the
new `apply_screen` method of `Pipeline`.
- Refactors the existing Modeling API tests to reflect the new APIs.
- Fixes an error where Modeling API data known as of the close of `day
N` would be shown to algorithms during `before_trading_start` as of
the close of the same day. Algorithms should now only receive data
during `before_trading_start/handle_data` that was known as of the
simulation time at which the function would be called.
- All Term instances now have a `mask` attribute that must be a `Filter`
or an instance of `AssetExists()`. `mask` can be used to specify that
a Factor should be computed in a manner that ignores the values that
were not `True` in the mask.
- Changed the interface for `FFCLoader.load_adjusted_array` and
`Term._compute` from `(columns, mask)`, with mask as a DataFrame, to
`(columns, dates, assets, mask)`, where mask is a numpy array. This
is primarily to avoid having to reconstruct extra DataFrames when
using masks produced by non-`AssetExists` filters.
- Adds `BoundColumn.latest`, which gives the most-recently-known value
of a column.
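The mask semantics described above (a computation that ignores entries that are not `True` in a numpy mask) can be illustrated with plain numpy. This is only a sketch: `demean_masked` is a hypothetical helper, not part of the pipeline API, and it assumes a (dates x assets) layout for both arrays.

```python
import numpy as np


def demean_masked(values, mask):
    """Subtract each row's mean, computed only where mask is True.

    values and mask are (dates x assets) arrays. Entries that are
    masked out come back as NaN.
    """
    masked = np.where(mask, values, np.nan)
    row_means = np.nanmean(masked, axis=1, keepdims=True)
    return masked - row_means


values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
mask = np.array([[True, True, False],
                 [True, False, True]])
result = demean_masked(values, mask)
```

Passing the mask as a boolean array means it can be applied directly with `np.where`, without building an intermediate DataFrame.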
This commit removes the ability to reference a shared TradingEnvironment through the zipline.finance.trading module. In its place, the classes that require a TradingEnvironment, or its child AssetFinder, hold their own references to those objects.
This commit also adds serialization utilities that allow objects to be pickled and unpickled without unintentionally serializing their TradingEnvironments or AssetFinders.
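One common way to keep a heavyweight reference out of a pickle is to drop it in `__getstate__`. The sketch below shows that general technique, not the utilities added by this commit. `Tracker`, its `env` and `cash` attributes, and `roundtrip` are hypothetical names, and `TradingEnvironment` here is an empty stand-in class.

```python
import pickle


class TradingEnvironment(object):
    """Hypothetical stand-in for a heavyweight shared resource."""


class Tracker(object):
    """Hypothetical object that holds its own environment reference."""

    def __init__(self, env, cash):
        self.env = env
        self.cash = cash

    def __getstate__(self):
        # Copy the instance dict and drop the environment reference so
        # it is never written into the pickle.
        state = self.__dict__.copy()
        state.pop('env', None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # The caller must re-attach an environment after loading.
        self.env = None


def roundtrip(obj):
    """Pickle and unpickle obj, returning the restored copy."""
    return pickle.loads(pickle.dumps(obj))
```

After `roundtrip(tracker)`, the restored object keeps `cash` but has `env` set to `None` until an environment is supplied again.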
This patch lays the groundwork for a compute engine designed to
facilitate construction of factor-based universe screening and portfolio
allocation. It contains:
A new module, `zipline.modelling`, containing entities that can be used
to express computations as dependency graphs. Each node in such a graph
is an instance of the base `Term` class, defined in
`zipline.modelling.term`. Dependency graphs are executed by instances
of `FFCEngine`, defined in `zipline.modelling.engine`.
A new module, `zipline.data.ffc`, containing loaders and dataset
definitions for inputs to the modelling API.
New `TradingAlgorithm` API methods: `add_factor` and `add_filter`.
These methods can only be called from `initialize`, and are used to
inform the algorithm that each day it should compute the given terms.
Computed factor results are made available through a new attribute of
the `data` object in `before_trading_start` and `handle_data`. Computed
filter results control which assets are available in the factor matrix
on each day.
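The idea of expressing computations as a dependency graph and executing them in dependency order can be sketched in a few lines. This toy is not the `FFCEngine` implementation: the `Term` constructor signature, `execution_order`, and `run` are all hypothetical, and the graph is assumed to be acyclic.

```python
class Term(object):
    """Hypothetical node: a named computation with input terms."""

    def __init__(self, name, inputs=(), compute=None):
        self.name = name
        self.inputs = tuple(inputs)
        self.compute = compute


def execution_order(terms):
    """Return terms ordered so every term follows all of its inputs."""
    ordered, seen = [], set()

    def visit(term):
        if term.name in seen:
            return
        seen.add(term.name)
        for dep in term.inputs:
            visit(dep)
        # Appended only after every input has been appended.
        ordered.append(term)

    for term in terms:
        visit(term)
    return ordered


def run(terms):
    """Evaluate each term once, feeding it the results of its inputs."""
    results = {}
    for term in execution_order(terms):
        args = [results[dep.name] for dep in term.inputs]
        results[term.name] = term.compute(*args)
    return results


close = Term('close', compute=lambda: [10.0, 12.0, 11.0])
mean_close = Term('mean_close', [close], lambda c: sum(c) / len(c))
above_mean = Term('above_mean', [close, mean_close],
                  lambda c, m: [x > m for x in c])
results = run([above_mean])
```

Requesting only `above_mean` still computes `close` and `mean_close`, each exactly once, because they are reachable through its inputs.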
Previously the class SerializeableZiplineObject was used to
house basic __setstate__ and __getstate__ methods. It wasn't
really doing much that was helpful, so it is now gone.
Uses a numpy array instead of a dict of dicts when initializing the history
container.
In testing this reduced the cumulative time spent in HistoryContainer.update
by about 70% (160.571 s to 47.294 s in the profiles below).
BEFORE COMMIT:
Thu Oct 16 22:30:46 2014 results/cprofile/unoptimized
185223320 function calls (182210491 primitive calls) in 401.351 seconds
Ordered by: cumulative time
List reduced from 2398 to 27 due to restriction <'update'>
ncalls tottime percall cumtime percall filename:lineno(function)
8580 0.461 0.000 160.571 0.019 qexec/zipline/history/history_container.py:388(update)
AFTER COMMIT:
Thu Oct 16 22:12:28 2014 results/cprofile/optimized
143177181 function calls (140164352 primitive calls) in 272.403 seconds
Ordered by: cumulative time
List reduced from 2395 to 27 due to restriction <'update'>
ncalls tottime percall cumtime percall filename:lineno(function)
8580 0.086 0.000 47.294 0.006 qexec/zipline/history/history_container.py:388(update)
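For illustration only, the two initialization strategies look roughly like this. The layout below (fields as rows, sids as columns) and every name in it are assumptions; the actual HistoryContainer buffers may be organized differently.

```python
import numpy as np

sids = [1, 2, 3]
fields = ['price', 'volume']

# Dict of dicts: one Python object per (field, sid) cell.
nested = {field: {sid: np.nan for sid in sids} for field in fields}

# Single array: one allocation, with lookup tables for row and column.
field_index = {field: i for i, field in enumerate(fields)}
sid_index = {sid: j for j, sid in enumerate(sids)}
buffer = np.full((len(fields), len(sids)), np.nan)

# Updating a whole field for every sid is one vectorized assignment.
buffer[field_index['price'], :] = [10.0, 20.0, 30.0]
price_of_2 = buffer[field_index['price'], sid_index[2]]
```

The array form replaces per-cell dictionary writes with a single block assignment, which is the kind of saving the profiles above reflect.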
Removes support for handling dividends as part of the algorithm
simulation stream, replacing it with an API in `TradingAlgorithm` for
supplying dividends as a DataFrame.
Adding a copy of the Event's dt field as datetime via the
`alias_dt` generator, so that the API was forgiving and allowed
both datetime and dt on a SIDData object, created noticeable
overhead, even on a no-op algorithm.
Instead of incurring the cost of copying the datetime value and
assigning it to the Event object on every event that is passed
through the system, add a property to SIDData that aliases
`datetime` to `dt`.
Support for `data['foo'].datetime` may eventually be removed,
and it could be considered deprecated.
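A minimal sketch of the property-based alias, using a pared-down, hypothetical `SIDData` with only `dt` and `price` attributes:

```python
import datetime


class SIDData(object):
    """Hypothetical, pared-down stand-in for the real SIDData class."""

    def __init__(self, dt, price):
        self.dt = dt
        self.price = price

    @property
    def datetime(self):
        # Read-only alias: resolved on access, so nothing is copied
        # onto each event as it passes through the system.
        return self.dt


bar = SIDData(dt=datetime.datetime(2014, 1, 2), price=10.0)
same_object = bar.datetime is bar.dt
```

The alias costs nothing until `datetime` is actually read, and it always reflects the current value of `dt`.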
Use the six module to import functions and types that are
consistent between Python 2 and 3, so that one code base can
support both versions.
- Use integer_types instead of int and long.
- Use string_types instead of basestring.
- Account for iteritems, itervalues, iterkeys.
- Use six.moves for filter, zip, and reduce.
- Use compatible bytes for md5 hasher.
- Account for xrange and range.
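The items above map onto `six` roughly as follows. This is an illustrative sketch rather than code from the commit; `describe` and the sample data are hypothetical, and it requires the `six` package to be installed.

```python
import hashlib

import six
from six.moves import filter, range, reduce, zip


def describe(value):
    """Classify a value using six's cross-version type tuples."""
    if isinstance(value, six.integer_types):  # int and long on 2, int on 3
        return 'integer'
    if isinstance(value, six.string_types):   # basestring on 2, str on 3
        return 'string'
    return 'other'


prices = {'AAPL': 100, 'MSFT': 50}

# six.iteritems replaces dict.iteritems(), which Python 3 removed.
pairs = sorted(six.iteritems(prices))

# six.moves.range is xrange on Python 2 and range on Python 3.
total = reduce(lambda a, b: a + b, range(5))

# six.moves.filter and six.moves.zip are lazy on both versions.
evens = list(filter(lambda n: n % 2 == 0, range(6)))
zipped = list(zip('ab', range(2)))

# hashlib needs bytes on Python 3, so encode text before hashing.
digest = hashlib.md5(u'zipline'.encode('utf-8')).hexdigest()
```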
`for s in data` and methods like `for s in data.keys()` were not producing
the same list of active sids.
Make the other iteration methods match __iter__ by using the __contains__
method to check whether or not the sid is active.
For use of data outside of the algoscript context that needs access
to all data fields, use data._data.
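A simplified sketch of the approach: every iteration method is routed through `__contains__`, so they all agree on which sids are active. The `BarData` class and its `_active` set are hypothetical; only `_data` is named in the message above.

```python
class BarData(object):
    """Hypothetical, simplified container keyed by sid."""

    def __init__(self, data, active_sids):
        self._data = data                # every sid, active or not
        self._active = set(active_sids)

    def __contains__(self, sid):
        return sid in self._data and sid in self._active

    def __iter__(self):
        # `sid in self` goes through __contains__.
        return (sid for sid in self._data if sid in self)

    def keys(self):
        return list(self)

    def values(self):
        return [self._data[sid] for sid in self]

    def items(self):
        return [(sid, self._data[sid]) for sid in self]


data = BarData({1: 'a', 2: 'b', 3: 'c'}, active_sids=[1, 3])
```

Here `list(data)`, `data.keys()`, `data.values()`, and `data.items()` all skip sid 2, while `data._data` still exposes it.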