- Fixes an error where Modeling API data known as of the close of `day
N` would be shown to algorithms during `before_trading_start` as of
the close of the same day. Algorithms should now only receive data
during `before_trading_start` and `handle_data` that was known as of
the simulation time at which the function is called.
- All Term instances now have a `mask` attribute that must be a `Filter`
or an instance of `AssetExists()`. `mask` can be used to specify that
a Factor should be computed in a manner that ignores the values that
were not `True` in the mask.
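As a rough illustration of what masking means for a computation, the sketch below ranks one day's values while ignoring entries whose mask value is not `True`. It uses plain Python lists rather than zipline's classes; `masked_rank` and its arguments are hypothetical names, not part of the Modeling API.

```python
import math


def masked_rank(values, mask):
    """Rank values ascending (1 = smallest), skipping masked-out entries.

    Entries whose mask is not True receive NaN and do not affect the
    ranks of the remaining entries.
    """
    # Indices that take part in the computation.
    included = [i for i, keep in enumerate(mask) if keep is True]
    # Order the included indices by their value.
    ordered = sorted(included, key=lambda i: values[i])
    ranks = [math.nan] * len(values)
    for rank, i in enumerate(ordered, start=1):
        ranks[i] = float(rank)
    return ranks


values = [10.0, 3.0, 7.0, 1.0]
mask = [True, False, True, True]
ranks = masked_rank(values, mask)
```

The masked-out second entry would otherwise have ranked second; with the mask applied it is skipped and the remaining three values are ranked among themselves.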
- Changed the interface for `FFCLoader.load_adjusted_array` and
`Term._compute` from `(columns, mask)`, with mask as a DataFrame, to
`(columns, dates, assets, mask)`, where mask is a numpy array. This
is primarily to avoid having to reconstruct extra DataFrames when
using masks produced by non-`AssetExists` filters.
- Adds `BoundColumn.latest`, which gives the most-recently-known value
of a column.
This commit removes the ability to reference a shared TradingEnvironment through the zipline.finance.trading module. Instead, the classes that require a TradingEnvironment, or its child AssetFinder, contain their own references to those objects.
This commit also adds serialization utilities that allow for the pickling/unpickling of objects without unintentionally serializing their TradingEnvironments or AssetFinders.
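One common way to keep a referenced object out of a pickle is to drop it in `__getstate__`. The sketch below shows that general pattern only; `StubAssetFinder` and `Holder` are hypothetical stand-ins, and this is not a description of the utilities the commit actually added.

```python
import pickle


class StubAssetFinder(object):
    """Stand-in for a heavyweight object that should not be pickled."""


class Holder(object):
    """Hypothetical object that keeps its own reference to a finder."""

    def __init__(self, asset_finder):
        self.asset_finder = asset_finder
        self.open_orders = {}

    def __getstate__(self):
        # Copy the instance dict and drop the reference that should
        # not be serialized along with this object.
        state = self.__dict__.copy()
        state.pop('asset_finder', None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Assumption: the caller re-attaches a finder after loading.
        self.asset_finder = None


original = Holder(StubAssetFinder())
original.open_orders[1] = 'order'
restored = pickle.loads(pickle.dumps(original))
```

After the round trip, `restored` carries its ordinary state but no finder, so the caller has to supply one before use.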
This patch lays the groundwork for a compute engine designed to
facilitate construction of factor-based universe screening and portfolio
allocation. It contains:
A new module, `zipline.modelling`, containing entities that can be used
to express computations as dependency graphs. Each node in such a graph
is an instance of the base `Term` class, defined in
`zipline.modelling.term`. Dependency graphs are executed by instances
of `FFCEngine`, defined in `zipline.modelling.engine`.
A new module, `zipline.data.ffc`, containing loaders and dataset
definitions for inputs to the modelling API.
New `TradingAlgorithm` API methods: `add_factor` and `add_filter`.
These methods can only be called from `initialize`, and are used to
inform the algorithm that each day it should compute the given terms.
Computed factor results are made available through a new attribute of
the `data` object in `before_trading_start` and `handle_data`. Computed
filter results control which assets are available in the factor matrix
on each day.
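To make the idea of expressing computations as a dependency graph concrete, here is a minimal sketch in plain Python. `SketchTerm` and `execute` are hypothetical names; they are not zipline's `Term` or `FFCEngine`, and the real engine works on arrays of dates and assets rather than single values.

```python
class SketchTerm(object):
    """A graph node: a name, its input terms, and a compute function."""

    def __init__(self, name, inputs, compute):
        self.name = name
        self.inputs = tuple(inputs)
        self.compute = compute


def execute(term, cache=None):
    """Compute `term`, computing each dependency at most once."""
    if cache is None:
        cache = {}
    if term.name not in cache:
        # Resolve dependencies first, then compute this node.
        args = [execute(dep, cache) for dep in term.inputs]
        cache[term.name] = term.compute(*args)
    return cache[term.name]


# A raw input has no dependencies; here it returns fixed prices.
close = SketchTerm('close', [], lambda: [10.0, 11.0, 12.0])
# A factor-like term: the mean of the close prices.
mean_close = SketchTerm('mean_close', [close], lambda c: sum(c) / len(c))
# A filter-like term: True when the latest close is above the mean.
above_mean = SketchTerm(
    'above_mean', [close, mean_close], lambda c, m: c[-1] > m
)

result = execute(above_mean)
```

Because results are cached by name, `close` is computed once even though both `mean_close` and `above_mean` depend on it.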
Previously the class SerializeableZiplineObject was used to
house basic __setstate__ and __getstate__ methods. It wasn't
really doing much that was helpful, so it is now gone.
Uses a numpy array instead of a dict of dicts when initializing history
container.
In testing this reduced the total time spent in HistoryContainer.update
by 66%.
BEFORE COMMIT:
Thu Oct 16 22:30:46 2014 results/cprofile/unoptimized
185223320 function calls (182210491 primitive calls) in 401.351 seconds
Ordered by: cumulative time
List reduced from 2398 to 27 due to restriction <'update'>
ncalls tottime percall cumtime percall filename:lineno(function)
8580 0.461 0.000 160.571 0.019 qexec/zipline/history/history_container.py:388(update)
AFTER COMMIT:
Thu Oct 16 22:12:28 2014 results/cprofile/optimized
143177181 function calls (140164352 primitive calls) in 272.403 seconds
Ordered by: cumulative time
List reduced from 2395 to 27 due to restriction <'update'>
ncalls tottime percall cumtime percall filename:lineno(function)
8580 0.086 0.000 47.294 0.006 qexec/zipline/history/history_container.py:388(update)
Removes support for handling dividends as part of the algorithm
simulation stream, replacing it with an API in `TradingAlgorithm` for
supplying dividends as a DataFrame.
Adding a copy of the Event's dt field as datetime via the
`alias_dt` generator, so that the API was forgiving and allowed
both datetime and dt on a SIDData object, was creating noticeable
overhead, even on a no-op algorithm.
Instead of incurring the cost of copying the datetime value and
assigning it to the Event object on every event that is passed
through the system, add a `datetime` property to SIDData which acts
as an alias for `dt`.
Eventually support for `data['foo'].datetime` may be removed,
and could be considered deprecated.
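The property-based alias can be sketched as follows. `SIDDataSketch` is a hypothetical stand-in for SIDData, and the setter is an assumption added for symmetry rather than something the commit message states.

```python
import datetime as dt_module


class SIDDataSketch(object):
    """Stand-in for SIDData that stores only `dt`."""

    def __init__(self, dt):
        self.dt = dt

    @property
    def datetime(self):
        # Read-through alias: nothing extra is stored per event.
        return self.dt

    @datetime.setter
    def datetime(self, value):
        # Assumption: writes through the alias update `dt` as well.
        self.dt = value


event = SIDDataSketch(dt_module.datetime(2014, 1, 2))
same_object = event.datetime is event.dt
```

The alias costs one attribute lookup when it is read, instead of one copy and assignment for every event that passes through the system.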
Use the six module to import functions and types that are
consistent between Python 2 and 3, so that one code base can
support both versions.
- Use integer_types instead of int and long.
- Use string_types instead of basestring.
- Account for iteritems, itervalues, iterkeys.
- Use six.moves for filter, zip, and reduce.
- Use compatible bytes for md5 hasher.
- Account for xrange and range.
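The items above can be sketched together in a few lines. The `six` names used here (`integer_types`, `string_types`, `iteritems`, and `six.moves`) are real; the `except ImportError` fallback is an assumption made only so the sketch still runs on Python 3 without `six` installed, and `describe` is a hypothetical helper.

```python
import hashlib

try:
    from six import integer_types, string_types, iteritems
    from six.moves import range, reduce, zip
except ImportError:
    # Sketch-only fallback: Python 3 equivalents when six is missing.
    from functools import reduce
    integer_types = (int,)
    string_types = (str,)

    def iteritems(d):
        return iter(d.items())


def describe(value):
    """Classify a value without naming int, long, or basestring."""
    if isinstance(value, integer_types):
        return 'integer'
    if isinstance(value, string_types):
        return 'string'
    return 'other'


counts = {'a': 3, 'b': 4}
# iteritems and reduce behave the same on Python 2 and Python 3.
total = reduce(lambda acc, item: acc + item[1], iteritems(counts), 0)
# range and zip from six.moves are lazy on both versions.
pairs = list(zip(range(2), ['x', 'y']))
# md5 requires bytes on Python 3, so encode text explicitly.
digest = hashlib.md5('zipline'.encode('utf-8')).hexdigest()
```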
`for s in data` and methods like `for s in data.keys()` were not producing
the same list of active sids.
Make the other iteration methods match __iter__ by using the contains
method to check whether or not the sid is active.
For uses of data outside of the algoscript context that need access
to all data fields, use data._data.
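A minimal sketch of keeping every iteration method consistent by routing each one through the contains check. `IterableBarSketch` and its `active_sids` argument are hypothetical; how the real class decides that a sid is active is not shown here.

```python
class IterableBarSketch(object):
    """Stand-in for a data bar whose iteration honors __contains__."""

    def __init__(self, data, active_sids):
        self._data = data  # every sid, active or not
        self._active = set(active_sids)

    def __contains__(self, sid):
        return sid in self._active

    def __getitem__(self, sid):
        return self._data[sid]

    def __iter__(self):
        # `sid in self` goes through __contains__.
        return (sid for sid in self._data if sid in self)

    def keys(self):
        return list(self)

    def values(self):
        return [self._data[sid] for sid in self]

    def items(self):
        return [(sid, self._data[sid]) for sid in self]


data = IterableBarSketch({1: 'a', 2: 'b', 3: 'c'}, active_sids=[1, 3])
active = sorted(data)
everything = sorted(data._data)
```

Since `keys`, `values`, and `items` are all built on `__iter__`, they cannot drift apart, while `_data` still exposes the inactive sid for code that needs all fields.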
Remove the lists of DailyReturn objects in favor of using pd.Series
to store the return values.
This should make it easier to inspect the values when stepping through,
make windowing the data to a certain range easier,
and give some performance increases by removing object creation
and member access.
The defaultdict behavior was allowing both algo code and
TradingAlgorithm wrappers to add unintended keys.
Remove use of defaultdict in favor of a dictionary whose values are
added explicitly in tradesimulation; indexing the bar with a sid
that doesn't exist now raises a KeyError.
Also, when iterating over the keys in the data bar, only return
those keys that have pricing data.
Python 3 requires using dot syntax for relative imports,
otherwise the import is treated as an absolute import, i.e.
an import of a module from outside of the project.
By using dot syntax now, imports should be compatible with both
Python 2.7 and Python 3.
The override should be used to filter out symbols not in the universe,
however it was returning false positives.
To remove the false positives, after the contains check passes,
ensure that the key exists in the _data member.
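The two-step check described above might look like the following sketch. `UniverseBarSketch` and its `universe` argument are hypothetical names; only the `_data` member comes from the text.

```python
class UniverseBarSketch(object):
    """Stand-in for a data bar with a universe-aware contains check."""

    def __init__(self, universe):
        self._universe = set(universe)
        self._data = {}

    def __setitem__(self, sid, bar):
        self._data[sid] = bar

    def __getitem__(self, sid):
        # A plain dict lookup: a missing sid raises KeyError.
        return self._data[sid]

    def __contains__(self, sid):
        # First check: filter out sids that are not in the universe.
        if sid not in self._universe:
            return False
        # Second check: universe membership alone is a false positive
        # when no bar has been stored for the sid yet.
        return sid in self._data


bar = UniverseBarSketch(universe=[1, 2])
bar[1] = {'price': 10.0}
bar[3] = {'price': 5.0}  # stored, but outside the universe
```

Here sid 2 is in the universe but has no data, and sid 3 has data but is outside the universe, so only sid 1 passes both checks.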
BarData should, at least for the time being, be compatible with
existing algorithms that had worked against the prior usage of
an ndict as data, which provided `has_key`.
Of note, the Python language has deprecated `has_key` in favor
of using `in` and `__contains__`.
Instead of creating a list of benchmarks in the risk module,
stream benchmarks through the system as events, starting from the
algorithm generator.
Works towards more easily setting arbitrary pricing data as
a benchmark, as well as working towards live minutely benchmarks.
- Add transaction and order types
- Move TransactionSimulator from trading.py to tradesimulation.py
(only used by other members of the tradesimulation module)
- Make Transaction an independent event, like dividend
- Add Blotter class.
- Flatten the transaction events to be independent of trade bar events
- Make orders into events that reach performance (need to add
handling)
- Issue IDs to orders and track each transaction's order ID.
- Make volume share slippage fill orders independently, rather than
aggregating them into a single transaction.
- Perf tracker holds orders, serializes them with transactions.
- Order state defined and maintained by order class.
- Minutely emission of orders based on last_modified date.
- Modify perf to let non-performance-related events flow through.
- Support streaming non-trading data through batch transforms
and mixing in sids with just custom data.
- Allow CUSTOM events to flow through to transforms.
- Add logic to maintain a pre-specified sid filter.