Changes BcolzDailyBarWriter to not be an abc, data is passed as an
iterator of (sid, dataframe) pairs to the write method.
Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types, callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.
Adds many new fixtures and updates some existing fixtures to use the new
ones:
WithDefaultDateBounds
A fixture that provides the suite a START_DATE and END_DATE. This is
meant to make it easy for other fixtures to synchronize their date
ranges without depending on eachother in strange ways. For example,
WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
both have data for the same dates, so they may use depend on
WithDefaultDates without forcing a dependency between them.
WithTmpDir, WithInstanceTmpDir
Provides the suite or individual test case a temporary directory.
WithBcolzDailyBarReader
Provides the suite a BcolzDailyBarReader which reads from bcolz data
written to a temporary directory. The data will be read from
dataframes and then converted to bcolz files with
BcolzDailyBarWriter.write
WithBcolzDailyBarReaderFromCSVs
Provides the suite a BcolzDailyBarReader which reads from bcolz data
written to a temporary directory. The data will be read from a
collection of CSV files and then converted into the bcolz data through
BcolzDailyBarWriter.write_csvs
WithBcolzMinuteBarReader
Provides the suite a BcolzMinuteBarReader which reads from bcolz data
written to a temporary directory. The data will be read from
dataframes and then converted to bcolz files with
BcolzMinuteBarWriter.write
WithAdjustmentReader
Provides the suite a SQLiteAdjustmentReader which reads from an in
memory sqlite database. The data will be read from dataframes and then
converted into sqlite with SQLiteAdjustmentWriter.write
WithDataPortal
Provides each test case a DataPortal object with data from temporary
resources.
Limited use of `pandas` data structures in both `HistoryContainer` and
`RollingPanel`. Where possible, methods were amended to return raw
`ndarrays` with the indexing logic done separately. This allows us to
cut down the number of times pandas objects are created both as returns
and intermediate values. The separation of indexing from data access
allowed us to minimize the times we’d make use of pandas indexes.
This required that that certain methods like `NDFrame.ffill` be replaced
with versions that work with `ndarrays`. Some of this was done via
straight numpy methods and others by access pandas internal
machinery. Outside of allowing us to use faster ndarrays, many of these
function provided speedups over their pandas counterparts as we didn’t
require the extra features like handling multiple dtypes. i.e. np.isnan
is faster than pd.isnull, but only works with certain dtypes.
getting filled with the wrong datetimes and causing errors.
Updates the logic for addressing missing datetimes and adds unit tests
for the 2 main cases (no missing datetimes, and some missing datetimes).
Previously, all specs had to be pre-allocated by using the 'add_history'
function. This is now no longer required and instead serves as a hint to
the HistoryContainer to pre-allocate the space for the given spec.
History can grow by increasing the length for a frequency, adding a
frequency, or adding a field. It can grow with any combination of
these.
HistoryContainer now is aware of the data_frequency of the algorithm,
and no longer uses the daily_at_midnight flag; instead, this is the
default behavior.
Overhaul the core HistoryContainer logic to be more robust to changing
universes.
Major Changes
-------------
* Remove `return_frame` cache. The original purpose of using
return_frames was to avoid having to create new DataFrames on each
iteration of handle_data, but we ended up having to copy the return
frames anyway because user code could mutate the frames in place.
Removing the return_frames reduces unnecessary copying, and reduces
the logic of `get_history` to just forward-filling and concatenating
two DataFrames.
* Use a `MultiIndex`ed DataFrame to represent
`last_known_prior_values`. This makes lookups faster and greatly
simplifies the logic of adding and dropping sids.
* HistoryContainer no longer attempts to determine its universe based on
the contents of its internal buffers. The TradingAlgorithm
controlling the container is now responsible for explicitly calling
`add_sids` or `drop_sids` when securities enter or leave the
algorithm's universe. These methods, along with the internal
`_realign` method, provide a clean interface for changing the universe
of securities managed by the container.
* Refactor index mutation logic in `RollingPanel` into a
`MutableIndexRollingPanel` subclass. Maintenance of the old behavior
is regrettably necessary to support `BatchTransform`.
* Refactor shared logic from `roll` and `get_history` into a single
`aggregate_ohlcv_panel` method that's responsible for collapsing an
OHLCV buffer into a frame.
There quite some bugs in certain corner cases. Dropping of obsolete
axes was not working correctly, roll over could cause obsolete axes
to not drop. The tests are much more stringent now as well.
Overhauls `HistoryContainer` in prep for support of more than one frequency.
Major changes:
- Methods/variables referring to "day" have been renamed/generalized.
- `current_day_panel` became `buffer_panel`, which is now a `RollingPanel`
- `prior_day_panel` became a dictionary mapping `Frequency` objects to
"digest panels", which are instances of `RollingPanel`.
- Hard-coded daily rollover replaced with a notion of a "current window" for
each unique frequency managed by the panel.
- When the end of the current window is reached for a given frequency, we
compute an aggregate bar (code refers to this as a "digest"), which is
appended to a panel associated with that frequency.
- Window rollover dates are managed by a pair of dictionaries,
`cur_window_starts` and `cur_window_closes`. The `Frequency` class is
responsible for computing window bounds based on the open/close of the
previous window.
- Semantic change to the `open_price` field: `open_price` now always
contains the price of the first trade occurring in the given window.
Previously it contained the price of the first minute in the window,
returning NaN it the security happened not to trade in the first minute.