Limited use of `pandas` data structures in both `HistoryContainer` and
`RollingPanel`. Where possible, methods were amended to return raw
`ndarrays`, with the indexing logic done separately. This cuts down the
number of times pandas objects are created, both as return values and as
intermediate values. Separating indexing from data access also let us
minimize how often we make use of pandas indexes.
This required that certain methods, like `NDFrame.ffill`, be replaced
with versions that work with `ndarrays`. Some of this was done via
straight numpy methods and some by accessing pandas' internal
machinery. Besides allowing us to use faster ndarrays, many of these
functions provided speedups over their pandas counterparts because we
didn’t require extra features like handling multiple dtypes. For example,
`np.isnan` is faster than `pd.isnull`, but only works with certain dtypes.
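As a rough illustration of replacing `NDFrame.ffill` with an ndarray-only
version, here is a minimal sketch using plain numpy. The function name
`ffill_2d` is hypothetical and this is not necessarily how the commit
implements it; it assumes a 2-D float array where missing values are NaN.

```python
import numpy as np


def ffill_2d(values):
    """Forward-fill NaNs down each column of a 2-D float array.

    Hypothetical helper for illustration only; assumes a float dtype,
    which is what lets us use np.isnan instead of pd.isnull.
    """
    values = np.asarray(values, dtype=float)
    mask = np.isnan(values)
    # Row index of every valid entry, 0 where the entry is NaN.
    idx = np.where(~mask, np.arange(values.shape[0])[:, None], 0)
    # A running maximum carries the last valid row index forward.
    np.maximum.accumulate(idx, axis=0, out=idx)
    # Leading NaNs point at row 0, which is itself NaN, so they stay NaN.
    return values[idx, np.arange(values.shape[1])]


filled = ffill_2d([[1.0, np.nan],
                   [np.nan, 2.0],
                   [3.0, np.nan]])
```

No pandas objects are created at any point, so any index alignment has to be
handled by the caller.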
getting filled with the wrong datetimes and causing errors.
Updates the logic for addressing missing datetimes and adds unit tests
for the 2 main cases (no missing datetimes, and some missing datetimes).
Previously, all specs had to be pre-allocated by using the 'add_history'
function. This is no longer required; the call now serves as a hint to
the HistoryContainer to pre-allocate the space for the given spec.
History can grow by increasing the length for a frequency, adding a
frequency, or adding a field. It can grow with any combination of
these.
HistoryContainer is now aware of the data_frequency of the algorithm,
and no longer uses the daily_at_midnight flag; instead, this is the
default behavior.
Overhaul the core HistoryContainer logic to be more robust to changing
universes.
Major Changes
-------------
* Remove `return_frame` cache. The original purpose of using
return_frames was to avoid having to create new DataFrames on each
iteration of handle_data, but we ended up having to copy the return
frames anyway because user code could mutate the frames in place.
Removing the return_frames reduces unnecessary copying, and reduces
the logic of `get_history` to just forward-filling and concatenating
two DataFrames.
* Use a `MultiIndex`ed DataFrame to represent
`last_known_prior_values`. This makes lookups faster and greatly
simplifies the logic of adding and dropping sids.
* HistoryContainer no longer attempts to determine its universe based on
the contents of its internal buffers. The TradingAlgorithm
controlling the container is now responsible for explicitly calling
`add_sids` or `drop_sids` when securities enter or leave the
algorithm's universe. These methods, along with the internal
`_realign` method, provide a clean interface for changing the universe
of securities managed by the container.
* Refactor index mutation logic in `RollingPanel` into a
`MutableIndexRollingPanel` subclass. Maintenance of the old behavior
is regrettably necessary to support `BatchTransform`.
* Refactor shared logic from `roll` and `get_history` into a single
`aggregate_ohlcv_panel` method that's responsible for collapsing an
OHLCV buffer into a frame.
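
To give a feel for why a `MultiIndex`ed DataFrame simplifies adding and
dropping sids, here is a small sketch. The layout (rows keyed by
(frequency, field), one column per sid) and the standalone functions below
are assumptions for illustration, not the container's actual methods.

```python
import numpy as np
import pandas as pd

# Assumed layout: one row per (frequency, field) pair, one column per sid.
index = pd.MultiIndex.from_product(
    [["1d", "1m"], ["price", "volume"]],
    names=["frequency", "field"],
)
last_known = pd.DataFrame(np.nan, index=index, columns=[1, 2])
last_known.loc[("1d", "price"), 1] = 10.0


def add_sids(frame, sids):
    """Return a frame with NaN-filled columns added for the new sids."""
    return frame.reindex(columns=frame.columns.union(sids))


def drop_sids(frame, sids):
    """Return a frame without the columns for the given sids."""
    return frame.drop(columns=list(sids))


grown = add_sids(last_known, [3])
shrunk = drop_sids(grown, [2])
```

With this shape, a universe change is a single column operation that applies
to every frequency and field at once.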
There were quite a few bugs in certain corner cases. Dropping of obsolete
axes was not working correctly, and rollover could cause obsolete axes
to not be dropped. The tests are much more stringent now as well.
Overhauls `HistoryContainer` in prep for support of more than one frequency.
Major changes:
- Methods/variables referring to "day" have been renamed/generalized.
- `current_day_panel` became `buffer_panel`, which is now a `RollingPanel`
- `prior_day_panel` became a dictionary mapping `Frequency` objects to
"digest panels", which are instances of `RollingPanel`.
- Hard-coded daily rollover replaced with a notion of a "current window" for
each unique frequency managed by the panel.
- When the end of the current window is reached for a given frequency, we
compute an aggregate bar (code refers to this as a "digest"), which is
appended to a panel associated with that frequency.
- Window rollover dates are managed by a pair of dictionaries,
`cur_window_starts` and `cur_window_closes`. The `Frequency` class is
responsible for computing window bounds based on the open/close of the
previous window.
- Semantic change to the `open_price` field: `open_price` now always
contains the price of the first trade occurring in the given window.
Previously it contained the price of the first minute in the window,
returning NaN if the security happened not to trade in the first minute.
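
The digest computation and the new `open_price` semantics can be sketched
roughly as follows. This is an illustrative stand-in, not the code from this
change: `digest_bar` is a hypothetical name, the field names other than
`open_price` are assumed, and it assumes a minute with no trade is NaN in
every price field.

```python
import numpy as np


def digest_bar(opens, highs, lows, closes, volumes):
    """Collapse one window of minute bars for one security into one bar.

    Hypothetical helper; assumes no-trade minutes are NaN in all price fields.
    """
    opens = np.asarray(opens, dtype=float)
    closes = np.asarray(closes, dtype=float)
    traded = ~np.isnan(opens)
    if not traded.any():
        # The security never traded in this window.
        return {"open_price": np.nan, "high": np.nan, "low": np.nan,
                "close_price": np.nan, "volume": 0.0}
    first = np.argmax(traded)                         # first minute with a trade
    last = len(traded) - 1 - np.argmax(traded[::-1])  # last minute with a trade
    return {
        "open_price": opens[first],   # first trade in the window, not first minute
        "high": np.nanmax(highs),
        "low": np.nanmin(lows),
        "close_price": closes[last],
        "volume": np.nansum(volumes),
    }


nan = np.nan
bar = digest_bar(
    opens=[nan, 10.0, 10.2],
    highs=[nan, 10.5, 10.4],
    lows=[nan, 9.9, 10.1],
    closes=[nan, 10.1, 10.3],
    volumes=[0, 100, 50],
)
```

Here the security does not trade in the first minute, so under the old
behavior `open_price` would have been NaN; under the new semantics it is the
open of the second minute.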