This allows us to remove the check for whether the provided dt had a
time of midnight, which was a flimsy way to infer if the data frequency
was 'daily'. Besides the explicit check being preferable, this method
was broken on the futures calendar, since midnight is a valid market
minute.
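A minimal sketch of why the midnight heuristic is fragile, using hypothetical helper names (`infer_daily_from_midnight`, `is_daily`) that are not part of the codebase:

```python
from datetime import datetime, time


def infer_daily_from_midnight(dt):
    """Old-style heuristic (illustrative): treat a midnight dt as daily data."""
    return dt.time() == time(0, 0)


def is_daily(data_frequency):
    """Explicit check: rely on the declared data frequency instead."""
    return data_frequency == "daily"


# On a futures calendar midnight is a valid market minute, so a minute-mode
# dt at 00:00 is wrongly classified as daily by the heuristic.
futures_minute = datetime(2016, 11, 1, 0, 0)
misclassified = infer_daily_from_midnight(futures_minute)
explicit = is_daily("minute")
```

The explicit check does not depend on the timestamp at all, so it behaves the same on any calendar.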
Added as a subclass of MinuteEquityHistoryTestCase, where the primary
calendar is 'us_futures'.
Notes on modifications to MinuteEquityHistoryTestCase:
- To work on generic calendars, many tests now use set minutes for
window start and end, and check the values on active equity minutes.
- test_minute_regular should test against active equity minutes
- Adapts test_minute_midnight to work with futures calendar
- Use a method of getting the last open minute that works with
calendars that are open at midnight
- Test against Sunday at midnight, since the real intention of this
test is to check that given a non-open minute, we fall back to the
last open minute.
Added as a minimal subclass of DailyEquityHistoryTestCase, swapping out
just the primary calendar. This requires significant modifications to
DailyEquityHistoryTestCase, to allow for a generic primary calendar.
In preparation for using `DataPortal` in notebooks, remove restriction on
the `HistoryLoader` to dates that are monotonically increasing. Notebook
usage of the `DataPortal` is more useful when the end of the history
window can be arbitrary dates without having to restart the notebook kernel.
Due to the implementation of the prefetch and caching logic, the end
date of history calls could previously only increase, e.g. `2016-11-01`,
`2016-11-02`, `2016-11-03`. This pattern was sufficient for backtesting
and live simulations, since the current time of the algorithm only ever increases.
With this change, which resets the underlying sliding window when the
last fetched idx is greater than the
Now calls to history in the same process with end dates such as
`2016-11-01`, `2016-10-31`, `2015-11-02` should work.
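A rough sketch of the reset behavior, assuming a hypothetical `SlidingWindow` class over an in-memory list. The reset condition used here (requested end index earlier than the last one served) is an assumption for illustration, not a description of the actual `HistoryLoader` internals:

```python
from collections import deque


class SlidingWindow:
    """Hypothetical stand-in for a window that normally only slides forward."""

    def __init__(self, data, size):
        self.data = data
        self.size = size
        self.end = None  # index of the last window end served
        self.window = deque(maxlen=size)
        self.resets = 0

    def _reset(self, end):
        # Rebuild the window from scratch so that it ends at `end`.
        start = max(0, end - self.size + 1)
        self.window = deque(self.data[start:end + 1], maxlen=self.size)
        self.end = end

    def get(self, end):
        if self.end is None:
            self._reset(end)
        elif end < self.end:
            # Assumed condition: sliding forward cannot reach an earlier end.
            self.resets += 1
            self._reset(end)
        else:
            for i in range(self.end + 1, end + 1):
                self.window.append(self.data[i])
            self.end = end
        return list(self.window)


window = SlidingWindow(list(range(10)), size=3)
first = window.get(5)
forward = window.get(7)
backward = window.get(4)
```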
To support using a `DataPortal` and `HistoryLoader` in a notebook, allow
the prefetch length to be configurable, so that it can be set to 0.
Unlike backtesting where the prefetch is useful for repeated history
windows viewed from datetimes which are monotonically increasing by a
small amount, the notebook usage of history windows needs only to
retrieve the exact data needed for the window specified.
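As an illustration only, the effect of a configurable prefetch can be sketched with a hypothetical `read_bounds` helper that computes which index range to load for a window. The name and signature are assumptions:

```python
def read_bounds(end_idx, window_length, prefetch_length, last_idx):
    """Return the inclusive (start, stop) index range to read for a window.

    `prefetch_length` extra bars past `end_idx` are loaded, capped at the
    last available index.
    """
    start = end_idx - window_length + 1
    stop = min(end_idx + prefetch_length, last_idx)
    return start, stop


# Backtest-style usage: read ahead so that later windows hit the cache.
backtest_bounds = read_bounds(100, 30, 40, last_idx=1000)

# Notebook-style usage: a prefetch of 0 reads exactly the requested window.
notebook_bounds = read_bounds(100, 30, 0, last_idx=1000)
```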
This patch also fixes some boundary conditions related to rolls and
adjustments which were uncovered by querying for the adjustments with an
end date near the end of the window.
Introducing a WithCreateBarData fixture which allows for the
creation of a BarData using only the `simulation_dt_func` and
`restrictions` params. Assumes that each suite uses the same
`data_portal`, `data_frequency` and `trading_calendar`.
BarData now takes the trading calendar as a parameter.
can_trade now checks if the asset’s exchange is open at the current or
next market minute (defined by the given trading calendar).
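A simplified sketch of the "current or next market minute" lookup, assuming the calendar is represented by a sorted list of minutes and the exchange by a set of open minutes. Both representations and the function names are hypothetical:

```python
from bisect import bisect_left
from datetime import datetime


def current_or_next_market_minute(market_minutes, dt):
    """Return dt if it is a market minute, else the next one (or None)."""
    i = bisect_left(market_minutes, dt)
    return market_minutes[i] if i < len(market_minutes) else None


def can_trade(exchange_open_minutes, market_minutes, dt):
    """True if the asset's exchange is open at the current/next market minute."""
    minute = current_or_next_market_minute(market_minutes, dt)
    return minute is not None and minute in exchange_open_minutes


# The calendar opens one minute before the asset's exchange does.
calendar_minutes = [datetime(2016, 11, 1, 9, m) for m in (29, 30, 31)]
exchange_minutes = set(calendar_minutes[1:])
```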
Change the mock minute data to no longer use an increasing arange, so
that a day's worth of minute data can be summed and fit inside of a
uint32.
This change was required because of working on new test data that looked
like [0, 100, 200, 0, ] which was resulting in a daily rollup of 0 data,
when the coverage needed a non-0 value.
Also, factor out the resampling function, with an eye on making it
easier to convert from minute bars to daily bars during ingest/load
processes.
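For illustration, a minute-to-daily rollup for a single session might look like the sketch below. The `minute_to_daily` name, the tuple layout, and the overflow check are assumptions, and the factored-out function may treat zero or missing bars differently:

```python
UINT32_MAX = 2**32 - 1


def minute_to_daily(minute_bars):
    """Roll up one session of (open, high, low, close, volume) minute bars."""
    opens, highs, lows, closes, volumes = zip(*minute_bars)
    daily = (opens[0], max(highs), min(lows), closes[-1], sum(volumes))
    if daily[4] > UINT32_MAX:
        raise OverflowError("daily volume does not fit in a uint32")
    return daily


session = [
    (10.0, 10.5, 9.9, 10.2, 100),
    (10.2, 10.8, 10.1, 10.7, 200),
    (10.7, 10.9, 10.4, 10.5, 0),
]
daily_bar = minute_to_daily(session)
```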
Instead of having separate ExchangeCalendar and TradingSchedule objects, we
now just have TradingCalendar. The TradingCalendar keeps track of each
session (defined as a contiguous set of minutes between an open and a close).
It's also responsible for handling the grouping logic of any given minute
to its containing session, or the next/previous session if it's not a market
minute for the given calendar.
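The grouping logic can be sketched as follows, assuming sessions are sorted, non-overlapping (open, close) pairs expressed as integer minutes. `SketchCalendar` and `minute_to_session` as written here are illustrative and do not mirror the real TradingCalendar signatures:

```python
from bisect import bisect_left


class SketchCalendar:
    def __init__(self, sessions):
        # sessions: sorted, non-overlapping (open, close) pairs, inclusive.
        self.sessions = sessions
        self.closes = [close for _, close in sessions]

    def minute_to_session(self, minute, direction="next"):
        """Return the index of the session containing `minute`, or the
        next/previous session if `minute` is not a market minute."""
        i = bisect_left(self.closes, minute)  # first session closing at/after
        if i < len(self.sessions) and self.sessions[i][0] <= minute:
            return i  # minute falls inside session i
        if direction == "next" and i < len(self.sessions):
            return i
        if direction == "previous" and i > 0:
            return i - 1
        raise ValueError("no %s session for minute %r" % (direction, minute))


# Two sessions, 09:30-16:00 on consecutive days, as minutes since day 0.
calendar = SketchCalendar([(570, 960), (2010, 2400)])
```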
In preparation for adding futures, add equity to the names of both the
classes and methods for writing bcolz data. Futures data will use a
different number of minutes per day with a separate reader. This change will allow
both equity and futures fixtures to exist side by side.
Also, break out the method which generates the dataframes and trading
days member into fixtures (`EquityMinuteBarData` and
`EquityDailyBarData`) on which the `*BarReader` fixture depends. This
fixture is separated out to enable reader/writers in different formats
to use the same data setup. (There is internal code which needs to write
minute and daily bar data in a database format.)
Since the first trading day is now passed directly to the DataPortal on
init, there's no need for a method that does this. Moves all the
additional logic/assignments into the init. Also corrects an issue where
we would never create certain attributes if self._first_trading_day was
None.
Adds the ability to specify the first trading day for a data portal in a
test case when using the WithDataPortal fixture.
Fix behavior in minute mode history with frequency `1d`, where on the
day immediately following an adjustment action, the overnight adjustment
would not apply. (However the adjustment would be applied after a 1 day
lag.)
The root cause of the bug was that the history data for minute mode when
using `1d` stitches together a sliding window of the daily data for
previous days and the current minute. That daily data sliding window and
its corresponding adjustments were being read as if the data were being viewed
from the last day of the window; however, in this case the data is
being viewed from the day after the window has completed. The difference
in viewpoints requires the adjustments to be popped and applied by the
adjusted array one index earlier. The fix uses the `extra_slot` value as
a signifier of whether the data is being viewed on the following day and
then adjusts the index of the multiply object accordingly.
Also, change the split and merger test data ratios to have different values,
to ensure that different adjustment values are applied; as opposed to
doubling up on just one of the values.
Make the delineation between `DailyEquityHistoryTestCase` and
`MinuteEquityHistoryTestCase` whether or not minute dts or daily dts are
used as the query timestamp, instead of whether the frequency is `1d`
vs. `1m`.
Prepares for adding a repro case where using `1d` with minute data
fails when there is an adjustment occurring the day before the query
minute dt.
Allow `WithBcolzDailyBarData` to opt in to reading data defined by
`WithBcolzMinuteBarData`, so that the daily and minute test data for the same
asset and dts correlate between the two readers.
The correlation is relevant for history tests which blend daily and
minute data.
Also, make the test data for the split and merger assets in the minute
suite align at the thousands place if the adjustments are applied
correctly, by starting the prices with a base of 4000 and then halving
the start value each day.
Updates the BcolzMinuteBarWriter.write api to allow users to pass their
data as a stream instead of requiring that they loop over their data
externally. This matches the API presented by BcolzDailyBarWriter.
Changes BcolzDailyBarWriter to no longer be an ABC; data is passed as an
iterator of (sid, dataframe) pairs to the write method.
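A toy sketch of the streaming style, where the writer consumes an iterator of (sid, data) pairs instead of the caller looping externally. The `write` function below is a stand-in, and plain lists of rows stand in for dataframes:

```python
def write(data):
    """Consume (sid, rows) pairs lazily and record how many rows were written."""
    rows_written = {}
    for sid, rows in data:
        # A real writer would persist `rows` here; the sketch only counts them.
        rows_written[sid] = len(rows)
    return rows_written


def generate_bars():
    """Yield one (sid, rows) pair at a time, so only one sid is in memory."""
    for sid in (1, 2):
        yield sid, [(sid * 10.0 + i, 100) for i in range(3)]


summary = write(generate_bars())
```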
Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types; callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.
Adds many new fixtures and updates some existing fixtures to use the new
ones:
WithDefaultDateBounds
A fixture that provides the suite a START_DATE and END_DATE. This is
meant to make it easy for other fixtures to synchronize their date
ranges without depending on each other in strange ways. For example,
WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
both have data for the same dates, so they may both depend on
WithDefaultDateBounds without forcing a dependency between them.
WithTmpDir, WithInstanceTmpDir
Provides the suite or individual test case a temporary directory.
WithBcolzDailyBarReader
Provides the suite a BcolzDailyBarReader which reads from bcolz data
written to a temporary directory. The data will be read from
dataframes and then converted to bcolz files with
BcolzDailyBarWriter.write.
WithBcolzDailyBarReaderFromCSVs
Provides the suite a BcolzDailyBarReader which reads from bcolz data
written to a temporary directory. The data will be read from a
collection of CSV files and then converted into the bcolz data through
BcolzDailyBarWriter.write_csvs.
WithBcolzMinuteBarReader
Provides the suite a BcolzMinuteBarReader which reads from bcolz data
written to a temporary directory. The data will be read from
dataframes and then converted to bcolz files with
BcolzMinuteBarWriter.write.
WithAdjustmentReader
Provides the suite a SQLiteAdjustmentReader which reads from an in
memory sqlite database. The data will be read from dataframes and then
converted into sqlite with SQLiteAdjustmentWriter.write.
WithDataPortal
Provides each test case a DataPortal object with data from temporary
resources.
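The mixin pattern behind these fixtures can be sketched with plain `unittest`. This sketch uses cooperative `setUpClass` calls and `addClassCleanup` (Python 3.8+), which are assumptions for illustration and not the hooks the real fixtures use. The dates are arbitrary:

```python
import os
import shutil
import tempfile
import unittest
from datetime import date


class WithDefaultDateBounds:
    """Shared date range that other fixtures can read without coupling."""
    START_DATE = date(2016, 1, 4)
    END_DATE = date(2016, 1, 29)


class WithTmpDir:
    """Provides the suite a temporary directory, removed after the suite."""

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.tmpdir = tempfile.mkdtemp()
        cls.addClassCleanup(shutil.rmtree, cls.tmpdir)


class ExampleSuite(WithTmpDir, WithDefaultDateBounds, unittest.TestCase):
    def test_fixtures_available(self):
        self.assertTrue(os.path.isdir(self.tmpdir))
        self.assertLess(self.START_DATE, self.END_DATE)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleSuite)
result = unittest.TextTestRunner().run(suite)
```

Because both mixins only touch class attributes of the final suite, they can be combined in one test case without depending on each other.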
Fix a bug where if history were called with assets `[1, 2]` and then
subsequently, `[2, 1]`, the loader would return the cached array in
order for `[1, 2]`.
Instead, cache an AdjustedArray for each asset, then when a history
window is requested, check if each asset has a sufficient cache, and if
not then read values for the assets which are missing or need to be
refreshed.
An added benefit of this change is that if a subsequent call to history
has a smaller number of assets than the previous, no new data needs to
be read from disk. e.g. a call with assets `[1, 2, 3]` and then `[1, 2]`
would use the cached values for `1` and `2` from the first call.
Conversely, if the second call has more assets, then only the data for
the new assets needs to be retrieved. e.g. a history with `[1, 2]`, then
`[1, 2, 3]` would only need (assuming `1` and `2` have not expired) to
retrieve data for `3`. Unfortunately, the benefit here is not great
because `load_raw_arrays` is optimized for reading many assets, and
pulls the entire daily bar dataset into memory. This change makes tuning
`load_raw_arrays` for faster reads when only a few assets are requested
(e.g. by slicing from the carray for each asset, instead of pulling all
data into a numpy array) more beneficial than it would have been
previously.
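A stripped-down sketch of the per-asset caching idea, with cache expiry left out and plain lists standing in for AdjustedArray objects. `PerAssetHistoryCache` and `read_asset` are hypothetical names:

```python
class PerAssetHistoryCache:
    def __init__(self, read_asset):
        self._read_asset = read_asset  # hypothetical: asset -> list of values
        self._cache = {}
        self.reads = []  # assets actually read from the underlying source

    def history(self, assets):
        # Only read the assets that are not cached yet.
        for asset in assets:
            if asset not in self._cache:
                self._cache[asset] = self._read_asset(asset)
                self.reads.append(asset)
        # Assemble the result in the order requested by the caller.
        return [self._cache[asset] for asset in assets]


source = {1: [10, 11], 2: [20, 21], 3: [30, 31]}
cache = PerAssetHistoryCache(source.__getitem__)

in_order = cache.history([1, 2])
reversed_order = cache.history([2, 1])
reads_after_two_calls = list(cache.reads)
with_new_asset = cache.history([1, 2, 3])
```

Keying the cache by asset rather than by the whole request is what makes the result independent of the order of the previous call.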
Renames zipline.utils.test_utils to zipline.testing
Adds zipline.testing.fixtures.ZiplineTestCase to manage setup and
teardown and adds mixins to define fixtures like an asset finder or
trading calendar.
This commit removes the ability to reference a shared TradingEnvironment through the zipline.finance.trading module. Instead, the classes that require a TradingEnvironment, or its child AssetFinder, contain their own references to those objects.
This commit also adds serialization utilities that allow for the pickling/unpickling of objects without unintentionally serializing their TradingEnvironments or AssetFinders.
This commit modifies the DataFrameSource and DataPanelSource to accept only Int64Indexes on the incoming data and moves the burden of mapping user identifiers to TradingAlgorithm.run().
Limited use of `pandas` data structures in both `HistoryContainer` and
`RollingPanel`. Where possible, methods were amended to return raw
`ndarrays` with the indexing logic done separately. This allows us to
cut down the number of times pandas objects are created both as returns
and intermediate values. The separation of indexing from data access
allowed us to minimize the times we’d make use of pandas indexes.
This required that certain methods like `NDFrame.ffill` be replaced
with versions that work with `ndarrays`. Some of this was done via
straight numpy methods and others by accessing pandas internal
machinery. Besides allowing us to use faster ndarrays, many of these
functions provided speedups over their pandas counterparts as we didn’t
require the extra features like handling multiple dtypes, e.g. np.isnan
is faster than pd.isnull, but only works with certain dtypes.
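As one example of the "straight numpy" approach, a forward fill over a 1-D float array can be written without pandas. This `ffill` is a common numpy idiom shown for illustration, not necessarily the replacement used in this change:

```python
import numpy as np


def ffill(values):
    """Forward-fill NaNs in a 1-D float ndarray; leading NaNs are kept."""
    mask = np.isnan(values)
    # For each position, the index of the most recent non-NaN value so far.
    idx = np.where(~mask, np.arange(len(values)), 0)
    idx = np.maximum.accumulate(idx)
    return values[idx]


values = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
filled = ffill(values)
```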
construction.
BarData can be falsy. In create_buffer_panel, the intention of the
check against bar_data was to see if it was passed at all, not if it was
truthy. In order to make that check more explicit, the check now
asserts that bar_data is not None.
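The difference between the two checks can be shown with a hypothetical empty container standing in for BarData:

```python
class EmptyBarData:
    """Hypothetical stand-in: a container with no entries is falsy."""

    def __len__(self):
        return 0


def was_passed_truthiness(bar_data=None):
    # Old-style check: an empty but valid object looks like "not passed".
    return bool(bar_data)


def was_passed_explicit(bar_data=None):
    # Explicit check: only a missing argument counts as "not passed".
    return bar_data is not None


empty = EmptyBarData()
old_result = was_passed_truthiness(empty)
new_result = was_passed_explicit(empty)
```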