`data.loader.ensure_benchmark_data()` was trying to use data after an exception was raised while loading it. The code was logging and swallowing the exception; this change re-raises it.
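The log-then-re-raise pattern can be sketched as follows. `BenchmarkLoadError`, `fetch_benchmark` and `ensure_benchmark_data_sketch` are hypothetical stand-ins, not zipline's loader code; the point is only that a bare `raise` inside the `except` block keeps the log line while stopping execution before the missing data is used.

```python
import logging

log = logging.getLogger(__name__)


class BenchmarkLoadError(Exception):
    """Hypothetical error for a failed benchmark download."""


def fetch_benchmark():
    # Stand-in for a network fetch that fails.
    raise BenchmarkLoadError("download failed")


def ensure_benchmark_data_sketch():
    try:
        data = fetch_benchmark()
    except BenchmarkLoadError:
        # Log for diagnostics, then re-raise instead of carrying on
        # with `data` never having been assigned.
        log.exception("failed to load benchmark data")
        raise
    return data
```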
With the addition of the truncate function, there are cases where we'll
want to construct a BcolzMinuteBarWriter to call truncate, without
gathering all the metadata. This commit adds a write_metadata arg to its
init, which is True by default. If False is specified, no metadata is
written.
This required adding logic to truncate to update end_session in the
metadata to the truncate date.
- Fixes a warning on indexing with a float that ultimately came from
pd.Timedelta.total_seconds(). Adds ``timedelta_to_integral_seconds()``
and ``timedelta_to_integral_minutes()`` functions and replaces various
usages of ``int(delta.total_seconds())`` with them.
- Fixes warnings triggered in ``_create_daily_stats`` by
passing tz-aware datetimes to np.datetime64.
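The two helpers might look roughly like this; the bodies are an assumption rather than the committed code. `total_seconds()` returns a float, so converting once in a helper guarantees an `int` is what reaches any indexing operation. The same calls work on `pd.Timedelta`, which also provides `total_seconds()`.

```python
import datetime as dt


def timedelta_to_integral_seconds(delta):
    """Whole number of seconds in `delta`, as an int."""
    return int(delta.total_seconds())


def timedelta_to_integral_minutes(delta):
    """Whole number of minutes in `delta`, as an int."""
    return timedelta_to_integral_seconds(delta) // 60


minute_bars = list(range(500))
# A 6.5 hour session is 390 minutes; indexing with an int raises no warning.
last_minute = minute_bars[timedelta_to_integral_minutes(dt.timedelta(hours=6, minutes=30)) - 1]
```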
This reverts commit 86c7635b45, reversing
changes made to c77f2b92df.
Some real-world cases hit errors with this change, due to the new offset
logic attempting to create Adjustments with invalid parameters.
Will identify exact conditions that cause this error and add as a test
case before remerging.
Instead of `HistoryLoader` containing separate adjustment calculation
logic, use `SQLiteAdjustmentReader.load_adjustments`.
This change required the addition of two offset parameters to
`load_adjustments` since the perspective on the data from within
`schedule_function` is skewed from how Pipeline looks at historical
data.
This is working towards creating an `AdjustmentReader` abc which
`SQLiteAdjustmentReader` and an upcoming continuous future adjustment
reader will share.
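A shared abc of the kind described could be sketched with Python's `abc` module. The class names and the `load_adjustments(columns, dates, assets)` signature below are assumptions for illustration; the real method also takes the offset parameters mentioned above.

```python
from abc import ABC, abstractmethod


class AdjustmentReaderSketch(ABC):
    """Hypothetical common interface for adjustment readers."""

    @abstractmethod
    def load_adjustments(self, columns, dates, assets):
        """Return one mapping of adjustments per requested column."""


class EmptyAdjustmentReader(AdjustmentReaderSketch):
    def load_adjustments(self, columns, dates, assets):
        # One empty mapping per column: nothing to adjust.
        return [{} for _ in columns]
```

Subclasses that forget to implement `load_adjustments` fail at construction time, which is the main benefit of the abc over duck typing here.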
Remove module-scope invocations of `get_calendar('NYSE')`, which cuts
zipline import time in half on my machine. This makes the zipline CLI
noticeably more responsive, and it reduces memory consumed at import
time from 130MB to 90MB.
Before:
$ time python -c 'import zipline'
real 0m1.262s
user 0m1.128s
sys 0m0.120s
After:
$ time python -c 'import zipline'
real 0m0.676s
user 0m0.536s
sys 0m0.132s
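One way to avoid module-scope construction is to build the object on first use and cache it, sketched here with `functools.lru_cache`. `_build_calendar` and `default_calendar` are hypothetical stand-ins for the expensive `get_calendar('NYSE')` call, not the code in this commit.

```python
import functools

BUILD_CALLS = []


def _build_calendar(name):
    # Stand-in for an expensive constructor (e.g. a full trading calendar).
    BUILD_CALLS.append(name)
    return {"name": name}


@functools.lru_cache(maxsize=None)
def default_calendar():
    # Built on first call instead of at import time, then reused.
    return _build_calendar("NYSE")
```

Importing the module now costs nothing; only code paths that actually need the calendar pay for building it, and only once.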
test_resample now fully covers the resample module.
Fix a bug exposed by the increased coverage, where daily aggregation on
`high` would return `nan` for an asset, instead of the actual high, when
1) `1d` history was called on non-consecutive minutes during the course
of a day and 2) either a) the value for the previously inspected dt was
`nan` or b) there were only `nan`s between the previous and current dt.
`low` had a similar bug which was only triggered if the value for the
previously inspected dt was `nan`.
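The nan-safe running aggregation can be sketched with NumPy's `fmax`/`fmin`, which return the non-nan operand when only one side is nan. The function names are hypothetical and this is not the resample module's code; it only shows why neither a `nan` previous value nor an all-`nan` window should mask a real high or low.

```python
import numpy as np


def _window_extreme(values, reducer):
    """Reduce the non-nan values with `reducer`; nan if all are nan."""
    values = np.asarray(values, dtype=float)
    finite = values[~np.isnan(values)]
    return reducer(finite) if finite.size else np.nan


def update_daily_high(prev_high, minutes_since_prev):
    # fmax ignores a nan operand, unlike max/maximum.
    return np.fmax(prev_high, _window_extreme(minutes_since_prev, np.max))


def update_daily_low(prev_low, minutes_since_prev):
    return np.fmin(prev_low, _window_extreme(minutes_since_prev, np.min))
```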
Increase coverage on `ReindexSessionBarReader` so that all methods which
are considered part of the interface are covered by `test_resample`.
Fix bug in `get_value`, exposed by increased coverage, where the
`NoDataOnDate` exception was bubbling from the bcolz reader all the way
up when a session which was a holiday on the underlying reader was passed
to the reindex reader. (The reindex reader should return nan/0 in that
case.)
Also, move the data index exceptions to a location that is agnostic to
bcolz/us_equity_pricing, since the exception is now used by the
resample module to fix the aforementioned bug.
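The reindex reader's handling of holidays might be sketched as below: the underlying reader raises a shared `NoDataOnDate`, and the wrapper translates it into "no data" values instead of letting it bubble up. All classes here are hypothetical simplifications, not zipline's readers.

```python
import math


class NoDataOnDate(Exception):
    """Stand-in for the shared data-index exception."""


class UnderlyingReaderSketch:
    def __init__(self, bars_by_key):
        self._bars = bars_by_key  # {(sid, session): {field: value}}

    def get_value(self, sid, session, field):
        if (sid, session) not in self._bars:
            raise NoDataOnDate(session)
        return self._bars[(sid, session)][field]


class ReindexReaderSketch:
    def __init__(self, reader):
        self._reader = reader

    def get_value(self, sid, session, field):
        try:
            return self._reader.get_value(sid, session, field)
        except NoDataOnDate:
            # Holiday on the underlying calendar: 0 volume, nan prices.
            return 0 if field == "volume" else math.nan
```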
Remove special handling for the last session of an asset, which was
moving the last traded dt back one session.
If the asset has data on a session, `get_last_traded_dt` should always
return that session if it is the parameter to the method.
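The intended behavior can be sketched as a search over the sorted sessions on which an asset has data: return the latest one at or before the requested session, with no special case for the final session. `get_last_traded_dt_sketch` is a hypothetical helper, not the reader's implementation.

```python
import bisect
import datetime as dt


def get_last_traded_dt_sketch(traded_sessions, session):
    """Latest entry of sorted `traded_sessions` <= `session`, or None."""
    i = bisect.bisect_right(traded_sessions, session)
    return traded_sessions[i - 1] if i else None


sessions = [dt.date(2016, 1, 4), dt.date(2016, 1, 5), dt.date(2016, 1, 7)]
```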
`1d` history calls were failing on key errors when using the
`us_futures` calendar, because timestamps occurring before midnight
resolved to the wrong midnight (i.e. the midnight before the session,
instead of the following midnight, which is the label for the current
session).
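The labeling rule can be sketched with naive datetimes. The 18:00 session open is an assumption approximating an evening futures open; the real logic lives in the trading calendar. A minute at 22:30 on Jan 4 belongs to the session labeled by the following midnight (Jan 5), whereas truncating it to its own date gives Jan 4, the wrong key.

```python
import datetime as dt

# Assumption: sessions open at 18:00 local time on the previous evening.
SESSION_OPEN_HOUR = 18


def minute_to_session_label(minute):
    midnight = dt.datetime.combine(minute.date(), dt.time())
    if minute.hour >= SESSION_OPEN_HOUR:
        # Evening minutes belong to the session labeled by the next midnight.
        return midnight + dt.timedelta(days=1)
    return midnight
```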
Tests will follow when bringing up coverage on resample and data portal
modules.
Combine the equity and future readers into asset dispatch readers, so
that simulations that use both asset types can access data for each.
This patch enables `history` for future assets in algorithms; however,
it does not add extra coverage in the `test_data_portal` or `test_history`
to cover future assets. Those tests will follow; this is going in
separately since it shows that wrapping the readers in the asset
dispatch reader does not break existing equity strategies.
Add `AssetDispatchSessionBarReader` and corresponding minute and session
bar versions of that reader.
This reader routes requests to the appropriate reader based on the asset
type of the requested sids.
`load_raw_arrays` in the dispatch reader batches the sids by asset type
and then interleaves the results in the out arrays, so that the arrays'
data corresponds to the sids in the order that they are passed to the
method, which is the expected behavior of `load_raw_arrays`.
The dispatch reader is intended for use by the data portal when using
both futures and equities. The dispatch reader will also be passed to
the `HistoryLoader`s contained within the data portal, where the
batched `load_raw_arrays` will be used.
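The batch-then-interleave step can be sketched with NumPy fancy indexing. This is a simplification under stated assumptions: each hypothetical "reader" is a callable returning a single `(n_rows, len(sids))` array, whereas the real `load_raw_arrays` takes fields, dates and sids and returns one array per field.

```python
import numpy as np


def dispatch_load_raw_arrays(sids, asset_type_of, readers, n_rows):
    """Batch `sids` by asset type, then scatter each batch back into place."""
    out = np.full((n_rows, len(sids)), np.nan)
    positions_by_type = {}
    for position, sid in enumerate(sids):
        positions_by_type.setdefault(asset_type_of[sid], []).append(position)
    for asset_type, positions in positions_by_type.items():
        batch_sids = [sids[p] for p in positions]
        # One call per asset type; write columns back at the caller's positions.
        out[:, positions] = readers[asset_type](batch_sids)
    return out


readers = {
    "equity": lambda s: np.array([[float(x) for x in s]] * 2),
    "future": lambda s: np.array([[x * 10.0 for x in s]] * 2),
}
result = dispatch_load_raw_arrays(
    [1, 100, 2], {1: "equity", 2: "equity", 100: "future"}, readers, n_rows=2
)
```

The columns of `result` follow the requested order `[1, 100, 2]` even though the equity sids were fetched together in one call.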
Also, BUG:
- Fix the return of `MinuteResampleSessionBarReader.load_raw_arrays` to
match all other readers.
- Use the input dt for the `MinuteResampleSessionBarReader.load_raw_arrays`
as a session label, instead of a minute dt, since it is a session bar
reader.
(Both of these bugs were discovered when using the resample reader for
future data in the dispatch tests.)
Working towards history results which contain mixed asset types, add
a reader which makes `load_raw_arrays` return results indexed on the
session/minute ranges of the given `trading_calendar` instead of the
calendar of the backing reader.
This reader will be used to make Equity readers align with Future
readers. It is intended for use as part of another reader (which will
dispatch queries based on asset type and then recombine the results)
which will be passed to the `[Minute|Session]HistoryLoader`s in the
data portal.
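The alignment idea can be sketched with pandas `DataFrame.reindex`: rows keyed on the backing reader's sessions are reindexed onto the target calendar's sessions, and sessions the backing calendar lacks come back as `NaN`. `align_to_calendar` and the dates below are hypothetical, chosen only to show a target session with no backing data.

```python
import pandas as pd


def align_to_calendar(frame, target_sessions):
    """Reindex rows onto `target_sessions`; missing sessions become NaN."""
    return frame.reindex(pd.DatetimeIndex(target_sessions))


equity_closes = pd.DataFrame(
    {"close": [10.0, 11.0]},
    index=pd.DatetimeIndex(["2016-01-15", "2016-01-19"]),
)
# Suppose the target calendar has a session on 2016-01-18 that the
# equity calendar does not.
aligned = align_to_calendar(equity_closes, ["2016-01-15", "2016-01-18", "2016-01-19"])
```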
For scaling up pricing data before writing to bcolz, the writer now
accepts a dict mapping each sid to the ratio to use. It still accepts a
single ratio as default_ohlc_ratio, which is used as a fallback if no
mapping exists for a given sid. The default is OHLC_RATIO (1000).
This allows better handling of futures pricing data, where the required
precision across root symbols is not consistent.
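The per-sid lookup with fallback can be sketched as below. `scale_ohlc` and the `ohlc_ratios_per_sid` parameter name are assumptions for illustration; only `default_ohlc_ratio` and `OHLC_RATIO` (1000) come from the description above.

```python
import numpy as np

OHLC_RATIO = 1000


def scale_ohlc(sid, prices, ohlc_ratios_per_sid=None,
               default_ohlc_ratio=OHLC_RATIO):
    """Scale float prices to integers, preferring a per-sid ratio."""
    ratio = (ohlc_ratios_per_sid or {}).get(sid, default_ohlc_ratio)
    # Round before casting so 10.123 * 1000 does not truncate to 10122.
    return np.round(np.asarray(prices, dtype=float) * ratio).astype(np.uint32)
```

A future whose root symbol trades in finer increments can then be given, say, a ratio of 100000 without changing the precision used for every other sid.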
The daily/session bar reader's `spot_price` took the same parameters and
returned the same kind of output as the minute bar reader's `get_value`.
Standardize on one method to make a common interface, which may be
formally factored out in a later patch, to help enable writing reader
implementations or mixins which are agnostic to the bar frequency.
We were mistakenly using the minute_per_day field.
We now expose from the metadata object the version from which the
metadata was read. This allows a new test that verifies the version is
read correctly.
In the data portal, remove methods that make a distinction between
future and equity asset types. Instead, rely on the pricing reader
dispatching.
This is in support of incoming work which will upsample equity history
arrays to the larger futures calendar.
Also, remove perf tracker tests which were using an equity
reader/writer, to be added back in later.
* First pass.
* Improvements and fixes
- Update usages of BcolzMinuteBarWriter
- Updates with rebuilt example data
- Expose calendar from BcolzMinuteBarMetadata instead of calendar_name
- Keep market_opens and market_closes in metadata for compatibility
* Store start_session and end_session in minute bcolz metadata
- start_session replaces first_trading_day
- Add end_session to limit to correct days
* For last_available_dt, get last close from calendar to maintain tz
* Bumps version and handles earlier versions on read
* Rebuilt example data on python 3
* Indicate metadata fields that are deprecated
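Reading metadata across versions, as described above, might be sketched like this. The JSON layout, the default version of 0 for files without a `version` key, and `read_metadata` itself are assumptions, not zipline's `BcolzMinuteBarMetadata`; the sketch keys the upgrade on field presence and exposes the version that was read.

```python
import json


def read_metadata(raw_json):
    """Parse metadata JSON, upgrading fields written by older versions."""
    raw = json.loads(raw_json)
    version = raw.get("version", 0)  # assumed default for unversioned files
    if "start_session" in raw:
        start_session = raw["start_session"]
    else:
        # Older files only recorded the deprecated first_trading_day.
        start_session = raw["first_trading_day"]
    return {
        "version": version,
        "start_session": start_session,
        "end_session": raw.get("end_session"),  # absent in older files
    }
```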
Implement a `SessionBarReader` which uses a minute bar reader as a
backing source, resampling the minute bars that fall within each
session into that session's bar.
Also, add future/CME test cases to resample suite.
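Resampling one session's minute bars into a single bar can be sketched with the usual OHLCV rules: first traded open, max high, min low, last traded close, summed volume. `resample_session` is hypothetical, not the reader's implementation, and it assumes a minute with a `nan` close had no trade.

```python
import numpy as np


def resample_session(opens, highs, lows, closes, volumes):
    """Collapse one session's minute bars (nan = no trade) into one bar."""
    opens, highs, lows, closes = (
        np.asarray(a, dtype=float) for a in (opens, highs, lows, closes)
    )
    traded = ~np.isnan(closes)
    if not traded.any():
        return np.nan, np.nan, np.nan, np.nan, 0
    return (
        opens[traded][0],       # first traded open
        highs[traded].max(),
        lows[traded].min(),
        closes[traded][-1],     # last traded close
        int(np.sum(volumes)),
    )


nan = np.nan
session_bar = resample_session(
    [nan, 10.0, 11.0, nan], [nan, 12.0, 11.5, nan],
    [nan, 9.5, 10.5, nan], [nan, 11.0, 11.2, nan], [0, 100, 50, 0],
)
```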