
85 Commits

Author SHA1 Message Date
Conner Fromknecht 99efa7a9f3 Fixed catalyst tests except example tests 2017-06-19 14:43:10 -07:00
Andrew Daniels f088afc1e1 MAINT: Modify ReindexBarReader.get_value to handle missing data
Instead of raising an exception, return 0.0 for volume, and nan for
everything else.
2017-05-09 09:34:37 -04:00
Andrew Daniels b3c1cd5535 MAINT: Display diff if input to daily bar writer has gaps/extra bars 2017-05-04 10:19:09 -04:00
Andrew Daniels 0d9f4d29f5 MAINT: Handle gaps in input to daily bars writer (#1778)
Previously, a dataframe passed into BcolzDailyBarWriter.write that was
missing an expected session between its first and last sessions would be
written incorrectly. Upon converting the dataframe to a ctable, the
values for all days following the gap would be shifted backwards, and
nans would be shifted in at the end.

This commit handles the issue by asserting that the number of rows in
the input table matches the number of sessions in the calendar between
the table's first and last sessions.

Also fixes a test that was mistakenly using minutes_in_range where it
should have been using sessions_in_range (uncovered by this change).
2017-05-03 20:49:22 -04:00
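The row-count check this commit describes can be sketched with pandas. The sketch also reports the missing and extra sessions, in the spirit of the later "display diff" commit. `check_sessions_match` is a hypothetical name, and `sessions` is an assumed stand-in for a real trading calendar's sessions. This is not `BcolzDailyBarWriter`'s actual code.

```python
import pandas as pd

def check_sessions_match(df, sessions):
    """Raise if `df` lacks one row per session between its first and last rows.

    `sessions` is a sorted DatetimeIndex that stands in for the calendar's
    sessions (hypothetical; a real writer would ask its trading calendar).
    """
    first, last = df.index[0], df.index[-1]
    expected = sessions[(sessions >= first) & (sessions <= last)]
    if len(df) != len(expected):
        # Report the difference so the gap (or extra bar) is easy to find.
        missing = expected.difference(df.index)
        extra = df.index.difference(expected)
        raise AssertionError(
            "Got %d rows but expected %d sessions. Missing: %s. Extra: %s."
            % (len(df), len(expected), list(missing), list(extra))
        )
```

Checking before conversion matters because a ctable has no index. Once the frame is converted, a missing session silently shifts every later row.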
Andrew Daniels f4f2048a68 PERF: Avoid repeated recursive calls when getting forward-filled close
Instead of recursively calling `DailyHistoryAggregator.closes` until we
find a non-nan close, we can instead call `load_raw_arrays` once, and
find the value from the returned array.
2017-04-06 09:51:01 -04:00
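The idea is to load the window once and then search backwards for the latest valid close, rather than recursing one dt at a time. A minimal numpy sketch follows. It assumes the closes for one asset have already been loaded as a 1-D array (for example from a reader's `load_raw_arrays`). The helper name is hypothetical.

```python
import numpy as np

def last_non_nan(closes):
    """Return the most recent non-nan value in a 1-D array of closes, else nan."""
    closes = np.asarray(closes, dtype=float)
    # Positions of all valid closes; the last one is the forward-filled value.
    valid = np.flatnonzero(~np.isnan(closes))
    if valid.size == 0:
        return np.nan
    return closes[valid[-1]]
```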
dmichalowicz cf68953bf2 TST: Use 'us_futures' calendar in test fixtures 2017-04-03 10:18:03 -04:00
dmichalowicz 0d157859e0 BUG: Open and close resampling code could hit index errors 2017-03-28 16:06:29 -04:00
Eddie Hebert 873f3a7fc9 BUG: Fix end session metadata for minute bar writer.
When opening with a new `end_session`, i.e. opening for append, write the new
end session to the metadata.

Fixes an issue where the calendar on minute bar readers did not include the
recently appended day, causing reads on the last values to fail.

Accordingly, update the append test to read a value instead of checking the table length.
2017-01-22 15:14:05 -05:00
Eddie Hebert 261803b622 ENH: Add a method to open existing minute bar directory.
Remove the need for a consumer that is editing an existing minute bars directory
to re-read, from the metadata, values which should not change.

Add tests covering appending on a new day and truncating, which would be the
common usages of this method.
2017-01-17 17:25:27 -05:00
Eddie Hebert d7d2214756 ENH: Add a reader writer pair for HDF5 minute bar updates.
This format is intended for storing data for all sids of an asset type,
e.g. equities or futures, for a session. bcolz is not used, which avoids the
overhead of creating the directories and files for each asset (around 8000
for active equities). That overhead can be skipped because the update is meant
to be read all at once, rather than supporting the random access pattern needed
by the simulation.

This patch only adds the reader/writer pair. Finding the paths to delta files
and applying the updates via the bcolz writer are left to internal loader code.

Also, the update reader interface is intentionally constrained to the data for
an entire session, to allow for an implementation that supports mid-session updates.
2017-01-04 12:09:10 -05:00
dmichalowicz ba81f12370 TST: Extra test for reading/writing ohlc ratios 2016-12-22 14:34:46 -05:00
Eddie Hebert 4fcf31730c BUG: Fix minute bar last traded after half day.
When the following conditions occur,

- a `nan` occurred after a half day (e.g. on the Monday after
Thanksgiving, where the Friday would be a half day.)

- data was written to the span between the early close and where the market close
would have been if it were not an early close session

- a `nan` also occurred on the last minute of the early market session.

the existing implementation would incorrectly return a `nan` when requesting a
forward-filled price.

The steps that caused this error were:

1. Request for `'price'` on the market open of the day after the early close.

2. `nan` is found for that minute

3. `get_last_traded_dt` is called, and finds a volume that occurs after the
early close. e.g. `18:47` when the market close was `18:00`.

4. The minute position for `18:47` is used when calling
`find_position_of_minute`. Since that value is after the `market_close`, the
minute is set to the position of `18:00` due to that function's delta logic.

5. Since there is also no data at `18:00`, a `nan` is returned, even though
there were valid minutes earlier in the session; e.g. a non-zero volume at
`16:47` should have been used, but was not.

Fix by checking the current minute against the market close when searching for
the last traded minute. If the minute is later than the market close for the
corresponding day, continue the search until the minute position is within the
trading session.

This could also be fixed by enforcing that only zeros can be written between an
early close and the minute where the close would have been, but this fix allows
the reader to work with existing data.
2016-11-15 15:09:19 -05:00
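The fix can be illustrated with a plain linear scan. Walk backwards from the requested minute and skip any minute that falls after its own session's close. Only then accept a non-zero volume. This is a sketch under stated assumptions, not zipline's position-arithmetic implementation. `last_traded_minute` and `closes_by_day` are hypothetical, naive datetimes stand for UTC, and the volumes in the example are made up.

```python
from datetime import datetime

def last_traded_minute(minutes, volumes, closes_by_day, dt):
    """Return the latest minute <= dt with non-zero volume inside its session.

    minutes: sorted datetimes; volumes: parallel list of volumes.
    closes_by_day: hypothetical map of date -> that session's market close.
    Minutes after a session's close (e.g. data written past an early close)
    are skipped instead of being trusted.
    """
    for minute, volume in zip(reversed(minutes), reversed(volumes)):
        if minute > dt:
            continue
        if minute > closes_by_day[minute.date()]:
            continue  # outside the (possibly early) session; keep searching
        if volume > 0:
            return minute
    return None

# Scenario from the commit: Friday 2016-11-25 closes early at 18:00 UTC,
# yet a volume was written at 18:47; Monday's first minute has no trade.
closes_by_day = {
    datetime(2016, 11, 25).date(): datetime(2016, 11, 25, 18, 0),
    datetime(2016, 11, 28).date(): datetime(2016, 11, 28, 21, 0),
}
minutes = [datetime(2016, 11, 25, 16, 47), datetime(2016, 11, 25, 18, 0),
           datetime(2016, 11, 25, 18, 47), datetime(2016, 11, 28, 14, 31)]
volumes = [100, 0, 50, 0]
result = last_traded_minute(minutes, volumes, closes_by_day,
                            datetime(2016, 11, 28, 14, 31))
```

Here `result` is the 16:47 minute. The stray 18:47 volume is ignored because it lies after the early close.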
Eddie Hebert 098d38ac76 MAINT: Return nan from daily bcolz get_value.
Match the behavior of the minute bar reader, now that the session and
minute bar readers share a common interface.

isnull is slightly slower than checking against -1; however, in cases
where we check for illiquid trades in a tight loop, volume is checked,
and volume does not use nan. The performance impact of this change
should be marginal.
2016-10-25 11:25:09 -04:00
Eddie Hebert fccbae25ed BUG: Fix session from minute reader's last traded.
The last traded dt provided from the session bar reader which resamples
from minutes should provide a dt that is a session label, not one that
is at the minute frequency.
2016-10-24 13:58:58 -04:00
Eddie Hebert a4205a0500 PERF: Speedup minute to session sampling.
The minute to session sampling reading was creating two DataFrame
objects, the first to hold the minute data, and then a second returned
by the `DataFrame.groupby` to sample down to sessions.

Instead, use the arrays returned by the minute reader's `load_raw_arrays`
and implement sampling logic which takes advantage of the fact that the minutes
being passed start with the first minute of the first session and end
with the last minute of the last session.

On my machine this takes the tests in `test/test_continuous_futures`
from about 4.0 to about 0.1 seconds.
2016-10-24 09:59:22 -04:00
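Array-based resampling works because each session is a contiguous slice of the minute arrays. That lets the aggregates be computed with numpy's `ufunc.reduceat` instead of a `groupby`. Below is a sketch. It assumes a hypothetical `session_lengths` list giving each session's minute count, with every count positive. It assumes nan marks minutes without trades. It is not the code from this commit.

```python
import numpy as np

def _first_valid(a):
    idx = np.flatnonzero(~np.isnan(a))
    return a[idx[0]] if idx.size else np.nan

def _last_valid(a):
    idx = np.flatnonzero(~np.isnan(a))
    return a[idx[-1]] if idx.size else np.nan

def minutes_to_sessions(opens, highs, lows, closes, volumes, session_lengths):
    """Aggregate contiguous per-minute arrays into one OHLCV bar per session."""
    # Start offset of each session within the minute arrays.
    starts = np.concatenate(([0], np.cumsum(session_lengths)[:-1])).astype(np.intp)
    # fmax/fmin ignore nan unless every minute in the session is nan.
    high = np.fmax.reduceat(highs, starts)
    low = np.fmin.reduceat(lows, starts)
    volume = np.add.reduceat(volumes, starts)
    bounds = np.append(starts, len(closes))
    spans = list(zip(bounds[:-1], bounds[1:]))
    open_ = np.array([_first_valid(opens[s:e]) for s, e in spans])
    close = np.array([_last_valid(closes[s:e]) for s, e in spans])
    return open_, high, low, close, volume
```

`reduceat` requires non-empty slices to give meaningful results. That is why the positive-length assumption is stated above.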
Andrew Daniels 518b1d78ac MAINT: Adds option for minute bar writer to not write metadata
With the addition of the truncate function, there are cases where we'll
want to construct a BcolzMinuteBarWriter to call truncate, without
gathering all the metadata. This commit adds a write_metadata arg to its
init, which is True by default. If False is specified, no metadata is
written.

This requires adding logic to truncate that updates end_session in the
metadata to the truncate date.
2016-09-21 11:31:54 -04:00
Scott Sanderson 758ed0fffa MAINT: Pass float explicitly. 2016-09-20 17:12:07 -04:00
Scott Sanderson 2772975e2d MAINT: Use float in np.full. 2016-09-20 17:12:07 -04:00
Andrew Daniels 95e07f2735 ENH: Adds truncate method to BcolzMinuteBarWriter (#1499) 2016-09-19 13:02:48 -04:00
Jean Bredeche ae0d41af6f ENH: Make reader.get_value raise NoDataOnDate if the date is not in the calendar.
DataPortal now catches the NoDataOnDate exception and returns nan for
OHLC and 0 for V.

Price is still forward-filled, as before.
2016-09-14 22:21:43 -04:00
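The division of labour described here works as follows. The reader raises for dates outside its calendar, and the caller converts that into nan (OHLC) or 0 (volume). A toy version is sketched below. `NoDataOnDate` is redefined locally as a stand-in, and `ToyDailyReader` and `portal_get_value` are hypothetical. This is not the DataPortal's actual code.

```python
import math

class NoDataOnDate(Exception):
    """Local stand-in for zipline's NoDataOnDate exception."""

class ToyDailyReader:
    """Hypothetical reader that raises NoDataOnDate outside its calendar."""

    def __init__(self, sessions, data):
        self.sessions = set(sessions)  # dates in the reader's calendar
        self.data = data               # {(sid, date, field): value}

    def get_value(self, sid, dt, field):
        if dt not in self.sessions:
            raise NoDataOnDate(dt)
        return self.data[(sid, dt, field)]

def portal_get_value(reader, sid, dt, field):
    """Caller-side handling from the commit: nan for OHLC, 0 for volume."""
    try:
        return reader.get_value(sid, dt, field)
    except NoDataOnDate:
        return 0 if field == "volume" else math.nan
```

Forward-filling of `price` is a separate code path and is not modeled here.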
Jean Bredeche a5693d0589 MAINT: Add BarReader base class for both minute and session readers 2016-09-14 13:47:12 -04:00
Scott Sanderson d6ad73e064 MAINT: Updates from Joe's PR feedback. 2016-09-07 20:42:19 -04:00
Scott Sanderson 1ca23f2583 PERF: Remove module-scope calendar creations.
Remove module scope invocations of `get_calendar('NYSE')`, which cuts
zipline import time in half on my machine. This makes the zipline CLI
noticeably more responsive, and it reduces memory consumed at import
time from 130MB to 90MB.

Before:

$ time python -c 'import zipline'

real    0m1.262s
user    0m1.128s
sys     0m0.120s

After:

$ time python -c 'import zipline'

real    0m0.676s
user    0m0.536s
sys     0m0.132s
2016-09-06 09:57:23 -04:00
Eddie Hebert 5e3b949fc6 TST/BUG: Full coverage on resample module.
test_resample now fully covers the resample module.

Fix a bug exposed by increased coverage, where daily aggregation on
`high` would return `nan` for an asset instead of the actual high when 1) during
the course of a day, `1d` history was called on non-consecutive minutes, and 2)
either a) the value for the previously inspected dt was `nan` or b)
there were only `nan`s between the previous and current dt.

`low` had a similar bug which was only triggered if the value for the
previously inspected dt was `nan`.
2016-09-01 16:41:45 -04:00
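The nan handling at issue can be sketched as a cached running high combined with the highs of the minutes seen since the cache was filled. In this sketch a nan on either side does not hide a valid value. `update_running_high` is a hypothetical helper, not `DailyHistoryAggregator`'s method. The `low` case is symmetric, using `np.fmin` and `np.nanmin`.

```python
import numpy as np

def update_running_high(prev_high, minute_highs_since_prev):
    """Combine a cached intraday high with the highs of later minutes.

    Either input may be nan (or all-nan); a valid value always wins over nan.
    """
    new = np.asarray(minute_highs_since_prev, dtype=float)
    if new.size == 0 or np.all(np.isnan(new)):
        return prev_high  # nothing new traded; keep the cached value (may be nan)
    # fmax returns the non-nan argument when exactly one side is nan.
    return float(np.fmax(prev_high, np.nanmax(new)))
```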
Eddie Hebert d463a9855b TST/BUG: Cover all reindex session public methods.
Increase coverage on `ReindexSessionBarReader` so that all methods which
are considered part of the interface are covered by `test_resample`.

Fix bug in `get_value`, exposed by increased coverage, where the
`NoDataOnDate` exception was bubbling from the bcolz reader all the way
up when a session which was a holiday on the underlying reader was passed
to the reindex reader. (The reindex reader should return nan/0 in that
case.)

Also, move the data index exceptions to a location that is agnostic
to bcolz/us_equity_pricing, since the exception is now used by the
resample module to fix the aforementioned bug.
2016-09-01 11:51:00 -04:00
Eddie Hebert 151c3e45a7 TST: Fix get_last_traded_dt on bcolz daily reader.
Remove special handling for the last session of an asset, which was
moving the last traded back a session.

If the asset has data on a session, `get_last_traded_dt` should always
return that session if it is the parameter to the method.
2016-08-31 14:59:58 -04:00
Eddie Hebert 0cba47e29f TST: Increase coverage for reindex reader methods
Add direct coverage on last_available_dt.

Also move reader creation into the instance fixture.

This patch attempted to add coverage on `get_last_traded_dt`, but in doing
so, revealed a bug in `BcolzDailyBarReader.get_last_traded_dt` when
requesting the last trading session of an asset.
When that is fixed, the skip can be removed.
2016-08-30 16:43:58 -04:00
Eddie Hebert 1984d13c2f Merge pull request #1446 from quantopian/use-us-futures-in-test-resample
TST: Use futures cal in resample suite.
2016-08-30 10:24:18 -04:00
Eddie Hebert 9db385bb75 TST: Use futures cal in resample suite.
Instead of CME, use the futures cal, which should now be the standard
calendar throughout; though some tests remain to be ported.
2016-08-29 15:43:39 -04:00
Eddie Hebert 3c7dae8c41 TST: Cover resample bar first_trading_day method.
Add a test to directly cover the first_trading_day method via the
`test_resample` suite. (The lack of coverage was exposed when testing
against real data.)

Also, refactor resample bar tests so that session bar reader is set up
in instance fixture.
2016-08-29 15:00:08 -04:00
Eddie Hebert 0f604686b6 MAINT: Add a reader which dispatches on asset type
Add `AssetDispatchSessionBarReader` and the corresponding minute and session
bar versions of that reader.
This reader routes requests to the appropriate reader based on the asset
type of the requested sids.

`load_raw_arrays` in the dispatch reader batches the sids by asset type
and then interleaves the results into the out arrays, so that each array's
data corresponds to the sids in the order in which they are passed to the
method, meeting the expected behavior of `load_raw_arrays`.

The dispatch reader is intended for use by the data portal when using
both futures and equities. The dispatch reader will also be passed to the
`HistoryLoader`s contained within the data portal, where the
batched `load_raw_arrays` will be used.

Also, BUG:
- Fix the return of `MinuteResampleSessionBarReader.load_raw_arrays` to
match all other readers.
- Use the input dt for the `MinuteResampleSessionBarReader.load_raw_arrays`
as a session label, instead of a minute dt, since it is a session bar
reader.
(Both of these bugs were discovered when using the resample reader for
future data in the dispatch tests.)
2016-08-25 16:29:45 -04:00
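The batch-then-interleave step can be sketched for a single field. Group the requested sids by asset type, load each group from its own reader, and write each group's columns back at the positions where those sids were requested. `asset_type_of` and the reader callables are hypothetical stand-ins for an asset finder and real bar readers. Real readers return one array per requested field.

```python
import numpy as np

def dispatch_load_raw_arrays(sids, asset_type_of, readers, n_dts):
    """Batch sids by asset type, load each batch, and interleave the results.

    asset_type_of: dict sid -> asset type name (hypothetical).
    readers: dict asset type name -> callable(batch_sids) returning an array
    of shape (n_dts, len(batch_sids)) (hypothetical).
    Output columns follow the order of `sids`, as load_raw_arrays expects.
    """
    out = np.full((n_dts, len(sids)), np.nan)
    batches = {}
    for col, sid in enumerate(sids):
        batches.setdefault(asset_type_of[sid], []).append(col)
    for asset_type, cols in batches.items():
        batch_sids = [sids[c] for c in cols]
        # Scatter this batch's columns back to their requested positions.
        out[:, cols] = readers[asset_type](batch_sids)
    return out
```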
Eddie Hebert 67112e0e11 Merge pull request #1432 from quantopian/reindex-reader
ENH: Add a reader base which reindexes results.
2016-08-25 09:34:06 -04:00
Eddie Hebert 562098dbf8 ENH: Add a reader base which reindexes results.
Working towards history results which contain mixed asset types, add
a reader which makes `load_raw_arrays` return results indexed on the
session/minute ranges specified by the given `trading_calendar`
instead of the calendar of the backing reader.

This reader will be used to make Equity readers align with Future
readers. It is intended for use as part of another reader (which will
dispatch queries based on asset type and then recombine the results), which
will be passed to the `[Minute|Session]HistoryLoader`s in the data portal.
2016-08-24 16:28:19 -04:00
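Reindexing onto another calendar amounts to placing each row of the backing reader's output at the matching target session. Target sessions that the backing calendar lacks get a fill value. A sketch under those assumptions follows. `reindex_to_calendar` is hypothetical, and in the example the sessions are plain strings, with the third target session made up for illustration.

```python
import numpy as np

def reindex_to_calendar(values, source_sessions, target_sessions, fill=np.nan):
    """Reindex rows of `values` (one row per source session, one column per sid)
    onto `target_sessions`; sessions absent from the source get `fill`.
    """
    values = np.asarray(values, dtype=float)
    out = np.full((len(target_sessions), values.shape[1]), fill, dtype=float)
    row_of = {s: i for i, s in enumerate(source_sessions)}
    for j, s in enumerate(target_sessions):
        i = row_of.get(s)
        if i is not None:
            out[j] = values[i]
    return out

src = ["2016-08-22", "2016-08-23"]
tgt = ["2016-08-22", "2016-08-23", "2016-08-24"]  # hypothetical extra session
prices = reindex_to_calendar([[1.0], [2.0]], src, tgt)           # gap -> nan
volumes = reindex_to_calendar([[5.0], [6.0]], src, tgt, fill=0.0)  # gap -> 0
```

Using nan for prices and 0 for volume matches the fill convention used elsewhere in this log.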
Richard Frank 3493723e7b DEV: zipline ingest can downgrade the assets db
This lets us publish an "old" db version for the most recent release
of zipline, using the latest code base.
2016-08-24 15:32:30 -04:00
Eddie Hebert 71a34bf7ac MAINT: Standardize reader get value methods.
The daily/session bar reader's `spot_price` took the same parameters and
returned the same kind of output as the minute bar reader's `get_value`.

Standardize on one method to create a common interface, which may be
formally factored out in a later patch, to help enable writing reader
implementations or mixins that are agnostic to the bar frequency.
2016-08-24 12:46:36 -04:00
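A shared `get_value` is what lets frequency-agnostic code work against any reader. Below is a sketch of such an interface using Python's `abc` module. `BarReaderSketch`, `DictBarReader` and `close_price` are hypothetical names chosen to avoid implying that this is zipline's `BarReader` class.

```python
from abc import ABC, abstractmethod

class BarReaderSketch(ABC):
    """Hypothetical shared interface: one get_value for any bar frequency."""

    @abstractmethod
    def get_value(self, sid, dt, field):
        """Return `field` for `sid` at `dt` (a session label or a minute)."""

class DictBarReader(BarReaderSketch):
    """Toy implementation backed by a dict, for illustration only."""

    def __init__(self, data):
        self._data = data

    def get_value(self, sid, dt, field):
        return self._data[(sid, dt, field)]

def close_price(reader, sid, dt):
    """Works with any reader implementing get_value, minute or session."""
    return reader.get_value(sid, dt, "close")
```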
Andrew Daniels a8f2b704a2 BUG: Fixes BcolzMinuteBarMetadata to read the version correctly (#1425)
We were mistakenly using the minute_per_day field.

We now expose from the metadata object the version from which the
metadata was read. This allows a new test that verifies the version is
read correctly.
2016-08-22 17:45:07 -04:00
Andrew Daniels 37e6a48e99 ENH: Pass calendar instance to BcolzMinuteBarWriter (#1406)
* First pass.

* Improvements and fixes

- Update usages of BcolzMinuteBarWriter
- Updates with rebuilt example data
- Expose calendar from BcolzMinuteBarMetadata instead of calendar_name
- Keep market_opens and market_closes in metadata for compatibility

* Store start_session and end_session in minute bcolz metadata

- start_session replaces first_trading_day
- Add end_session to limit to correct days

* For last_available_dt, get last close from calendar to maintain tz

* Bumps version and handles earlier versions on read

* Rebuilt example data on python 3

* Indicate metadata fields that are deprecated
2016-08-18 15:41:26 -04:00
Eddie Hebert 4a017ef63b ENH: Session bar reader resampled from minute data
Implement a `SessionBarReader` which uses a minute bar reader as a
backing source, resampling the minute bars within each session into a
single bar for that session.

Also, add future/CME test cases to resample suite.
2016-08-18 11:37:42 -04:00
Richard Frank fcf1067071 BUG: Fixes should_clean for keep_last=0 2016-08-17 18:18:01 -04:00
Andrew Daniels 440806ad60 MAINT: Use TradingCalendar objects for bundles (#1397)
* MAINT: Use TradingCalendar objects for bundles

Instead of trading days, opens, and closes, register now takes a
TradingCalendar object, along with a start_session and end_session. The
ingest function is likewise now passed these values instead.

* Accept calendar name in addition to the actual object

* Updates bundles documentation for changes

* Fix typo in docs

* Use class formatting

* Force start_session and end_session within the bounds of the calendar

* Use UTC timestamps in test_core

* Document Trading Calendar API in appendix.rst
2016-08-17 13:37:07 -04:00
Eddie Hebert d1f7a819fc TST: Share resample test cases.
Also, move `DailyHistoryAggregator` to the `resample` module, so that tools
for converting from minute to session bars are co-located.

This patch is in preparation for adding a daily bar reader which
resamples minute data, which will be located in the `resample` module
and share the test cases and expected results in `test_resample`.
2016-08-16 15:44:32 -04:00
Eddie Hebert e934c6aeaf TST: Make room for multiple calendars in tests.
When adding fixtures for futures data, there will be a need for multiple
calendars in the fixture ecosystem. e.g. a test that includes both
equities and futures would need an overall calendar which encompasses
both equities and futures; however, the test data for equities should
still be limited to the bounds set by the NYSE calendar.

Make the fixtures that set up trading calendars and values derived from
the trading calendar (e.g. trading sessions) accept an iterable of
calendars which need to be created, then populate those values into a
dict keyed by the calendar name.

Change `WithNYSETradingDays` to include sessions in the name,
since we are moving to session as the name for the 'day' unit.

Provide `trading_days`, which is really "NYSE trading sessions", on
`WithTradingSessions` for backwards compatibility.
2016-08-05 12:17:27 -04:00
Joe Jevnik 4265a13edf Revert "Merge pull request #1354 from quantopian/revert-1302-point-in-time-asset-db"
This reverts commit 3b633011c6, reversing
changes made to 70ac5323de.
2016-08-02 14:25:10 -04:00
Joe Jevnik 814a2be7b7 Revert "Point in time asset db" 2016-07-27 23:29:08 -04:00
Joe Jevnik 7fd8c29880 ENH: add point in time aspect to equity symbol mapping
Changes the overlap behavior so that it is an error to write data which
would have two companies holding the same ticker. Other than one test
around which company would win in that case, all the other tests are
passing. That single test has been changed to check the write-time
error.
2016-07-26 13:34:58 -04:00
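The write-time rule can be sketched as an interval-overlap check: two different sids may not hold the same symbol over overlapping date ranges. `check_symbol_ownership` and `OverlappingSymbolError` are hypothetical names, and the ranges are assumed inclusive. The pairwise loop is quadratic per symbol, chosen for clarity over speed.

```python
from datetime import date

class OverlappingSymbolError(ValueError):
    """Hypothetical error type for this sketch."""

def check_symbol_ownership(rows):
    """rows: iterable of (symbol, sid, start_date, end_date), inclusive.

    Raise if two different sids hold the same symbol on overlapping dates.
    """
    by_symbol = {}
    for symbol, sid, start, end in rows:
        by_symbol.setdefault(symbol, []).append((start, end, sid))
    for symbol, spans in by_symbol.items():
        for i, (s1, e1, sid1) in enumerate(spans):
            for s2, e2, sid2 in spans[i + 1:]:
                # Two inclusive ranges overlap iff each starts before the other ends.
                if sid1 != sid2 and s1 <= e2 and s2 <= e1:
                    raise OverlappingSymbolError(
                        "%s held by sids %s and %s at the same time"
                        % (symbol, sid1, sid2))

rows = [
    ("ABC", 1, date(2010, 1, 1), date(2012, 1, 1)),
    ("ABC", 2, date(2013, 1, 1), date(2015, 1, 1)),  # later owner: allowed
]
check_symbol_ownership(rows)  # passes: ownership periods do not overlap
```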
Joe Jevnik ec0ecfc1b9 TST: don't use showprogress in tests 2016-07-25 13:09:55 -04:00
Joe Jevnik e8728c0cd4 TST: fix data tests 2016-07-25 13:09:55 -04:00
Joe Jevnik ef4eafbbb8 TST: fix bundle test discovery 2016-07-25 13:09:55 -04:00
Jean Bredeche 5a0f840917 Clean up daily bar reader/writer to take advantage of new trading calendar. The reader
is backwards-compatible with the previous format.

In USEquityLoader, use dailyreader's trading_calendar.

This is backwards compatible and will fall back to the NYSE calendar if
the reader doesn’t have a calendar specified.
2016-07-15 15:13:57 -04:00
Jean Bredeche 6fb4923cc7 Re-implemented the Calendar API.
Instead of having separate ExchangeCalendar and TradingSchedule objects, we
now just have TradingCalendar. The TradingCalendar keeps track of each
session (defined as a contiguous set of minutes between an open and a close).
It's also responsible for mapping any given minute to its containing
session, or to the next/previous session if it's not a market
minute for the given calendar.
2016-07-12 13:13:50 -04:00
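The minute-to-session mapping described here can be sketched with a binary search over sorted session opens. If the minute lies inside a session, that session is returned; otherwise the next or previous one is returned, depending on a direction flag. `minute_to_session` and its `direction` parameter are hypothetical, not TradingCalendar's API. Naive datetimes stand for UTC.

```python
from bisect import bisect_right
from datetime import datetime

def minute_to_session(minute, opens, closes, direction="next"):
    """Return the index of the session containing `minute`.

    opens/closes: sorted, parallel lists of session open/close datetimes.
    For a non-market minute, return the next or previous session's index,
    per `direction`, or None if no such session exists.
    """
    i = bisect_right(opens, minute) - 1  # last session opening at or before minute
    if i >= 0 and minute <= closes[i]:
        return i
    if direction == "next":
        return i + 1 if i + 1 < len(opens) else None
    if direction == "previous":
        return i if i >= 0 else None
    raise ValueError("direction must be 'next' or 'previous'")

opens = [datetime(2016, 7, 11, 13, 31), datetime(2016, 7, 12, 13, 31)]
closes = [datetime(2016, 7, 11, 20, 0), datetime(2016, 7, 12, 20, 0)]
```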