62 Commits

Author SHA1 Message Date
Eddie Hebert baff6a84bc TST/BUG: Full coverage on resample module.
test_resample now fully covers the resample module.

Fix a bug exposed by increased coverage, where daily aggregation on
`high` would return `nan` for an asset instead of the correct value when
1) during the course of a day, `1d` history was called on non-consecutive
minutes and 2) either a) the value for the previously inspected dt was
`nan` or b) there were only `nan`s between the previous and current dt.

`low` had a similar bug which was only triggered if the value for the
previously inspected dt was `nan`.
2016-09-01 16:41:45 -04:00
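The `high` fix described in baff6a84bc amounts to treating `nan` as "no trade seen" rather than letting it mask a real value. A minimal sketch of that idea, assuming a hypothetical helper `update_running_high` that is not part of the resample module:

```python
import math


def update_running_high(cached_high, new_highs):
    """Combine a cached intraday high with highs from newly seen minutes.

    `nan` means "no trade"; it must not mask a real value on either side.
    Hypothetical helper for illustration only.
    """
    candidates = [v for v in [cached_high, *new_highs] if not math.isnan(v)]
    # Report nan only when there is no real value at all.
    return max(candidates) if candidates else math.nan
```

Both trigger conditions from the message are covered: a `nan` cached value followed by real data, and a real cached value followed only by `nan`s.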
Eddie Hebert 59d06a1883 TST/BUG: Cover all reindex session public methods.
Increase coverage on `ReindexSessionBarReader` so that all methods which
are considered part of the interface are covered by `test_resample`.

Fix bug in `get_value`, exposed by increased coverage, where the
`NoDataOnDate` exception was bubbling from the bcolz reader all the way
up when a session which was a holiday on the underlying reader was passed
to the reindex reader. (The reindex reader should return nan/0 in that
case.)

Also, move location of data index exceptions so that they are agnostic
to bcolz/us_equity_pricing; since the exception is now used by the
resample module to fix aforementioned bug.
2016-09-01 11:51:00 -04:00
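The `get_value` fix in 59d06a1883 can be pictured as a wrapper that catches the missing-data exception instead of letting it bubble up. The sketch below uses toy stand-in classes, not the real readers, and assumes that "nan/0" means 0 for volume and `nan` for price fields:

```python
import math


class NoDataOnDate(Exception):
    """Stand-in for the exception raised when a reader has no bar for a date."""


class BackingReader:
    """Toy reader holding {(sid, session): {field: value}}."""

    def __init__(self, bars):
        self._bars = bars

    def get_value(self, sid, session, field):
        try:
            return self._bars[(sid, session)][field]
        except KeyError:
            raise NoDataOnDate(session)


class ReindexingReader:
    """Wraps a reader whose calendar has fewer sessions than the outer one."""

    def __init__(self, reader):
        self._reader = reader

    def get_value(self, sid, session, field):
        try:
            return self._reader.get_value(sid, session, field)
        except NoDataOnDate:
            # Assumed convention: no volume and no price on a holiday.
            return 0 if field == "volume" else math.nan
```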
Eddie Hebert 1bebad5b68 TST: Fix get_last_traded_dt on bcolz daily reader.
Remove special handling for the last session of an asset, which was
moving the last traded back a session.

If the asset has data on a session, `get_last_traded_dt` should always
return that session if it is the parameter to the method.
2016-08-31 14:59:58 -04:00
Eddie Hebert 544dda115c TST: Increase coverage for reindex reader methods
Add direct coverage on last_available_dt.

Also move reader creation into the instance fixture.

This patch attempted to add coverage on `get_last_traded_dt`, but in doing
so, revealed a bug in `BcolzDailyBarReader.get_last_traded_dt` when
requesting the last trading session of an asset.
When that is fixed, the skip can be removed.
2016-08-30 16:43:58 -04:00
Eddie Hebert 624b4659f1 Merge pull request #1446 from quantopian/use-us-futures-in-test-resample
TST: Use futures cal in resample suite.
2016-08-30 10:24:18 -04:00
Eddie Hebert bce159f275 TST: Use futures cal in resample suite.
Instead of CME, use the futures cal, which should now be the standard
calendar throughout; though some tests remain to be ported.
2016-08-29 15:43:39 -04:00
Eddie Hebert 7146edbe7d TST: Cover resample bar first_trading_day method.
Add a test to directly cover the first_trading_day method via the
`test_resample` suite. (The lack of coverage was exposed when testing
against real data.)

Also, refactor resample bar tests so that session bar reader is set up
in instance fixture.
2016-08-29 15:00:08 -04:00
Eddie Hebert 413ea3d9d5 MAINT: Add a reader which dispatches on asset type
Add `AssetDispatchSessionBarReader` and corresponding minute and session
bar version of that reader.
This reader routes requests to the appropriate reader based on the asset
type of the requested sids.

`load_raw_arrays` in the dispatch reader batches the sids by asset type
and then interleaves the results in the out arrays, so that the array
data corresponds with sids in the order that sids are passed to the
method, to meet the expected behavior of `load_raw_arrays`.

The dispatch reader is intended for use by the data portal when using
both futures and equities. The dispatch reader will also be passed to
the `HistoryLoader`s contained within the data portal, where the
batched `load_raw_arrays` will be used.

Also, BUG:
- Fix the return of `MinuteResampleSessionBarReader.load_raw_arrays` to
match all other readers.
- Use the input dt for the `MinuteResampleSessionBarReader.load_raw_arrays`
as a session label, instead of a minute dt, since it is a session bar
reader.
(Both of these bugs were discovered when using the resample reader for
future data in the dispatch tests.)
2016-08-25 16:29:45 -04:00
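The batch-then-interleave behavior described in 413ea3d9d5 can be sketched in plain Python. Everything here (`dispatch_load`, `asset_type_of`, the callable readers) is hypothetical and only illustrates the ordering guarantee, not the actual reader classes:

```python
from collections import defaultdict


def dispatch_load(sids, asset_type_of, readers):
    """Batch sids by asset type, query each reader once, restore input order.

    `asset_type_of` maps sid -> type name; `readers` maps type name -> a
    callable taking a list of sids and returning one result per sid.
    """
    positions = defaultdict(list)
    for i, sid in enumerate(sids):
        positions[asset_type_of[sid]].append(i)

    out = [None] * len(sids)
    for asset_type, idxs in positions.items():
        batch = [sids[i] for i in idxs]
        results = readers[asset_type](batch)
        # Interleave: put each result back at the caller's original position.
        for i, result in zip(idxs, results):
            out[i] = result
    return out
```

Each underlying reader is called once per request, yet the caller still sees results in the order the sids were passed.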
Eddie Hebert 0dd01650f1 Merge pull request #1432 from quantopian/reindex-reader
ENH: Add a reader base which reindexes results.
2016-08-25 09:34:06 -04:00
Eddie Hebert fc8cf38a6e ENH: Add a reader base which reindexes results.
Working towards history results which contain mixed asset types, add
a reader which makes `load_raw_arrays` return results indexed on the
session/minute ranges specified by the specified `trading_calendar`
instead of the calendar of the backing reader.

This reader will be used to make Equity readers align with Future
readers. It is intended for use as part of another reader (which will
dispatch queries based on asset type and then recombine results) which
will be passed to the `[Minute|Session]HistoryLoader`s in the data portal.
2016-08-24 16:28:19 -04:00
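Reindexing as described in fc8cf38a6e means laying a backing reader's values onto the sessions of a different calendar and filling the gaps. A minimal sketch, assuming a hypothetical helper `reindex_rows` and arbitrary hashable session labels:

```python
import math


def reindex_rows(backing_sessions, backing_values, target_sessions,
                 fill=math.nan):
    """Align values from a backing calendar onto a target calendar.

    Sessions missing from the backing calendar are filled with `fill`.
    """
    lookup = dict(zip(backing_sessions, backing_values))
    return [lookup.get(session, fill) for session in target_sessions]
```

An equity series with no Wednesday session, aligned to a calendar that has one, gains a `nan` in the Wednesday slot and keeps every other value in place.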
Richard Frank 419bd1e3b5 DEV: zipline ingest can downgrade the assets db
This lets us publish an "old" db version for the most recent release
of zipline, using the latest code base.
2016-08-24 15:32:30 -04:00
Eddie Hebert a3c1f4ce36 MAINT: Standardize reader get value methods.
The daily/session bar reader's `spot_price` took the same parameters and
returned the same kind of output as the minute bar reader's `get_value`.

Standardize on one method to make a common interface, which may be
formally factored out in a later patch; to help enable writing reader
implementations or mixins which can be agnostic to the bar frequency.
2016-08-24 12:46:36 -04:00
Andrew Daniels 193b657bfe BUG: Fixes BcolzMinuteBarMetadata to read the version correctly (#1425)
We were mistakenly using the minute_per_day field.

We now expose from the metadata object the version from which the
metadata was read. This allows a new test that verifies the version is
read correctly.
2016-08-22 17:45:07 -04:00
Andrew Daniels 53ca68e8fb ENH: Pass calendar instance to BcolzMinuteBarWriter (#1406)
* First pass.

* Improvements and fixes

- Update usages of BcolzMinuteBarWriter
- Updates with rebuilt example data
- Expose calendar from BcolzMinuteBarMetadata instead of calendar_name
- Keep market_opens and market_closes in metadata for compatibility

* Store start_session and end_session in minute bcolz metadata

- start_session replaces first_trading_day
- Add end_session to limit to correct days

* For last_available_dt, get last close from calendar to maintain tz

* Bumps version and handles earlier versions on read

* Rebuilt example data on python 3

* Indicate metadata fields that are deprecated
2016-08-18 15:41:26 -04:00
Eddie Hebert f14fcd9b07 ENH: Session bar reader resampled from minute data
Implement a `SessionBarReader` which uses a minute bar reader as a
backing source, resampling the minute bars into the box around the
corresponding session data.

Also, add future/CME test cases to resample suite.
2016-08-18 11:37:42 -04:00
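Resampling minute bars into a session bar, as introduced in f14fcd9b07, follows the usual OHLCV rule: first open, highest high, lowest low, last close, summed volume. A simplified sketch with a hypothetical helper that ignores `nan` handling and calendars:

```python
def resample_to_session(minute_bars):
    """Collapse one session's minute bars into a single OHLCV bar.

    `minute_bars` is a non-empty, time-ordered list of dicts with the keys
    open/high/low/close/volume.
    """
    return {
        "open": minute_bars[0]["open"],
        "high": max(bar["high"] for bar in minute_bars),
        "low": min(bar["low"] for bar in minute_bars),
        "close": minute_bars[-1]["close"],
        "volume": sum(bar["volume"] for bar in minute_bars),
    }
```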
Richard Frank cbcb71bbad BUG: Fixes should_clean for keep_last=0 2016-08-17 18:18:01 -04:00
Andrew Daniels 08a03edad7 MAINT: Use TradingCalendar objects for bundles (#1397)
* MAINT: Use TradingCalendar objects for bundles

Instead of trading days, opens, and closes, register now takes a
TradingCalendar object, along with a start_session and end_session. The
ingest function is now passed these values instead as well.

* Accept calendar name in addition to the actual object

* Updates bundles documentation for changes

* Fix typo in docs

* Use class formatting

* Force start_session and end_session within the bounds of the calendar

* Use UTC timestamps in test_core

* Document Trading Calendar API in appendix.rst
2016-08-17 13:37:07 -04:00
Eddie Hebert 9685f3077a TST: Share resample test cases.
Also, move `DailyHistoryAggregator` to `resample` module, so that tools
for converting from minute to session bars are collocated.

This patch is in preparation of adding a daily bar reader which
resamples minute data, which will be located in the `resample` module
and share the test cases and expected results in `test_resample`.
2016-08-16 15:44:32 -04:00
Eddie Hebert e934c6aeaf TST: Make room for multiple calendars in tests.
When adding fixtures for futures data, there will be a need for multiple
calendars in the fixture ecosystem. e.g. a test that includes both
equities and futures would need an overall calendar which encompasses
both equities and futures; however, the test data for equities should
still be limited to the bounds set by the NYSE calendar.

Make the fixtures that set up trading calendars and values derived from
the trading calendar (e.g. trading sessions) accept an iterable of
calendars which need to be created, then populate those values into a
dict keyed by the calendar name.

Change `WithNYSETradingDays` to include sessions in the name,
since we are moving to session as the name for the 'day' unit.

Provide `trading_days`, which is really "NYSE trading sessions", on
`WithTradingSessions` for backwards compatibility.
2016-08-05 12:17:27 -04:00
Joe Jevnik 4265a13edf Revert "Merge pull request #1354 from quantopian/revert-1302-point-in-time-asset-db"
This reverts commit 3b633011c6, reversing
changes made to 70ac5323de.
2016-08-02 14:25:10 -04:00
Joe Jevnik 814a2be7b7 Revert "Point in time asset db" 2016-07-27 23:29:08 -04:00
Joe Jevnik 7fd8c29880 ENH: add point in time aspect to equity symbol mapping
Changes the overlap behavior so that it is an error to write data which
would have two companies holding the same ticker. Other than one test
around which company would win in that case, all the other tests are
passing. That single test has been changed to check the write-time
error.
2016-07-26 13:34:58 -04:00
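The write-time check described in 7fd8c29880 rejects data in which two companies hold the same ticker at the same time. One way to detect that condition is a pairwise interval comparison. The helper below is a hypothetical sketch, not the asset db writer's code, and assumes inclusive date ranges:

```python
from itertools import combinations


def find_symbol_conflicts(mappings):
    """Return symbols held by two different sids over overlapping ranges.

    `mappings` is a list of (symbol, sid, start, end) tuples with inclusive,
    mutually comparable start/end values.
    """
    conflicts = set()
    for a, b in combinations(mappings, 2):
        sym_a, sid_a, start_a, end_a = a
        sym_b, sid_b, start_b, end_b = b
        same_symbol = sym_a == sym_b and sid_a != sid_b
        # Two inclusive ranges overlap when each starts before the other ends.
        if same_symbol and start_a <= end_b and start_b <= end_a:
            conflicts.add(sym_a)
    return conflicts
```

A writer could raise an error whenever the returned set is non-empty.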
Joe Jevnik ec0ecfc1b9 TST: don't use showprogress in tests 2016-07-25 13:09:55 -04:00
Joe Jevnik e8728c0cd4 TST: fix data tests 2016-07-25 13:09:55 -04:00
Joe Jevnik ef4eafbbb8 TST: fix bundle test discovery 2016-07-25 13:09:55 -04:00
Jean Bredeche 5a0f840917 Clean up daily bar reader/writer to take advantage of new trading calendar. The reader
is backwards-compatible with the previous format.

In USEquityLoader, use dailyreader's trading_calendar.

This is backwards compatible and will fall back to the NYSE calendar if
the reader doesn’t have a calendar specified.
2016-07-15 15:13:57 -04:00
Jean Bredeche 6fb4923cc7 Re-implemented the Calendar API.
Instead of having separate ExchangeCalendar and TradingSchedule objects, we
now just have TradingCalendar.  The TradingCalendar keeps track of each
session (defined as a contiguous set of minutes between an open and a close).
It's also responsible for handling the grouping logic of any given minute
to its containing session, or the next/previous session if it's not a market
minute for the given calendar.
2016-07-12 13:13:50 -04:00
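The minute-to-session grouping described in 6fb4923cc7 can be sketched with a binary search over session closes. This is an illustration only, with a hypothetical function name and integer minute labels, and it covers just the "containing or next session" direction:

```python
from bisect import bisect_left


def minute_to_session(minute, closes, sessions):
    """Map a minute to its containing session, or to the next session.

    `closes` is a sorted list of session close times and `sessions` holds
    the matching session labels. A minute after one close and before the
    following open therefore rolls forward to the following session.
    """
    idx = bisect_left(closes, minute)  # first session whose close >= minute
    if idx == len(sessions):
        raise ValueError("minute falls after the last session")
    return sessions[idx]
```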
Eddie Hebert 51eda06323 MAINT: Add equity to naming of bar data classes.
In preparation of adding futures, add equity to the names of both the
classes and methods for writing bcolz data. Futures data will use a
different minutes per day with a separate reader. This change will allow
both equity and futures fixtures to be side by side.

Also, break out the method which generates the dataframes and trading
days member into fixtures (`EquityMinuteBarData` and
`EquityDailyBarData`) on which the `*BarReader` fixture depends.  This
fixture is separated out to enable reader/writers in different formats
to use the same data setup. (There is internal code which needs to write
minute and daily bar data in a database format.)
2016-06-30 08:21:42 -04:00
Andrew Daniels 60cd4aab91 Use new API in tests/data/bundles/test_core.py 2016-06-08 16:24:36 -04:00
jfkirk 2a8f69fc01 MAINT: DataPortal env -> asset_finder 2016-06-08 13:34:22 -04:00
jfkirk 0a6ad9ac9e STY: Your flake is on fleek 2016-06-08 13:34:22 -04:00
jfkirk 581e817603 MAINT: Rebase reconciliation 2016-06-08 13:34:22 -04:00
jfkirk 10a118d94c MAINT: Removes references to tradingcalendar 2016-06-08 13:34:20 -04:00
jfkirk 75e0e4723d TST: Refactors more tests to use WithTradingSchedule 2016-06-08 13:34:20 -04:00
jfkirk 241abda2a5 STY: Flake8 2016-06-08 13:34:19 -04:00
jfkirk c8304e8601 ENH: Adds ExchangeCalendar, TradingSchedule, and implementations
Conflicts:
	tests/data/test_minute_bars.py
	tests/data/test_us_equity_pricing.py
	tests/finance/test_slippage.py
	tests/pipeline/test_engine.py
	tests/pipeline/test_us_equity_pricing_loader.py
	tests/serialization_cases.py
	tests/test_algorithm.py
	tests/test_assets.py
	tests/test_bar_data.py
	tests/test_benchmark.py
	tests/test_exception_handling.py
	tests/test_fetcher.py
	tests/test_finance.py
	tests/test_history.py
	tests/test_perf_tracking.py
	tests/test_security_list.py
	tests/utils/test_events.py
	zipline/algorithm.py
	zipline/data/data_portal.py
	zipline/data/us_equity_loader.py
	zipline/errors.py
	zipline/finance/trading.py
	zipline/testing/core.py
	zipline/utils/events.py
2016-06-08 13:34:18 -04:00
Andrew Daniels 8e6c98e9aa BUG: Fixes reading and writing of daily bars first_trading_day attr
When writing first_trading_day, it is already in the correct frame of
reference (seconds since epoch) and does not need to be transformed
further. Adjusts the reader to expect this value.
2016-06-02 13:41:09 -04:00
Stewart Douglas 17c20da026 TST: Add test to append minutely data 2016-05-26 09:38:25 -04:00
Andrew Daniels f1cfe1f2db BUG: Fixes bcolz padding to not always pad 390 minutes
If minutes already exist for the last existing day, adjust the number of
minutes padded to account for them. Previously we would always pad 390,
leading to a mismatch in the number of rows.
2016-05-25 14:26:16 -04:00
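The arithmetic behind f1cfe1f2db is small: the pad for a partially written day is whatever remains of the 390-minute day, not a full 390. A sketch with a hypothetical helper name:

```python
MINUTES_PER_DAY = 390  # figure quoted in the commit message


def minutes_to_pad(minutes_already_written, minutes_per_day=MINUTES_PER_DAY):
    """Return how many rows are needed to complete the last written day."""
    if not 0 <= minutes_already_written <= minutes_per_day:
        raise ValueError("minutes_already_written is out of range")
    return minutes_per_day - minutes_already_written
```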
Stewart Douglas 8217cdb1bd ENH: Allow BcolzMinuteBarWriter to append to most recent day
Minutely data can now be appended to bcolz files even when
minutes in the same day have already been written. For example,
previously attempting to write data for the minute 2016-05-11 16:30
would raise an exception if any OHLCV data for 2016-05-11 had been
written to the same file.

Trying to overwrite existing minutes still raises a
BcolzMinuteOverlappingData exception.

Note that previously all sids' bcolz files ended at the same time.
This is no longer necessarily the case. The last record in each
sid's bcolz file now corresponds to the latest minute for which
OHLCV data is provided to the writer.
2016-05-13 16:24:21 -04:00
Joe Jevnik 55f1548160 BUG: fix inverted splits in quandl data 2016-05-09 14:00:35 -04:00
Joe Jevnik d819721d96 ENH: use more human readable format for bundle ingest directories
We are now using isoformats with ':' replaced with ';'. We cannot use a
normal isoformat because windows does not allow files or directories
with ':' in the name.
2016-05-05 18:22:13 -04:00
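The directory naming described in d819721d96 can be reproduced with the standard library. The function name below is hypothetical; only the `:` to `;` substitution comes from the commit message:

```python
from datetime import datetime


def ingest_dirname(timestamp):
    """Return an ISO timestamp that is safe as a Windows directory name."""
    return timestamp.isoformat().replace(":", ";")


print(ingest_dirname(datetime(2016, 5, 5, 18, 22, 13)))
# prints 2016-05-05T18;22;13
```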
Joe Jevnik 89542e33bd ENH: Adds quantopian-quandl bundle as new default.
This data bundle will use the quantopian mirror of the quandl WIKI data
instead of downloading from quandl directly. This dramatically improves
the speed because we do not pay the rate limiting for quandl and we can
send the data in the format zipline expects.
2016-05-05 18:22:13 -04:00
Joe Jevnik 59c8e371a2 ENH: Updates the cli, data bundles and extensions.
Adds the data bundle concept which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This also provides a
`yahoo_equities` function for creating an ingestion function that will
load a static set of assets from yahoo.

The cli is now structured as a couple of subcommands and has been
changed to `python -m zipline`. The old behavior of `run_algo.py` has
been moved to the `run` subcommand. This is almost entirely the same
except that it now takes the name of the data bundle to use, defaulting
to `quandl`.

The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.

There is also a `clean` subcommand which deletes the data that was
written with `ingest`.

Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of python files to run at
the start of the process. These can be used to configure aspects of
zipline. Right now the only thing that is supported in an extension file
is the registration of a new data bundle.
2016-05-03 18:38:24 -04:00
Joe Jevnik efac476976 ENH: make BcolzMinuteBarWriter.write take iterable
Updates the BcolzMinuteBarWriter.write api to allow users to pass their
data as a stream instead of requiring that they loop over their data
externally. This matches the API presented by BcolzDailyBarWriter.
2016-04-29 16:14:48 -04:00
Eddie Hebert 66d05aa563 PERF: Improve read time for smaller num of assets.
The BcolzDailyBarReader was optimized for the pipeline case of reading
all assets at once.

Now that the reader is also used to support daily history, the case of
reading data for a small number of assets is more common, particularly
in algorithms that use the history API and have a high rotation of
assets (e.g. an algorithm which uses pipeline to set the active
universe).

Remove the bottleneck in reading a small number of assets by
conditionally reading the slice for each asset from the carray, instead
of reading the data for all equities and then indexing into that full
array. Above a certain number of assets, it is still better to read all the
data at once. On the Quantopian dataset, which holds about 10 years of
data for about 20000 equities (where not all equities trade
over the full range), stored in 118 blosc blp files per column, the
tipping point where the 'read all' mode wins out is between 3000 and 4000
assets.

That number was tested by trying to exercise a worst case scenario where
the equities were spread out evenly across the blp files, by stepping
along a sorted list of assets that were alive over a query range which
spanned 70 trading days.
```
size = 3000
sids = [assets[i] for i in range(0, len(assets), len(assets) // size)][:size]
```

Also, add parameter to WithBcolzDailyBarReader fixture which allows the
test to specify what the threshold count for reading all data should be,
so that the test_us_equity_pricing can be forced into either mode to
make sure that both branches in logic are covered by all test cases.

On a local dev machine this patch improves the read time of `load_raw_array`
for one asset from 100 ms to 96.5 µs (roughly a 10^3 improvement). Reading
only one asset per call is an observed common case when populating the
non-cached values in USEquityHistoryLoader.
2016-04-21 20:43:52 -04:00
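The two read modes described in 66d05aa563 can be sketched as a threshold switch. The flat-column layout, the function name, and the parameter names below are assumptions for illustration; the real reader works on bcolz carrays:

```python
def load_column(column, first_rows, last_rows, sids, read_all_threshold):
    """Read per-asset slices for few assets, or the whole column for many.

    Asset `sid` occupies column[first_rows[sid]:last_rows[sid] + 1].
    """
    if len(sids) >= read_all_threshold:
        # Many assets: one large read, then index into the in-memory copy.
        data = column[:]
        return {s: data[first_rows[s]:last_rows[s] + 1] for s in sids}
    # Few assets: touch only the rows that each asset needs.
    return {s: column[first_rows[s]:last_rows[s] + 1] for s in sids}
```

Both branches return the same data. Only the access pattern differs, which is why the fixture parameter mentioned above is useful for forcing tests through each branch.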
Joe Jevnik bc0b117dc9 MAINT: make the data loading apis more consistent.
Changes BcolzDailyBarWriter to not be an abc, data is passed as an
iterator of (sid, dataframe) pairs to the write method.

Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types, callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.

Adds many new fixtures and updates some existing fixtures to use the new
ones:

WithDefaultDateBounds
  A fixture that provides the suite a START_DATE and END_DATE. This is
  meant to make it easy for other fixtures to synchronize their date
  ranges without depending on each other in strange ways. For example,
  WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
  both have data for the same dates, so they may both depend on
  WithDefaultDateBounds without forcing a dependency between them.

WithTmpDir, WithInstanceTmpDir
  Provides the suite or individual test case a temporary directory.

WithBcolzDailyBarReader
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzDailyBarWriter.write

WithBcolzDailyBarReaderFromCSVs
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from a
  collection of CSV files and then converted into the bcolz data through
  BcolzDailyBarWriter.write_csvs

WithBcolzMinuteBarReader
  Provides the suite a BcolzMinuteBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzMinuteBarWriter.write

WithAdjustmentReader
  Provides the suite a SQLiteAdjustmentReader which reads from an in
  memory sqlite database. The data will be read from dataframes and then
  converted into sqlite with SQLiteAdjustmentWriter.write

WithDataPortal
  Provides each test case a DataPortal object with data from temporary
  resources.
2016-04-15 23:46:10 -04:00
Eddie Hebert d27f85e16b BUG: Enforce sorted order on minutes to delete.
The intervals are returned as a set, so order is not guaranteed,
which becomes exposed when reading windows which span multiple years.

The deletion of values from the regular sized minute array assumes that
intervals can be reversed to delete the array from the back.
2016-04-12 14:16:10 -04:00
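The ordering requirement behind d27f85e16b is easy to demonstrate: deleting index ranges from the back keeps the earlier indices valid, but only if the ranges are sorted first. A sketch with a hypothetical helper operating on a plain list:

```python
def delete_intervals(values, intervals):
    """Remove half-open [start, stop) index ranges from a list in place.

    `intervals` may arrive as a set, so it is sorted here and then walked
    from the back so that indices of earlier ranges stay valid.
    """
    for start, stop in sorted(intervals, reverse=True):
        del values[start:stop]
    return values
```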
Eddie Hebert 0a3a2f3653 BUG: Ensure matched input length to minute writer.
When the dts and length of cols are mismatched the writer behaves in
unintended ways. e.g. in a case where a consumer passed dts which had
minutes with no trades removed, but regular (market minute for day)
sized arrays for the data with `0`'s on minutes without trades, the non
trade minutes from cols are written to slots in the output where a trade
is intended.

Protect against this misuse by checking that all lengths are equal when
using the `write_cols` method.

Make a separate `_write_cols` method for use by both `write_cols` and
`write`, since the `write` method which takes a DataFrame has the
matched input length enforced by the DataFrame.
2016-04-07 13:53:59 -04:00
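The guard added in 0a3a2f3653 is a length check before writing. A minimal sketch, with a hypothetical function name and a `ValueError` standing in for whatever exception the writer actually raises:

```python
def check_cols_lengths(dts, cols):
    """Raise if any column's length differs from the number of dts.

    `cols` maps a column name (e.g. "open") to a sequence of values.
    """
    mismatched = {
        name: len(col) for name, col in cols.items() if len(col) != len(dts)
    }
    if mismatched:
        raise ValueError(
            "Length of dts ({}) does not match columns: {}".format(
                len(dts), mismatched
            )
        )
```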
Eddie Hebert 16fd6681a6 ENH: Rewrite of Zipline to use lazy access pattern
More documentation to follow in release notes.

Based on lazy-mainline branch, see for more details.

Also-By: Jean Bredeche <jean@quantopian.com>
Also-By: Andrew Liang <aliang@quantopian.com>
Also-By: Abhijeet Kalyan <akalyan@quantopian.com>
2016-04-04 16:12:58 -04:00