117 Commits

Author SHA1 Message Date
Eddie Hebert 75213ac176 MAINT: Write open and closes for minute bar format
Write arrays representing corresponding market opens and market closes,
which will eventually replace the `minute_index` field.

The market closes are being added for incoming work on another branch
which will use the market closes to generate a list of non-market
minutes to filter out when returning data from `unadjusted_window`.
2016-03-24 23:18:42 -04:00
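As a rough illustration of how the open and close arrays from 75213ac176 might be used to separate market minutes from non-market minutes, here is a minimal sketch. The sample sessions, the array names `opens` and `closes`, and the helper `is_market_minute` are assumptions for illustration, not the writer's or reader's actual interface.

```python
import numpy as np

# Hypothetical sample data: one open and one close per session.
opens = np.array(['2016-03-23T13:31', '2016-03-24T13:31'],
                 dtype='datetime64[m]')
closes = np.array(['2016-03-23T20:00', '2016-03-24T17:00'],
                  dtype='datetime64[m]')


def is_market_minute(minutes):
    """Return a boolean mask that is True where a minute falls inside a session."""
    # Index of the latest session whose open is <= the minute (-1 if none).
    day = np.searchsorted(opens, minutes, side='right') - 1
    has_session = day >= 0
    day = np.clip(day, 0, len(opens) - 1)
    return has_session & (minutes <= closes[day])
```

A consumer could apply the inverted mask to drop non-market minutes from a window of data.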
Richard Frank d873038a7e BUG: Specify int64 instead of system int
to handle 32bit python
2016-03-23 15:26:52 -04:00
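The int64 fix in d873038a7e is not shown in the log, so the following is only a sketch of the general idea: spelling out `np.int64` avoids depending on the platform's default integer width, which can be 32 bits. The array name `positions` is hypothetical.

```python
import numpy as np

# Request 64-bit integers explicitly rather than relying on the
# platform-dependent default integer type.
positions = np.zeros(4, dtype=np.int64)

# A value that would not fit in a 32-bit integer.
positions[0] = 2 ** 40
```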
Eddie Hebert 0f14972e08 ENH: Unadjusted window data for minute bars.
Add a method to the minute bar reader which returns the OHLCV for all
requested fields for a list of assets over the specified start and end
minutes.

Initial usage is intended for use by a loader which consumes minute bar
data to resample into daily bars, but may also be used when aggregating
minute data during '1d' history calls in Q2.0.

This iteration does not yet account for early closes.
2016-03-14 21:52:01 -04:00
Eddie Hebert 37d341bb95 BUG: Truncate treasury curves to env min date. 2016-03-08 17:14:28 -05:00
Joe Jevnik 5eb453675d BUG: don't fail if you cannot make a webrequest 2016-02-11 18:46:43 -05:00
Scott Sanderson 5f49fa22cb MAINT: Upgrade numpy and fix warnings.
Mostly fixes ambiguous calls to numpy.full, and uses NaT values with
explicit units.
2016-02-11 18:46:39 -05:00
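The exact call sites changed in 5f49fa22cb are not listed here. As a hedged sketch of the two patterns the message names, the snippet below passes an explicit `dtype` to `numpy.full` and constructs `NaT` with an explicit unit. The variable names are illustrative.

```python
import numpy as np

# An explicit dtype removes any ambiguity about the result type.
prices = np.full((2, 3), np.nan, dtype=np.float64)

# NaT constructed with an explicit unit (nanoseconds here).
nat = np.datetime64('NaT', 'ns')
dates = np.full(3, nat, dtype='datetime64[ns]')
```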
Eddie Hebert 27f94f83fa ENH: Allow passing of numpy arrays to writer.
For faster parsing and writing workflows, do not require a DataFrame.
2016-02-02 14:03:42 -05:00
Eddie Hebert 488721e805 ENH: Add padding method to minute bars writer.
So that consumers can write empty days' worth of data, without needing
to construct a DataFrame with zero data to force a write.

The internal loader uses `last_date_in_output_for_sid` to signify that
retrieval of data has been attempted for all dates up until that date,
so that when resuming a job the retrieval of data for those dates is not
re-attempted.

Also used to make the write logic cleaner, by making it only
necessary to create an array large enough for the given df.
2016-02-01 14:18:22 -05:00
Jeremiah Lowin e44c0d42e1 Replace print with logger.info 2016-01-25 19:22:28 -05:00
Eddie Hebert 930aa1b29b MAINT: Use metadata method for reader init.
Use the preexisting metadata method when instantiating the minute bar
reader.

An internal subclass uses the `_get_metadata` method to set up data for
directories that have not used the new writer/reader interface.
(i.e. allows for reader creation when the metadata.json file does not
exist.)
2016-01-25 12:58:43 -05:00
Eddie Hebert 984e934e83 BUG: Fix OSError when creating sids that share dir
Fix a bug where creating a sid bcolz file raised an OSError when the
containing directory was already occupied by another sid, because the
attempt to create the directory failed on the existing directory.

e.g. if there were two sids, `1` and `2`, the paths would be
`00/00/000001.bcolz` and `00/00/000002.bcolz`, which share the same
directory `00/00`.

Fixed by checking for directory existence before calling `makedirs`.

Add test coverage which exercises writing of sids that are siblings in
the sid directory structure.
2016-01-25 10:37:50 -05:00
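The existence check described in 984e934e83 could look roughly like the sketch below. The helpers `sid_path` and `ensure_sid_dir` are hypothetical names; only the `00/00/000001.bcolz` layout and the check before `makedirs` come from the commit message.

```python
import os
import tempfile


def sid_path(rootdir, sid):
    """Hypothetical layout helper: sid 1 -> <rootdir>/00/00/000001.bcolz."""
    padded = format(sid, '06d')
    return os.path.join(rootdir, padded[0:2], padded[2:4], padded + '.bcolz')


def ensure_sid_dir(rootdir, sid):
    """Create the containing directory only if it does not exist yet."""
    path = sid_path(rootdir, sid)
    parent = os.path.dirname(path)
    if not os.path.exists(parent):
        os.makedirs(parent)
    return path


root = tempfile.mkdtemp()
first = ensure_sid_dir(root, 1)
# The sibling sid shares `00/00`; the check prevents an OSError here.
second = ensure_sid_dir(root, 2)
```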
Eddie Hebert 603a345e2d BUG: Allow writing on first day. 2016-01-21 11:10:57 -05:00
Eddie Hebert 3a8be8c624 BUG: Need to be able to append to minute ctable. 2016-01-21 11:10:07 -05:00
Eddie Hebert d5c3b5a15c ENH: Add writer for minute bcolz format.
Implement a writer for minute data into a format comprised of multiple
ctables, one for each individual asset, with a common 'index' shared by
all ctables where a given dt maps to the same array index for all
equities and fields.

This format is pulled from the lazy-mainline/Q2.0 branch, with some
changes to the interface.

Add basic retrieval of values at a given dt to reader. Not yet used by
Zipline simulations, but added to support unit tests.

Also, rename stubbed out us_equity_minutes to minute_bars, since the
writer can be agnostic to asset type.
2016-01-21 10:54:27 -05:00
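To illustrate the shared-index idea from d5c3b5a15c, the sketch below uses plain NumPy arrays in place of bcolz ctables. The names `minute_index`, `closes`, and `value_at`, and the sample values, are assumptions for illustration. The point is that one lookup of the dt yields a position valid for every asset and field.

```python
import numpy as np

# One index of minutes shared by all assets.
minute_index = np.array(
    ['2016-01-21T14:31', '2016-01-21T14:32', '2016-01-21T14:33'],
    dtype='datetime64[m]',
)

# Per-asset arrays aligned to the shared index (stand-ins for ctables).
closes = {
    1: np.array([10.0, 10.1, 10.2]),
    2: np.array([20.0, 20.5, 21.0]),
}


def value_at(sid, dt):
    """Look up the shared position for dt, then read that asset's array."""
    pos = int(np.searchsorted(minute_index, np.datetime64(dt, 'm')))
    return closes[sid][pos]
```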
Scott Sanderson ec0abf1822 MAINT: Use coerce_string in BcolzDailyBarReader. 2016-01-12 17:39:44 -05:00
Scott Sanderson c8b80dddb0 BUG: Handle unicode adjustments path in py2.
In Python 2, passing unicode to SQLiteAdjustmentReader would fail to
coerce.
2016-01-12 17:39:36 -05:00
dmichalowicz 4f24a32c45 BUG: Benchmark and treasury curves data missing on first download 2015-12-21 13:38:24 -05:00
Eddie Hebert 8c1e52385f MAINT: Raise NotImplementedError in data_portal
The patch that added data_portal intended for NotImplementedError to be
raised if one of the functions was invoked, but the raise was omitted.
2015-12-15 17:08:19 -05:00
Eddie Hebert 6106cb98a5 REF: Remove unused parameter. 2015-12-14 14:26:06 -05:00
Eddie Hebert e5b5023d42 ENH: Add initial commit for DataPortal and readers
Moved from the `lazy-mainline` branch,
https://github.com/quantopian/zipline/pull/858

The intent of this patch is to provide the basic class and reader
interfaces, developed on that branch, so that creating the object,
opening paths, etc. can be tested internally.

Additional changes beyond the lazy-mainline branch: the addition of a
future minute reader and a daily bar reader.

Also allow an argument of the future_daily_reader, though no such reader
yet exists.

It may be that future and equity readers share an interface, and a
further improvement would be providing an abstract base class.

co-author: @jbredeche <jean@quantopian.com>
2015-12-14 14:23:20 -05:00
Eddie Hebert 5f81acea05 ENH: Return -1 for missing spot prices.
Return -1 when there is a zero value for a spot price.
Intended for use by the incoming data portal changes. When the data
portal sees a -1 value, it will seek back one trading day at a time
until a non-negative value is returned.
2015-11-25 11:32:36 -05:00
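The seek-back behaviour that 5f81acea05 anticipates in the data portal might be sketched as follows. The function `last_known_price` and its arguments are hypothetical; the data portal's real interface is not shown in this log.

```python
def last_known_price(prices_by_day, days, day):
    """Walk back one trading day at a time until a non-negative price appears.

    `days` is a sorted list of trading days and `prices_by_day` maps each
    day to a price, with -1 marking a missing value.
    """
    idx = days.index(day)
    while idx >= 0:
        price = prices_by_day[days[idx]]
        if price >= 0:
            return price
        idx -= 1
    # No earlier day had a price.
    return None


days = ['2015-11-23', '2015-11-24', '2015-11-25']
prices = {'2015-11-23': 10.5, '2015-11-24': -1, '2015-11-25': -1}
```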
Eddie Hebert 53dae6320c BUG: Fix volume value returned by daily spot price
Volumes were incorrectly having the thousands factor applied; however,
the volume is written as is (without the factor, since volume is an
int, not a float value).

Fix by adding a special case for volume which returns the value as is.
2015-11-25 10:19:52 -05:00
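A minimal sketch of the volume special case in 53dae6320c, assuming prices are stored as integers scaled by 1000 and must be scaled back on read. The constant `PRICE_SCALE` and the function `decode_spot_value` are hypothetical names.

```python
# Assumed scaling: stored price = real price * 1000.
PRICE_SCALE = 0.001


def decode_spot_value(raw, column):
    """Scale price columns back down, but return volume untouched."""
    if column == 'volume':
        return raw
    return raw * PRICE_SCALE
```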
Scott Sanderson 43ac9eab5c ENH: Check getmtime on download locations.
Rather than repeatedly try and fail to download data that's not yet
available, only try to download again if we haven't successfully
downloaded in the last hour.
2015-11-13 18:06:04 -05:00
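The modification-time check from 43ac9eab5c can be sketched as below, treating the file's mtime as the time of the last successful download. The function name `should_retry_download` and its parameters are assumptions for illustration.

```python
import os
import tempfile
import time

ONE_HOUR = 60 * 60


def should_retry_download(path, now=None, min_interval=ONE_HOUR):
    """Retry only if the file is missing or was last written over an hour ago."""
    if not os.path.exists(path):
        return True
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= min_interval


# A freshly written file stands in for a successful download.
handle, cached = tempfile.mkstemp()
os.close(handle)
```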
Scott Sanderson 1b7d0c9477 MAINT: Add __future__ print function import.
We do print(stock) in this file, which happens to work in py2, but is
confusing.
2015-11-13 18:06:04 -05:00
Scott Sanderson 01888918dd MAINT: Use itemgetter instead of homegrown func. 2015-10-25 16:37:59 -04:00
Scott Sanderson 75f7c44223 BUG: Better check for last date.
Use get_loc to find the trading day that ended 2 days before now.
2015-10-25 16:37:59 -04:00
Scott Sanderson 8fd18e5aa6 DOC: Comment on treasury division by 100. 2015-10-25 16:37:59 -04:00
Scott Sanderson 0710062e6a DOC: Docstring edits. 2015-10-25 16:37:59 -04:00
Scott Sanderson cabe22ae8e ENH: Always use Adjusted Close for benchmarks.
Previously we were using Close, and we calculated returns on the first
day of a window against the Open for that day.  We now always look back
an extra day to get the previous day's close.
2015-10-25 16:37:59 -04:00
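The extra day of lookback described in cabe22ae8e can be illustrated with a short pandas sketch. The price values and variable names are made up; the idea is only that the first day of the window gets its return from the previous day's adjusted close.

```python
import pandas as pd

# Illustrative adjusted closes, including one day before the window.
adj_close = pd.Series(
    [100.0, 102.0, 101.0, 103.02],
    index=pd.to_datetime(
        ['2015-10-19', '2015-10-20', '2015-10-21', '2015-10-22']
    ),
)

window_start = pd.Timestamp('2015-10-20')

# Compute returns over the padded series, then trim to the window so
# the first day has a close-to-close return instead of NaN.
returns = adj_close.pct_change().loc[window_start:]
```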
Scott Sanderson df4cda4dc9 ENH: Remove defaults from get_benchmark_data. 2015-10-25 16:37:59 -04:00
Scott Sanderson d82cfb1e64 MAINT: Final polish on loader rewrites.
- Fixes an issue with the Canadian treasury loader where it would never
  have enough data to not redownload because it can only download data
  in the last 10 years.
- Uses module objects directly instead of lazy imports.
- Adds lots of docstrings.
2015-10-25 16:37:59 -04:00
Scott Sanderson 71db6d3fdc MAINT: Remove unused loader_utils file. 2015-10-25 16:37:59 -04:00
Scott Sanderson 24d26f9e63 MAINT: Rewrite the benchmark loader. 2015-10-25 16:37:59 -04:00
Scott Sanderson 948196d2de MAINT: Remove unused loader_utils functions. 2015-10-25 16:37:59 -04:00
Scott Sanderson c9e165aa2d ENH: Rewrite Canadian treasury loader. 2015-10-25 16:37:59 -04:00
Scott Sanderson 8c38278783 ENH: Rewrite treasury loader using pandas.
Replaces our custom XML parsing with a single call to `pd.read_csv`
against the Federal Reserve's API.  This produces nearly identical
results as compared to the old loader, but it's dramatically simpler and
roughly 10x faster on my machine.

The average difference in magnitude between new and old is approximately
10e-7, and only one entry is different to a degree greater than the
number of significant figures provided by treasury.gov.

Additionally, the new loader correctly ignores Columbus Day of 2010, for
which the old loader erroneously produced an all-NaN row.

This also changes the interface that treasury modules are
required to implement. Modules must now supply a `get_treasury_data`
function that returns a `DataFrame` with a daily `DatetimeIndex` and a
column for each supported treasury duration.

Detailed comparison between results from new and old loader::

    import pandas as pd
    from zipline.data.treasuries import get_treasury_data
    new = get_treasury_data()  # New implementation
    old = pd.read_csv(  # Previously cached data
        '/home/ssanderson/.zipline/data/treasury_curves.csv',
        parse_dates=[0],
        index_col=0,
    )
    # These columns were unused.
    del old['tid']; del old['date']
    old = old.tz_localize('UTC')
    # old data erroneously contained an all-NaN entry for Columbus Day
    # in 2010.  Remove before comparing.
    old = old.dropna(how='all')

    In [25]: len(new) == len(old)
    Out[25]: True

    In [26]: abs(old - new).max()
    Out[26]:
    10year    2.000000e-04
    1month    6.938894e-18
    1year     1.000000e-04
    20year    1.000000e-04
    2year     2.000000e-04
    30year    1.000000e-04
    3month    1.000000e-03
    3year     1.000000e-04
    5year     1.387779e-17
    6month    1.000000e-04
    7year     1.000000e-04
    dtype: float64

    In [27]: abs(old - new).mean()
    Out[27]:
    10year    3.097414e-08
    1month    4.396534e-19
    1year     1.548707e-08
    20year    3.624502e-08
    2year     4.646120e-08
    30year    1.830496e-08
    3month    1.549427e-07
    3year     1.548707e-08
    5year     1.702619e-18
    6month    1.548707e-08
    7year     1.548707e-08
    dtype: float64

Since www.treasury.gov only reports values up to three significant
digits, we should only care about differences of greater than 1e-3.

There is exactly one such difference: the entry for the three month bond
on 1999-10-01::

    In [60]: new[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[60]:
    Time Period  1999-10-01 00:00:00+00:00
    1month                             NaN
    3month                          0.0498
    6month                          0.0501
    1year                           0.0530
    2year                           0.0573
    3year                           0.0583
    5year                           0.0590
    7year                           0.0622
    10year                          0.0600
    20year                          0.0657
    30year                          0.0615

    In [61]: old[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[61]:
                 1999-10-01 00:00:00+00:00
    10year                          0.0600
    1month                             NaN
    1year                           0.0530
    20year                          0.0657
    2year                           0.0573
    30year                          0.0615
    3month                          0.0488
    3year                           0.0583
    5year                           0.0590
    6month                          0.0501
    7year                           0.0622

The US Treasury website (our old source) provides a value of 0.0488 here,
whereas the Federal Reserve site (our new source) provides a value of
0.0498.
2015-10-25 16:37:59 -04:00
Scott Sanderson 3c954af08c MAINT: Just do searchsorted with the date.
Previously we were converting our date to a string, then calling
`searchsorted` on the DatetimeIndex with the string, which would cause
pandas to convert the string back into a date to actually do the lookup.
2015-10-25 16:37:59 -04:00
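For the change in 3c954af08c, a minimal sketch of calling `searchsorted` with a `Timestamp` directly, avoiding the round trip through a string. The index and date are illustrative.

```python
import pandas as pd

index = pd.date_range('2015-10-19', periods=5, freq='D', tz='UTC')
date = pd.Timestamp('2015-10-21', tz='UTC')

# Pass the Timestamp itself; no string conversion in either direction.
loc = index.searchsorted(date)
```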
Scott Sanderson 854b6638b2 MAINT: Remove default values from dump_treasury_curves.
We never call the function without passing them explicitly.
2015-10-25 16:37:59 -04:00
Eddie Hebert 8543b32468 Merge pull request #791 from quantopian/pipeline-effective-dates
MAINT: Set dividend effective date to ex_date.
2015-10-21 16:44:07 -04:00
Eddie Hebert 55b25bdd3f MAINT: Set dividend effective date to ex_date.
The price shock occurs on the effective_date. Had changed the effective_date to
be the day before the ex_date with the belief that pipeline was applying values up
until the effective_date, but the lookback windows apply before the
effective_date. Thus, the price shock calculation should still use the previous
day's data but be dated on the ex_date to stay aligned with splits and
merger dating.
2015-10-21 16:43:13 -04:00
llllllllll 0183d0a914 ENH: Allows Float64Adjustments to act on a range of columns 2015-10-19 16:35:03 -04:00
Thomas Wiecki 659a367b09 STY Remove unused import of to_datetime. 2015-10-16 16:15:28 +02:00
Eddie Hebert 6b9476d346 BUG: Filter out payout rows with no prev close.
When the prev_close is 0 or does not exist, the resulting ratio was either +inf
or nan, respectively.

Create a mask on the non-zero effective dates, where effective date is only
written when the prev close is sufficient for a valid ratio; and use that mask
to filter out the bad rows.

Also, use prev close as the effective date.
2015-10-15 13:30:05 -04:00
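The filtering in 6b9476d346 is described in terms of a mask on effective dates. The sketch below simplifies this by masking on the previous close directly, and it assumes a ratio of the form `1 - amount / prev_close`, which the commit message does not state. All names and values are illustrative.

```python
import numpy as np

amounts = np.array([0.5, 0.25, 1.0])
# A zero or missing previous close cannot produce a valid ratio.
prev_close = np.array([50.0, 0.0, np.nan])

# Keep only rows whose previous close is finite and positive.
valid = np.isfinite(prev_close) & (prev_close > 0)

# Assumed ratio formula, applied to the surviving rows only.
ratios = 1.0 - amounts[valid] / prev_close[valid]
```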
Eddie Hebert 9a2767ad07 Merge pull request #765 from quantopian/add-spot-price-and-write-adjustments
Add spot price and write adjustments
2015-10-13 14:02:44 -04:00
Eddie Hebert ccdc815526 ENH: Write dividend payouts to adjustments db.
To prepare for querying for payouts from SQLite, write the dividend
payouts to a new table `dividend_payouts`.

Change the expected columns of the passed dividend frame to contain the
payout data, and use that data to calculate the ratios (this moves
internal code that was calculating the ratios into Zipline.)

The end result is that instead of just a `dividends` table with the
backward looking adjustment ratios, also write a `dividend_payouts`
table and a `stock_dividend_payout` table.
2015-10-13 14:02:26 -04:00
Eddie Hebert 752a2c3962 DOC: Fix comment typo.
Reader/Writer
2015-10-10 07:20:36 -04:00
Eddie Hebert 5338c8e611 ENH: Add spot_price to BcolzDailyBarReader.
Add new method to BcolzDailyBarReader, `spot_price` which returns the
unadjusted price for the specified day and sid.
2015-10-10 07:19:03 -04:00
Scott Sanderson 23ca58813a PERF: Speed up reading of adjustments.
For a pipeline doing simple computations on USEquityPricing data, we
were spending ~60% of `run_pipeline` loading adjustments.  Almost all of
that time was spent in calls to `DatetimeIndex.get_loc` to find the
indices of adjustment `eff_date`s.

This optimizes the eff_date lookups by pre-populating a cache of
seconds-since-epoch timestamps that we expect to see, and falling back
to `np.searchsorted` on cache misses.

In testing, this reduces the time to compute a 1-year pipeline with 30
and 90 day moving averages from 3.1 seconds to 0.9 seconds.
2015-10-09 17:48:07 -04:00
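The lookup strategy from 23ca58813a, a pre-populated cache with a fallback to `np.searchsorted`, might be sketched as follows. The calendar, the cache contents, and the function `eff_date_index` are assumptions for illustration and are not the reader's actual code.

```python
import numpy as np

DAY = 86400
# Illustrative calendar as seconds since the epoch, one entry per day.
calendar_seconds = np.arange(10, dtype=np.int64) * DAY

# Pre-populate the cache with the timestamps we expect to see.
cache = {int(ts): i for i, ts in enumerate(calendar_seconds[:5])}


def eff_date_index(ts):
    """Return the calendar position of ts, using the cache when possible."""
    try:
        return cache[ts]
    except KeyError:
        # Cache miss: fall back to a binary search over the calendar.
        return int(np.searchsorted(calendar_seconds, ts))
```

A dictionary hit avoids a binary search per adjustment, which matters when the same handful of dates is looked up many times.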
Scott Sanderson 4a9cd76dab MAINT: Remove unused constant. 2015-10-09 17:47:47 -04:00
Scott Sanderson f06f4bdd25 MAINT: Remove unused import. 2015-10-09 17:47:18 -04:00