catalyst

mirror of https://github.com/wassname/catalyst.git synced 2026-07-06 02:45:53 +08:00

Author	SHA1	Message	Date
Eddie Hebert	75213ac176	MAINT: Write opens and closes for minute bar format. Write arrays representing the corresponding market opens and market closes, which will eventually replace the `minute_index` field. The market closes are being added for incoming work on another branch, which will use them to generate a list of non-market minutes to filter out when returning data from `unadjusted_window`.	2016-03-24 23:18:42 -04:00
Richard Frank	d873038a7e	BUG: Specify int64 instead of system int to handle 32bit python	2016-03-23 15:26:52 -04:00
Eddie Hebert	0f14972e08	ENH: Unadjusted window data for minute bars. Add a method to the minute bar reader which returns the OHLCV for all requested fields for a list of assets over the specified start and end minutes. Initial usage is intended for a loader which consumes minute bar data to resample into daily bars, but it may also be used when aggregating minute data during '1d' history calls in Q2.0. This iteration does not include handling of early closes.	2016-03-14 21:52:01 -04:00
Eddie Hebert	37d341bb95	BUG: Truncate treasury curves to env min date.	2016-03-08 17:14:28 -05:00
Joe Jevnik	5eb453675d	BUG: don't fail if you cannot make a webrequest	2016-02-11 18:46:43 -05:00
Scott Sanderson	5f49fa22cb	MAINT: Upgrade numpy and fix warnings. Mostly fixes ambiguous calls to numpy.full, and uses NaT values with explicit units.	2016-02-11 18:46:39 -05:00
Eddie Hebert	27f94f83fa	ENH: Allow passing of numpy arrays to writer. For faster parsing and writing workflows, do not require a DataFrame.	2016-02-02 14:03:42 -05:00
Eddie Hebert	488721e805	ENH: Add padding method to minute bars writer. So that consumers can write empty days' worth of data, without needing to construct a DataFrame with zero data to force a write. The internal loader uses `last_date_in_output_for_sid` to signify that retrieval of data has been attempted for all dates up until that date, so that when resuming a job the retrieval of data for those dates is not re-attempted. Also used to make the write logic cleaner, by making it only necessary to create an array large enough for the given df.	2016-02-01 14:18:22 -05:00
Jeremiah Lowin	e44c0d42e1	Replace print with logger.info	2016-01-25 19:22:28 -05:00
Eddie Hebert	930aa1b29b	MAINT: Use metadata method for reader init. Use the preexisting metadata method when instantiating the minute bar reader. An internal subclass uses the `_get_metadata` method to set up data for directories that have not used the new writer/reader interface (i.e. it allows for reader creation when the metadata.json file does not exist).	2016-01-25 12:58:43 -05:00
Eddie Hebert	984e934e83	BUG: Fix OSError when creating sids that share dir. Fix a bug where creating a sid bcolz file when the containing directory was already occupied by a sid caused an OSError when attempting to create the directory, because it already existed. E.g. if there were two sids, `1` and `2`, the paths would be `00/00/000001.bcolz` and `00/00/000002.bcolz`, which share the same directory `00/00`. Fixed by checking for directory existence before calling `makedirs`. Add test coverage which exercises writing of sids that are siblings in the sid directory structure.	2016-01-25 10:37:50 -05:00
Eddie Hebert	603a345e2d	BUG: Allow writing on first day.	2016-01-21 11:10:57 -05:00
Eddie Hebert	3a8be8c624	BUG: Need to be able to append to minute ctable.	2016-01-21 11:10:07 -05:00
Eddie Hebert	d5c3b5a15c	ENH: Add writer for minute bcolz format. Implement a writer for minute data into a format comprised of multiple ctables, one for each individual asset, with a common 'index' shared by all ctables, where a given dt maps to the same array index for all equities and fields. This format is pulled from the lazy-mainline/Q2.0 branch, with some changes to the interface. Add basic retrieval of values at a given dt to the reader. Not yet used by Zipline simulations, but added to support unit tests. Also, rename the stubbed-out us_equity_minutes to minute_bars, since the writer can be agnostic to asset type.	2016-01-21 10:54:27 -05:00
Scott Sanderson	ec0abf1822	MAINT: Use coerce_string in `BcolzDailyBarReader`.	2016-01-12 17:39:44 -05:00
Scott Sanderson	c8b80dddb0	BUG: Handle unicode adjustments path in py2. In Python 2, passing unicode to SQLiteAdjustmentReader would fail to coerce.	2016-01-12 17:39:36 -05:00
dmichalowicz	4f24a32c45	BUG: Benchmark and treasury curves data missing on first download	2015-12-21 13:38:24 -05:00
Eddie Hebert	8c1e52385f	MAINT: Raise NotImplementedError in data_portal. The patch that added data_portal intended for NotImplementedError to be raised if one of the functions was invoked, but the raise was omitted.	2015-12-15 17:08:19 -05:00
Eddie Hebert	6106cb98a5	REF: Remove unused parameter.	2015-12-14 14:26:06 -05:00
Eddie Hebert	e5b5023d42	ENH: Add initial commit for DataPortal and readers. Moved from the `lazy-mainline` branch, https://github.com/quantopian/zipline/pull/858. The intent of this patch is to provide the basic class and reader interfaces developed on that branch, so that creating the object, opening paths, etc. can be tested internally. Additional changes beyond the lazy-mainline branch: addition of a future minute reader and a daily bar reader. Also allow a future_daily_reader argument, though no such reader yet exists. It may be that future and equity readers share an interface, and a further improvement would be providing an abstract base class. co-author: @jbredeche <jean@quantopian.com>	2015-12-14 14:23:20 -05:00
Eddie Hebert	5f81acea05	ENH: Return -1 for missing spot prices. Return -1 when there is a zero value for a spot price. Intended for use by the incoming data portal changes. When the data portal sees a -1 value, it will seek back one trading day at a time until a non-negative value is returned.	2015-11-25 11:32:36 -05:00
Eddie Hebert	53dae6320c	BUG: Fix volume value returned by daily spot price. Volumes were incorrectly having the thousands factor applied; however, the volume is written as is (without the factor, since volume is an int, not a float value). Fix by adding a special case for volume which returns the value as is.	2015-11-25 10:19:52 -05:00
Scott Sanderson	43ac9eab5c	ENH: Check `getmtime` on download locations. Rather than repeatedly try and fail to download data that's not yet available, only try to download again if we haven't successfully downloaded in the last hour.	2015-11-13 18:06:04 -05:00
Scott Sanderson	1b7d0c9477	MAINT: Add __future__ print function import. We do print(stock) in this file, which happens to work in py2, but is confusing.	2015-11-13 18:06:04 -05:00
Scott Sanderson	01888918dd	MAINT: Use itemgetter instead of homegrown func.	2015-10-25 16:37:59 -04:00
Scott Sanderson	75f7c44223	BUG: Better check for last date. Use get_loc to find the trading day that ended 2 days before now.	2015-10-25 16:37:59 -04:00
Scott Sanderson	8fd18e5aa6	DOC: Comment on treasury division by 100.	2015-10-25 16:37:59 -04:00
Scott Sanderson	0710062e6a	DOC: Docstring edits.	2015-10-25 16:37:59 -04:00
Scott Sanderson	cabe22ae8e	ENH: Always use Adjusted Close for benchmarks. Previously we were using Close, and we calculated returns on the first day of a window against the Open for that day. We now always look back an extra day to get the previous day's close.	2015-10-25 16:37:59 -04:00
Scott Sanderson	df4cda4dc9	ENH: Remove defaults from get_benchmark_data.	2015-10-25 16:37:59 -04:00
Scott Sanderson	d82cfb1e64	MAINT: Final polish on loader rewrites. - Fixes an issue with the Canadian treasury loader where it would never have enough data to not redownload because it can only download data in the last 10 years. - Uses module objects directly instead of lazy imports. - Adds lots of docstrings.	2015-10-25 16:37:59 -04:00
Scott Sanderson	71db6d3fdc	MAINT: Remove unused loader_utils file.	2015-10-25 16:37:59 -04:00
Scott Sanderson	24d26f9e63	MAINT: Rewrite the benchmark loader.	2015-10-25 16:37:59 -04:00
Scott Sanderson	948196d2de	MAINT: Remove unused loader_utils functions.	2015-10-25 16:37:59 -04:00
Scott Sanderson	c9e165aa2d	ENH: Rewrite Canadian treasury loader.	2015-10-25 16:37:59 -04:00
Scott Sanderson	8c38278783	ENH: Rewrite treasury loader using pandas.

Replaces our custom XML parsing with a single call to `pd.read_csv` against the Federal Reserve's API. This produces nearly identical results compared to the old loader, but it's dramatically simpler and roughly 10x faster on my machine.

The average difference in magnitude between new and old is approximately 10e-7, and only one entry is different to a degree greater than the number of significant figures provided by treasury.gov. Additionally, the new loader correctly ignores Columbus Day of 2010, for which the old loader erroneously produced an all-NaN row.

This also changes the interface that treasury modules are required to implement. Modules must now supply a `get_treasury_data` function that returns a `DataFrame` with a daily `DatetimeIndex` and a column for each supported treasury duration.

Detailed comparison between results from new and old loader::

    import pandas as pd
    from zipline.data.treasuries import get_treasury_data

    new = get_treasury_data()  # New implementation
    old = pd.read_csv(  # Previously cached data
        '/home/ssanderson/.zipline/data/treasury_curves.csv',
        parse_dates=[0],
        index_col=0,
    )
    # These columns were unused.
    del old['tid']; del old['date']
    old = old.tz_localize('UTC')
    # old data erroneously contained an all-NaN entry for Columbus Day
    # in 2010. Remove before comparing.
    old = old.dropna(how='all')

    In [25]: len(new) == len(old)
    Out[25]: True

    In [26]: abs(old - new).max()
    Out[26]:
    10year    2.000000e-04
    1month    6.938894e-18
    1year     1.000000e-04
    20year    1.000000e-04
    2year     2.000000e-04
    30year    1.000000e-04
    3month    1.000000e-03
    3year     1.000000e-04
    5year     1.387779e-17
    6month    1.000000e-04
    7year     1.000000e-04
    dtype: float64

    In [27]: abs(old - new).mean()
    Out[27]:
    10year    3.097414e-08
    1month    4.396534e-19
    1year     1.548707e-08
    20year    3.624502e-08
    2year     4.646120e-08
    30year    1.830496e-08
    3month    1.549427e-07
    3year     1.548707e-08
    5year     1.702619e-18
    6month    1.548707e-08
    7year     1.548707e-08
    dtype: float64

Since www.treasury.gov only reports values up to three significant digits, we should only care about differences of at least 1e-3. There is exactly one such difference: the entry for the three month bond on 1999-10-01::

    In [60]: new[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[60]:
    Time Period  1999-10-01 00:00:00+00:00
    1month                             NaN
    3month                          0.0498
    6month                          0.0501
    1year                           0.0530
    2year                           0.0573
    3year                           0.0583
    5year                           0.0590
    7year                           0.0622
    10year                          0.0600
    20year                          0.0657
    30year                          0.0615

    In [61]: old[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[61]:
            1999-10-01 00:00:00+00:00
    10year                     0.0600
    1month                        NaN
    1year                      0.0530
    20year                     0.0657
    2year                      0.0573
    30year                     0.0615
    3month                     0.0488
    3year                      0.0583
    5year                      0.0590
    6month                     0.0501
    7year                      0.0622

The US Treasury website (our old source) provides a value of 0.0488 here, whereas the Federal Reserve site (our new source) provides a value of 0.0498.	2015-10-25 16:37:59 -04:00
Scott Sanderson	3c954af08c	MAINT: Just do searchsorted with the date. Previously we were converting our date to a string, then calling `searchsorted` on the DatetimeIndex with the string, which would cause pandas to convert the string back into a date to actually do the lookup.	2015-10-25 16:37:59 -04:00
Scott Sanderson	854b6638b2	MAINT: Remove default values from dump_treasury_curves. We never call the function without passing them explicitly.	2015-10-25 16:37:59 -04:00
Eddie Hebert	8543b32468	Merge pull request #791 from quantopian/pipeline-effective-dates MAINT: Set dividend effective date to ex_date.	2015-10-21 16:44:07 -04:00
Eddie Hebert	55b25bdd3f	MAINT: Set dividend effective date to ex_date. The price shock occurs on the effective_date. Had changed the effective_date to be the day before the ex_date with the belief that pipeline was applying values up until the effective_date, but the lookback windows apply before the effective_date. Thus, the price shock calculation should still use the previous day's data but be dated on the ex_date to stay aligned with splits and merger dating.	2015-10-21 16:43:13 -04:00
llllllllll	0183d0a914	ENH: Allows Float64Adjustments to act on a range of columns	2015-10-19 16:35:03 -04:00
Thomas Wiecki	659a367b09	STY Remove unused import of to_datetime.	2015-10-16 16:15:28 +02:00
Eddie Hebert	6b9476d346	BUG: Filter out payout rows with no prev close. When the prev_close is 0 or does not exist, the resulting ratio was either +inf or nan, respectively. Create a mask on the non-zero effective dates, where the effective date is only written when the prev close is sufficient for a valid ratio, and use that mask to filter out the bad rows. Also, use prev close as the effective date.	2015-10-15 13:30:05 -04:00
Eddie Hebert	9a2767ad07	Merge pull request #765 from quantopian/add-spot-price-and-write-adjustments Add spot price and write adjustments	2015-10-13 14:02:44 -04:00
Eddie Hebert	ccdc815526	ENH: Write dividend payouts to adjustments db. To prepare for querying for payouts from SQLite, write the dividend payouts to a new table `dividend_payouts`. Change the expected columns of the passed dividend frame to contain the payout data, and use that data to calculate the ratios (this moves internal code that was calculating the ratios into Zipline). The end result is that instead of just a `dividends` table with the backward-looking adjustment ratios, also write a `dividend_payouts` table and a `stock_dividend_payout` table.	2015-10-13 14:02:26 -04:00
Eddie Hebert	752a2c3962	DOC: Fix comment typo. Reader/Writer	2015-10-10 07:20:36 -04:00
Eddie Hebert	5338c8e611	ENH: Add spot_price to BcolzDailyBarReader. Add new method to BcolzDailyBarReader, `spot_price` which returns the unadjusted price for the specified day and sid.	2015-10-10 07:19:03 -04:00
Scott Sanderson	23ca58813a	PERF: Speed up reading of adjustments. For a pipeline doing simple computations on USEquityPricing data, we were spending ~60% of `run_pipeline` loading adjustments. Almost all of that time was spent in calls to `DatetimeIndex.get_loc` to find the indices of adjustment `eff_date`s. This optimizes the eff_date lookups by pre-populating a cache of seconds-since-epoch timestamps that we expect to see, and falling back to `np.searchsorted` on cache misses. In testing, this reduces the time to compute a 1-year pipeline with 30 and 90 day moving averages from 3.1 seconds to 0.9 seconds.	2015-10-09 17:48:07 -04:00
Scott Sanderson	4a9cd76dab	MAINT: Remove unused constant.	2015-10-09 17:47:47 -04:00
Scott Sanderson	f06f4bdd25	MAINT: Remove unused import.	2015-10-09 17:47:18 -04:00
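
The directory-existence fix described in commit 984e934e83 above can be sketched roughly as follows. This is a minimal illustration, not code taken from the repository: the helper names `sid_path` and `ensure_sid_dir` are hypothetical, and the six-digit zero padding with two-character directory levels is an assumption inferred from the example paths `00/00/000001.bcolz` and `00/00/000002.bcolz` in that commit message.

```python
import os
import tempfile


def sid_path(root, sid):
    """Hypothetical helper: map sid 1 to <root>/00/00/000001.bcolz.

    The padding width and directory split are assumptions based on the
    example paths quoted in the commit message.
    """
    padded = "{:06d}".format(sid)
    return os.path.join(root, padded[0:2], padded[2:4], padded + ".bcolz")


def ensure_sid_dir(root, sid):
    """Create the sid's containing directory only if it is missing."""
    path = sid_path(root, sid)
    sid_dir = os.path.dirname(path)
    # Sibling sids share a directory, so check before calling makedirs;
    # an unconditional makedirs raises OSError on the second sid.
    if not os.path.exists(sid_dir):
        os.makedirs(sid_dir)
    return path


root = tempfile.mkdtemp()
first = ensure_sid_dir(root, 1)
second = ensure_sid_dir(root, 2)  # same directory as sid 1, no OSError
```

Here sids `1` and `2` resolve to the same `00/00` directory, which is the sibling case the commit says its new test coverage exercises.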


117 Commits