catalyst

mirror of https://github.com/wassname/catalyst.git synced 2026-06-30 07:35:55 +08:00

Author	SHA1	Message	Date
jkleint	82273e296f	Propagate exceptions in loader to prevent variable reference before use `data.loader.ensure_benchmark_data()` was trying to use data after an exception was raised loading it. The code was logging and swallowing exceptions; this re-raises.	2016-09-23 15:55:55 -07:00
Scott Sanderson	a8a2cc1582	PERF: Remove module-scope calendar creations. Remove module scope invocations of `get_calendar('NYSE')`, which cuts zipline import time in half on my machine. This makes the zipline CLI noticeably more responsive, and it reduces memory consumed at import time from 130MB to 90MB. Before: $ time python -c 'import zipline' real 0m1.262s user 0m1.128s sys 0m0.120s After: $ time python -c 'import zipline' real 0m0.676s user 0m0.536s sys 0m0.132s	2016-09-06 09:57:23 -04:00
Jean Bredeche	6fb4923cc7	Re-implemented the Calendar API. Instead of having separate ExchangeCalendar and TradingSchedule objects, we now just have TradingCalendar. The TradingCalendar keeps track of each session (defined as a contiguous set of minutes between an open and a close). It's also responsible for handling the grouping logic of any given minute to its containing session, or the next/previous session if it's not a market minute for the given calendar.	2016-07-12 13:13:50 -04:00
jfkirk	26742dda67	MAINT: Removes obsolete tradingcalendar module	2016-06-08 13:34:19 -04:00
Joe Jevnik	0bd790d122	MAINT: replace usages of pandas.io.data.DataReader with pandas_datareader.data.DataReader	2016-05-20 11:23:28 -04:00
Joe Jevnik	59c8e371a2	ENH: Updates the cli, data bundles and extensions. Adds the data bundle concept which makes it easy for users to register loading functions to build out minute and daily data along with an assets db and adjustments db. By default we have provided a `quandl` bundle which pulls from the public domain WIKI dataset. Users may register new bundles by decorating an ingest function with `zipline.data.bundles.register(<name>)`. This also provides a `yahoo_equities` function for creating an ingestion function that will load a static set of assets from yahoo. The cli is now structured as a couple of subcommands and has been changed to `python -m zipline`. The old behavior of `run_algo.py` has been moved to the `run` subcommand. This is almost entirely the same except that it now takes the name of the data bundle to use, defaulting to `quandl`. The next subcommand is `ingest` which takes the name of a data bundle to ingest. This will run the loading machinery and write the data to a specified location that `run` can find. There is also a `clean` subcommand which deletes the data that was written with `ingest`. Extensions have also been added to zipline. This is an experimental feature where users can provide an extra set of python files to run at the start of the process. These can be used to configure aspects of zipline. Right now the only thing that is supported in an extension file is the registration of a new data bundle.	2016-05-03 18:38:24 -04:00
Eddie Hebert	37d341bb95	BUG: Truncate treasury curves to env min date.	2016-03-08 17:14:28 -05:00
Joe Jevnik	5eb453675d	BUG: don't fail if you cannot make a webrequest	2016-02-11 18:46:43 -05:00
Jeremiah Lowin	e44c0d42e1	Replace print with logger.info	2016-01-25 19:22:28 -05:00
dmichalowicz	4f24a32c45	BUG: Benchmark and treasury curves data missing on first download	2015-12-21 13:38:24 -05:00
Scott Sanderson	43ac9eab5c	ENH: Check `getmtime` on download locations. Rather than repeatedly try and fail to download data that's not yet available, only try to download again if we haven't successfully downloaded in the last hour.	2015-11-13 18:06:04 -05:00
Scott Sanderson	1b7d0c9477	MAINT: Add __future__ print function import. We do print(stock) in this file, which happens to work in py2, but is confusing.	2015-11-13 18:06:04 -05:00
Scott Sanderson	75f7c44223	BUG: Better check for last date. Use get_loc to find the trading day that ended 2 days before now.	2015-10-25 16:37:59 -04:00
Scott Sanderson	cabe22ae8e	ENH: Always use Adjusted Close for benchmarks. Previously we were using Close, and we calculated returns on the first day of a window against the Open for that day. We now always look back an extra day to get the previous day's close.	2015-10-25 16:37:59 -04:00
Scott Sanderson	d82cfb1e64	MAINT: Final polish on loader rewrites. - Fixes an issue with the canadian treasury loader where it would never have enough data to not redownload because it can only download data in the last 10 years. - Uses module objects directly instead of lazy imports. - Adds lots of docstrings.	2015-10-25 16:37:59 -04:00
Scott Sanderson	24d26f9e63	MAINT: Rewrite the benchmark loader.	2015-10-25 16:37:59 -04:00
Scott Sanderson	8c38278783	ENH: Rewrite treasury loader using pandas.

Replaces our custom XML parsing with a single call to `pd.read_csv` against the Federal Reserve's API. This produces nearly identical results as compared to the old loader, but it's dramatically simpler and roughly 10x faster on my machine. The average difference in magnitude between new and old is approximately 10e-7, and only one entry is different to a degree greater than the number of significant figures provided by treasury.gov. Additionally, the new loader correctly ignores Columbus Day of 2010, for which the old loader erroneously produced an all-NaN row.

This also changes the interface that treasury modules are required to implement. Modules must now supply a `get_treasury_data` function that returns a `DataFrame` with a daily `DatetimeIndex` and a column for each supported treasury duration.

Detailed comparison between results from new and old loader::

    import pandas as pd
    from zipline.data.treasuries import get_treasury_data

    new = get_treasury_data()  # New implementation
    old = pd.read_csv(  # Previously cached data
        '/home/ssanderson/.zipline/data/treasury_curves.csv',
        parse_dates=[0],
        index_col=0,
    )

    # These columns were unused.
    del old['tid']; del old['date']
    old = old.tz_localize('UTC')

    # old data erroneously contained an all-NaN entry for Columbus Day
    # in 2010. Remove before comparing.
    old = old.dropna(how='all')

    In [25]: len(new) == len(old)
    Out[25]: True

    In [26]: abs(old - new).max()
    Out[26]:
    10year    2.000000e-04
    1month    6.938894e-18
    1year     1.000000e-04
    20year    1.000000e-04
    2year     2.000000e-04
    30year    1.000000e-04
    3month    1.000000e-03
    3year     1.000000e-04
    5year     1.387779e-17
    6month    1.000000e-04
    7year     1.000000e-04
    dtype: float64

    In [27]: abs(old - new).mean()
    Out[27]:
    10year    3.097414e-08
    1month    4.396534e-19
    1year     1.548707e-08
    20year    3.624502e-08
    2year     4.646120e-08
    30year    1.830496e-08
    3month    1.549427e-07
    3year     1.548707e-08
    5year     1.702619e-18
    6month    1.548707e-08
    7year     1.548707e-08
    dtype: float64

Since www.treasury.gov only reports values up to three significant digits, we should only care about differences of greater than 1e-3. There is exactly one such difference: the entry for the three month bond on 1999-10-01::

    In [60]: new[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[60]:
    Time Period  1999-10-01 00:00:00+00:00
    1month                             NaN
    3month                          0.0498
    6month                          0.0501
    1year                           0.0530
    2year                           0.0573
    3year                           0.0583
    5year                           0.0590
    7year                           0.0622
    10year                          0.0600
    20year                          0.0657
    30year                          0.0615

    In [61]: old[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[61]:
            1999-10-01 00:00:00+00:00
    10year                     0.0600
    1month                        NaN
    1year                      0.0530
    20year                     0.0657
    2year                      0.0573
    30year                     0.0615
    3month                     0.0488
    3year                      0.0583
    5year                      0.0590
    6month                     0.0501
    7year                      0.0622

The US Treasury website (our old source) provides a value of 0.0488 here, whereas the Federal Reserve site (our new source) provides a value of 0.0498.	2015-10-25 16:37:59 -04:00
Scott Sanderson	3c954af08c	MAINT: Just do searchsorted with the date. Previously we were converting our date to a string, then calling `searchsorted` on the DatetimeIndex with the string, which would cause pandas to convert the string back into a date to actually do the lookup.	2015-10-25 16:37:59 -04:00
Scott Sanderson	854b6638b2	MAINT: Remove default values from dump_treasury_curves. We never call the function without passing them explicitly.	2015-10-25 16:37:59 -04:00
Scott Sanderson	ef4f642e62	ENH: Compute engine architecture for FFC API. This patch lays the groundwork for a compute engine designed to facilitate construction of factor-based universe screening and portfolio allocation. It contains: A new module, `zipline.modelling`, containing entities that can be used to express computations as dependency graphs. Each node in such a graph is an instance of the base `Term` class, defined in `zipline.modelling.term`. Dependency graphs are executed by instances of `FFCEngine`, defined in `zipline.modelling.engine`. A new module, `zipline.data.ffc`, containing loaders and dataset definitions for inputs to the modelling API. New `TradingAlgorithm` api methods: `add_factor`, and `add_filter`. These methods can only be called from `initialize`, and are used to inform the algorithm that each day it should compute the given terms. Computed factor results are made available through a new attribute of the `data` object in `before_trading_start` and `handle_data`. Computed filter results control which assets are available in the factor matrix on each day.	2015-07-29 12:30:46 -04:00
jfkirk	b84ac01cbf	ENH: Adds futures trading and asset management logic to TradingAlgorithm and performance classes	2015-06-11 11:35:49 -04:00
Eddie Hebert	0fa44471be	MAINT: Change expected type of treasury curves from load to DataFrame. Instead of converting the curves back and forth from dictionaries to DataFrame and back, use the DataFrame format when passing to environment.	2015-04-20 10:26:09 -04:00
Benjamin Berman	ef598c7130	BUG: Handle a ValueError on from_csv calls The cached market data could be corrupted. Pandas raises a ValueError in that case, and this error handles it.	2015-04-14 12:40:37 -04:00
Scott Sanderson	885db87dea	MAINT: Use logger instead of printing in loader.py Makes it easier to filter logs when they're not desired.	2015-04-14 12:40:37 -04:00
warren-oneill	b62fadc76f	adding NYSE trading_day and trading_days as default in load_market_data()	2015-04-08 16:57:23 -04:00
warren-oneill	aa872afdf4	adding updates from master	2015-04-08 16:57:12 -04:00
warren-oneill	49c168b3d0	adding trading_day and trading_days as variables to load_market_data	2015-04-08 16:56:13 -04:00
Jonathan Kamens	e942275108	STY: Flake8 Upgrade the version of the flake8, pep8, and mccabe PyPI packages, and make the code changes necessary for compatibility with the updated packages.	2015-03-19 17:21:25 -04:00
Jonathan Kamens	c46a3afa3c	BUG: Don't download benchmarks / treasury curves unnecessarily. Fix an off-by-one error which was causing us to download the benchmark and treasury curves over and over again even when they weren't needed.	2015-03-08 09:31:50 -04:00
Luke Schiefelbein	1542b41fbd	BUG: Fix price caching for tickers with '/' char On Ubuntu (assume this is true for all posix) tickers containing a slash char ("CRD/A", "BRK/A", both valid tickers with yahoo api accessible timeseries) lead to a path error in loader.py line 286.	2014-11-19 11:26:27 +01:00
Thomas Wiecki	820115f7be	MAINT: Replace iterkv with iteritems. iterkv is being deprecated as of pandas 0.14.	2014-10-22 17:25:37 +02:00
twiecki	4bdecd6402	STY: PEP8 fixes.	2014-03-26 20:46:20 +09:00
Eddie Hebert	71cda461c5	BUG: Fix check for cached public data for Python 2.7 Python 2 and 3 throw different exception types when a file does not exist. Catch both exception types to trigger the download, so that the loader works under both Python versions.	2014-01-07 17:19:16 -05:00
Eddie Hebert	46ab748dd2	MAINT: Use pandas for data cache file I/O. The compatibility between the two versions was made easier by letting pandas handle the heavy lifting, so pass filenames to the pandas serialization methods, instead of doing the file handling and reading/writing within the data module.	2014-01-07 12:01:08 -05:00
Eddie Hebert	b4959e46cf	MAINT: Use six for Python 3 compatible names and behavior. Use the six module to import functions and types that are consistent between Python 2 and 3, so that one code base can support both versions. - Use integer types instead of int and long. - Use string_types instead of basestring. - Account for iteritems, itervalues, iterkeys. - Use six.moves for filter and zip, reduce - Use compatible bytes for md5 hasher. - xrange and range	2014-01-07 11:33:50 -05:00
Eddie Hebert	54ddd1c109	MAINT: print function clean up in preparation for Python 3 - Use `print()` function for all print calls - Fix strip and format calls that were on the outside of the print function for some reason. (Which were breaking in Python 3 because of print returning None.) - Remove commented out print calls.	2014-01-04 20:55:43 -05:00
David Stephens	e45528458f	ENH: Added functionality to download Canadian treasury curves. Added automatic switching of treasury curves based on index sent to environment.	2013-12-27 13:27:43 -05:00
Eddie Hebert	50800a9863	BUG: Fix data cache filepath on Windows. Prevent the ':' char, generated by converting a datetime to a string, from creating an incompatible filepath for Windows.	2013-11-18 20:37:45 -05:00
Eddie Hebert	43b85cffb0	MAINT: Calculate tradingcalendar with days beyond the current day. To make 'next open' calculations more straight ahead, calculate more than enough days in the trading calendar.	2013-11-11 15:48:44 -05:00
Eddie Hebert	797cb8ece3	BUG: Fix bad reference to benchmark timezone in loader.	2013-11-11 14:39:11 -05:00
Eddie Hebert	89793e371c	MAINT: Protect loader against Series saved with no tz. Checking for tz.UTC is not sufficient, since it is possible for the index.tz value to be None.	2013-11-11 14:17:14 -05:00
Eddie Hebert	c45c1a22e1	BUG: Only localize benchmark index if it is naive. Check whether the index's timezone is UTC before attempting to localize, since an already localized index throws an error when tz_localize is called.	2013-10-29 13:17:58 -04:00
Eddie Hebert	2d64ab8bfe	BUG: Fix naive timestamps in benchmarks. Always convert the benchmarks to UTC, not just on reload.	2013-10-29 08:36:53 -04:00
Eddie Hebert	37c56b9aa4	MAINT: Use Series throughout for daily returns. Remove the lists of DailyReturn objects in favor of using pd.Series to store the return values. Should make it easier to inspect the values when stepping through, make the windowing of data to a certain range more facile by using, and have some performance increases due to removing object creation and member access.	2013-10-19 23:06:18 -04:00
Eddie Hebert	71f03e9537	BUG: Ensure loading benchmarks include latest dates. The Series `.append` does not update in-place, assign the value to `saved_benchmarks` so that we update the newest benchmarks.	2013-10-07 12:17:26 -04:00
Eddie Hebert	6ac5d49573	MAINT: Remove duplicated treasury loading code. The dump and update of curves were both using the entire history. So instead of having the update use a different code path, always use dump and overwrite.	2013-10-02 11:10:15 -04:00
Eddie Hebert	5ddc134379	ENH: Cache daily data to eliminate repeat network calls. Both unit tests and repeated runs while developing an algorithm can benefit from having a local copy of the Yahoo data, instead of doing a network call each time. Store the web request results as a csv file in a cache directory, named by symbol and date range.	2013-10-01 15:04:02 -04:00
Eddie Hebert	b44fc20e4e	MAINT: Remove msgpack as a dependency. Now that the data serialization uses pandas, msgpack is no longer needed.	2013-10-01 14:28:11 -04:00
Thomas Wiecki	a66f45b598	MAINT: Moving yahoo loader from factory to utils.	2013-10-01 14:09:26 -04:00
Eddie Hebert	b65f7f42c0	BUG: Fix updating treasury curves. A transpose back to the serialization shape was left out. Also, fixes empty return from update.	2013-10-01 11:57:04 -04:00
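
Commit 43ac9eab5c above describes checking `getmtime` on download locations so that a download is retried only if none succeeded in the last hour. The following is a minimal sketch of that idea, not the code from the commit: the function name `should_retry_download`, its parameters, and the choice to treat a missing file as "retry" are all assumptions made for illustration.

```python
import os
import time

ONE_HOUR_SECONDS = 60 * 60


def should_retry_download(path, now=None, max_age=ONE_HOUR_SECONDS):
    """Hypothetical helper: decide whether to attempt a download again.

    Returns True when `path` does not exist, or when it was last
    modified more than `max_age` seconds before `now`.
    """
    if now is None:
        now = time.time()
    try:
        last_modified = os.path.getmtime(path)
    except OSError:
        # No cached file (or it is unreadable): assume we must download.
        return True
    return (now - last_modified) > max_age
```

Passing `now` explicitly keeps the check easy to exercise without waiting for real time to pass.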

65 Commits