catalyst

mirror of https://github.com/wassname/catalyst.git synced 2026-06-28 09:59:22 +08:00

Author	SHA1	Message	Date
Andrew Daniels	37e6a48e99	ENH: Pass calendar instance to BcolzMinuteBarWriter (#1406) * First pass. * Improvements and fixes - Update usages of BcolzMinuteBarWriter - Updates with rebuilt example data - Expose calendar from BcolzMinuteBarMetadata instead of calendar_name - Keep market_opens and market_closes in metadata for compatibility * Store start_session and end_session in minute bcolz metadata - start_session replaces first_trading_day - Add end_session to limit to correct days * For last_available_dt, get last close from calendar to maintain tz * Bumps version and handles earlier versions on read * Rebuilt example data on python 3 * Indicate metadata fields that are deprecated	2016-08-18 15:41:26 -04:00
Eddie Hebert	4a017ef63b	ENH: Session bar reader resampled from minute data Implement a `SessionBarReader` which uses a minute bar reader as a backing source, resampling the minute bars into the box around the corresponding session data. Also, add future/CME test cases to resample suite.	2016-08-18 11:37:42 -04:00
Richard Frank	fcf1067071	BUG: Fixes should_clean for keep_last=0	2016-08-17 18:18:01 -04:00
Andrew Daniels	440806ad60	MAINT: Use TradingCalendar objects for bundles (#1397) * MAINT: Use TradingCalendar objects for bundles Instead of trading days, opens, and closes, register now takes a TradingCalendar object, along with a start_session and end_session. The ingest function is now passed these values instead as well. * Accept calendar name in addition to the actual object * Updates bundles documentation for changes * Fix typo in docs * Use class formatting * Force start_session and end_session within the bounds of the calendar * Use UTC timestamps in test_core * Document Trading Calendar API in appendix.rst	2016-08-17 13:37:07 -04:00
Eddie Hebert	d1f7a819fc	TST: Share resample test cases. Also, move `DailyHistoryAggregator` to `resample` module, so that tools for converting from minute to session bars are collocated. This patch is in preparation of adding a daily bar reader which resamples minute data, which will be located in the `resample` module and share the test cases and expected results in `test_resample`.	2016-08-16 15:44:32 -04:00
Eddie Hebert	e934c6aeaf	TST: Make room for multiple calendars in tests. When adding fixtures for futures data, there will be a need for multiple calendars in the fixture ecosystem. e.g. a test that includes both equities and futures would need an overall calendar which encompasses both equities and futures; however, the test data for equities should still be limited to the bounds set by the NYSE calendar. Make the fixtures that set up trading calendars and values derived from the trading calendar (e.g. trading sessions) accept an iterable of calendars which need to be created, then populate those values into a dict keyed by the calendar name. Change `WithNYSETradingDays` to include sessions in the name, since we are moving to session as the name for the 'day' unit. Provide `trading_days`, which is really "NYSE trading sessions", on `WithTradingSessions` for backwards compatibility.	2016-08-05 12:17:27 -04:00
Joe Jevnik	4265a13edf	Revert "Merge pull request #1354 from quantopian/revert-1302-point-in-time-asset-db" This reverts commit `3b633011c6`, reversing changes made to `70ac5323de`.	2016-08-02 14:25:10 -04:00
Joe Jevnik	814a2be7b7	Revert "Point in time asset db"	2016-07-27 23:29:08 -04:00
Joe Jevnik	7fd8c29880	ENH: add point in time aspect to equity symbol mapping Changes the overlap behavior so that it is an error to write data which would have two companies holding the same ticker. Other than one test around which company would win in that case, all the other tests are passing. That single test has been changed to check the write-time error.	2016-07-26 13:34:58 -04:00
Joe Jevnik	ec0ecfc1b9	TST: don't use showprogress in tests	2016-07-25 13:09:55 -04:00
Joe Jevnik	e8728c0cd4	TST: fix data tests	2016-07-25 13:09:55 -04:00
Joe Jevnik	ef4eafbbb8	TST: fix bundle test discovery	2016-07-25 13:09:55 -04:00
Jean Bredeche	5a0f840917	Clean up daily bar reader/writer to take advantage of new trading calendar. The reader is backwards-compatible with the previous format. In USEquityLoader, use dailyreader's trading_calendar. This is backwards compatible and will fall back to the NYSE calendar if the reader doesn’t have a calendar specified.	2016-07-15 15:13:57 -04:00
Jean Bredeche	6fb4923cc7	Re-implemented the Calendar API. Instead of having separate ExchangeCalendar and TradingSchedule objects, we now just have TradingCalendar. The TradingCalendar keeps track of each session (defined as a contiguous set of minutes between an open and a close). It's also responsible for handling the grouping logic of any given minute to its containing session, or the next/previous session if it's not a market minute for the given calendar.	2016-07-12 13:13:50 -04:00
Eddie Hebert	51eda06323	MAINT: Add equity to naming of bar data classes. In preparation of adding futures, add equity to the names of both the classes and methods for writing bcolz data. Futures data will use a different minutes per day with a separate reader. This change will allow both equity and futures fixtures to be side by side. Also, break out the method which generates the dataframes and trading days member into fixtures (`EquityMinuteBarData` and `EquityDailyBarData`) on which the `*BarReader` fixture depends. This fixture is separated out to enable reader/writers in different formats to use the same data setup. (There is internal code which needs to write minute and daily bar data in a database format.)	2016-06-30 08:21:42 -04:00
Andrew Daniels	60cd4aab91	Use new API in tests/data/bundles/test_core.py	2016-06-08 16:24:36 -04:00
jfkirk	2a8f69fc01	MAINT: DataPortal env -> asset_finder	2016-06-08 13:34:22 -04:00
jfkirk	0a6ad9ac9e	STY: Your flake is on fleek	2016-06-08 13:34:22 -04:00
jfkirk	581e817603	MAINT: Rebase reconciliation	2016-06-08 13:34:22 -04:00
jfkirk	10a118d94c	MAINT: Removes references to tradingcalendar	2016-06-08 13:34:20 -04:00
jfkirk	75e0e4723d	TST: Refactors more tests to use WithTradingSchedule	2016-06-08 13:34:20 -04:00
jfkirk	241abda2a5	STY: Flake8	2016-06-08 13:34:19 -04:00
jfkirk	c8304e8601	ENH: Adds ExchangeCalendar, TradingSchedule, and implementations Conflicts: tests/data/test_minute_bars.py tests/data/test_us_equity_pricing.py tests/finance/test_slippage.py tests/pipeline/test_engine.py tests/pipeline/test_us_equity_pricing_loader.py tests/serialization_cases.py tests/test_algorithm.py tests/test_assets.py tests/test_bar_data.py tests/test_benchmark.py tests/test_exception_handling.py tests/test_fetcher.py tests/test_finance.py tests/test_history.py tests/test_perf_tracking.py tests/test_security_list.py tests/utils/test_events.py zipline/algorithm.py zipline/data/data_portal.py zipline/data/us_equity_loader.py zipline/errors.py zipline/finance/trading.py zipline/testing/core.py zipline/utils/events.py	2016-06-08 13:34:18 -04:00
Andrew Daniels	8e6c98e9aa	BUG: Fixes reading and writing of daily bars first_trading_day attr When writing first_trading_day, it is already in the correct frame of reference (seconds since epoch) and does not need to be transformed further. Adjusts the reader to expect this value.	2016-06-02 13:41:09 -04:00
Stewart Douglas	17c20da026	TST: Add test to append minutely data	2016-05-26 09:38:25 -04:00
Andrew Daniels	f1cfe1f2db	BUG: Fixes bcolz padding to not always pad 390 minutes If minutes already exist for the last existing day, adjust the number of minutes padded to account for them. Previously we would always pad 390, leading to a mismatch in the number of rows.	2016-05-25 14:26:16 -04:00
Stewart Douglas	8217cdb1bd	ENH: Allow BcolzMinuteBarWriter to append to most recent day Minutely data can now be appended to bcolz files even when minutes in the same day have already been written. For example, previously attempting to write data for the minute 2016-05-11 16:30 would raise an exception if any OHLCV data for 2016-05-11 had been written to the same file. Trying to overwrite existing minutes still raises a BcolzMinuteOverlappingData exception. Note that previously all sids' bcolz files ended at the same time. This is no longer necessarily the case. The last record in each sid's bcolz file now corresponds to the latest minute for which OHLCV data is provided to the writer.	2016-05-13 16:24:21 -04:00
Joe Jevnik	55f1548160	BUG: fix inverted splits in quandl data	2016-05-09 14:00:35 -04:00
Joe Jevnik	d819721d96	ENH: use more human readable format for bundle ingest directories We are now using isoformats with ':' replaced with ';'. We cannot use a normal isoformat because windows does not allow files or directories with ':' in the name.	2016-05-05 18:22:13 -04:00
Joe Jevnik	89542e33bd	ENH: Adds quantopian-quandl bundle as new default. This data bundle will use the quantopian mirror of the quandl WIKI data instead of downloading from quandl directly. This dramatically improves the speed because we do not pay the rate limiting for quandl and we can send the data in the format zipline expects.	2016-05-05 18:22:13 -04:00
Joe Jevnik	59c8e371a2	ENH: Updates the cli, data bundles and extensions. Adds the data bundle concept which makes it easy for users to register loading functions to build out minute and daily data along with an assets db and adjustments db. By default we have provided a `quandl` bundle which pulls from the public domain WIKI dataset. Users may register new bundles by decorating an ingest function with `zipline.data.bundles.register(<name>)`. This also provides a `yahoo_equities` function for creating an ingestion function that will load a static set of assets from yahoo. The cli is now structured as a couple of subcommands and has been changed to `python -m zipline`. The old behavior of `run_algo.py` has been moved to the `run` subcommand. This is almost entirely the same except that it now takes the name of the data bundle to use, defaulting to `quandl`. The next subcommand is `ingest` which takes the name of a data bundle to ingest. This will run the loading machinery and write the data to a specified location that `run` can find. There is also a `clean` subcommand which deletes the data that was written with `ingest`. Extensions have also been added to zipline. This is an experimental feature where users can provide an extra set of python files to run at the start of the process. These can be used to configure aspects of zipline. Right now the only thing that is supported in an extension file is the registration of a new data bundle.	2016-05-03 18:38:24 -04:00
Joe Jevnik	efac476976	ENH: make BcolzMinuteBarWriter.write take iterable Updates the BcolzMinuteBarWriter.write api to allow users to pass their data as a stream instead of requiring that they loop over their data externally. This matches the API presented by BcolzDailyBarWriter.	2016-04-29 16:14:48 -04:00
Eddie Hebert	66d05aa563	PERF: Improve read time for smaller num of assets. The BcolzDailyBarReader was optimized for the pipeline case of reading all assets at once. Now that the reader is also used to support daily history, the case of reading data for a small number of assets is more common, particularly in algorithms that use the history API and have a high rotation of assets (e.g. an algorithm which uses pipeline to set the active universe). Remove the bottleneck in reading a small number of assets by conditionally reading the slice for each asset from the carray, instead of reading the data for all equities and then indexing into that full array. Above a certain number of assets, it is still better to read all the data at once. On the Quantopian dataset, which holds data for about 20000 equities over the last 10 years (where not all equities trade over the full range), stored in 118 blosc blp files per column, the tipping point where the 'read all' mode wins out is between 3000-4000 assets. That number was tested by trying to exercise a worst case scenario where the equities were spread out evenly across the blp files, by stepping along a sorted list of assets that were alive over a query range which spanned 70 trading days. ``` size = 3000 sids = [assets[i] for i in range(0, len(assets), len(assets) / size)][:size] ``` Also, add parameter to WithBcolzDailyBarReader fixture which allows the test to specify what the threshold count for reading all data should be, so that the test_us_equity_pricing can be forced into either mode to make sure that both branches in logic are covered by all test cases. On local dev machine this patch improves the read time of `load_raw_array` for one asset from 100 ms to 96.5 µs. (Roughly a 10^3 improvement.) Reading only one asset per call is an observed common case when populating the non-cached values in USEquityHistoryLoader.	2016-04-21 20:43:52 -04:00
Joe Jevnik	bc0b117dc9	MAINT: make the data loading apis more consistent. Changes BcolzDailyBarWriter to not be an abc, data is passed as an iterator of (sid, dataframe) pairs to the write method. Changes the AssetsDBWriter to be a single class which accepts an engine at construction time and has a `write` method for writing dataframes for the various tables. We no longer support writing the various other data types, callers should coerce their data into a dataframe themselves. See zipline.assets.synthetic for some helpers to do this. Adds many new fixtures and updates some existing fixtures to use the new ones: WithDefaultDateBounds A fixture that provides the suite a START_DATE and END_DATE. This is meant to make it easy for other fixtures to synchronize their date ranges without depending on each other in strange ways. For example, WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should both have data for the same dates, so they may depend on WithDefaultDates without forcing a dependency between them. WithTmpDir, WithInstanceTmpDir Provides the suite or individual test case a temporary directory. WithBcolzDailyBarReader Provides the suite a BcolzDailyBarReader which reads from bcolz data written to a temporary directory. The data will be read from dataframes and then converted to bcolz files with BcolzDailyBarWriter.write WithBcolzDailyBarReaderFromCSVs Provides the suite a BcolzDailyBarReader which reads from bcolz data written to a temporary directory. The data will be read from a collection of CSV files and then converted into the bcolz data through BcolzDailyBarWriter.write_csvs WithBcolzMinuteBarReader Provides the suite a BcolzMinuteBarReader which reads from bcolz data written to a temporary directory. The data will be read from dataframes and then converted to bcolz files with BcolzMinuteBarWriter.write WithAdjustmentReader Provides the suite a SQLiteAdjustmentReader which reads from an in memory sqlite database. The data will be read from dataframes and then converted into sqlite with SQLiteAdjustmentWriter.write WithDataPortal Provides each test case a DataPortal object with data from temporary resources.	2016-04-15 23:46:10 -04:00
Eddie Hebert	d27f85e16b	BUG: Enforce sorted order on minutes to delete. The intervals are returned as a set, so order is not guaranteed, which becomes exposed when reading windows which span multiple years. The deletion of values from the regular sized minute array assumes that intervals can be reversed to delete the array from the back.	2016-04-12 14:16:10 -04:00
Eddie Hebert	0a3a2f3653	BUG: Ensure matched input length to minute writer. When the dts and length of cols are mismatched the writer behaves in unintended ways. e.g. in a case where a consumer passed dts which had minutes with no trades removed, but regular (market minute for day) sized arrays for the data with `0`'s on minutes without trades, the non trade minutes from cols are written to slots in the output where a trade is intended. Protect against this misuse by checking that all lengths are equal when using the `write_cols` method. Make a separate `_write_cols` method for use by both `write_cols` and `write`, since the `write` method which takes a DataFrame has the matched input length enforced by the DataFrame.	2016-04-07 13:53:59 -04:00
Eddie Hebert	16fd6681a6	ENH: Rewrite of Zipline to use lazy access pattern More documentation to follow in release notes. Based on lazy-mainline branch, see for more details. Also-By: Jean Bredeche <jean@quantopian.com> Also-By: Andrew Liang <aliang@quantopian.com> Also-By: Abhijeet Kalyan <akalyan@quantopian.com>	2016-04-04 16:12:58 -04:00
Eddie Hebert	be08a77d76	BUG: Prevent writing int max instead of nan. np.array.astype cannot be relied upon to convert nans to 0. Fix by calling nan_to_num on the float arrays before converting to uint32.	2016-03-30 14:35:06 -04:00
Eddie Hebert	75213ac176	MAINT: Write open and closes for minute bar format Write arrays representing corresponding market opens and market closes, which will eventually replace the `minute_index` field. The market closes are being added for incoming work on another branch which will use the market closes to generate a list of non-market minutes to filter out when returning data from `unadjusted_window`.	2016-03-24 23:18:42 -04:00
Eddie Hebert	0f14972e08	ENH: Unadjusted window data for minute bars. Add a method to minute bar reader which returns the OHLCV for all requested fields for a list of assets over the specified start and end minutes. Initial usage is intended for use by a loader which consumes minute bar data to resample into daily bars, but may also be used when aggregating minute data during '1d' history calls in Q2.0. This iteration does not include handling of early closes.	2016-03-14 21:52:01 -04:00
Joe Jevnik	721dd36116	TST: move test_utils and adds test fixture classes Renames zipline.utils.test_utils to zipline.testing Adds zipline.testing.fixtures.ZiplineTestCase to manage setup and teardown and adds mixins to define fixtures like an asset finder or trading calendar.	2016-03-10 15:39:52 -05:00
Eddie Hebert	27f94f83fa	ENH: Allow passing of numpy arrays to writer. For faster parsing and writing workflows, do not require a DataFrame.	2016-02-02 14:03:42 -05:00
Eddie Hebert	488721e805	ENH: Add padding method to minute bars writer. Lets consumers write empty days' worth of data without needing to construct a DataFrame of zero data to force a write. The internal loader uses `last_date_in_output_for_sid` to signify that retrieval has been attempted for all dates up to that date, so that when resuming a job the retrieval of data for those dates is not re-attempted. Also used to make the write logic cleaner, by making it only necessary to create an array large enough for the given df.	2016-02-01 14:18:22 -05:00
Eddie Hebert	984e934e83	BUG: Fix OSError when creating sids that share dir Fix a bug where creating a sid bcolz file when the containing directory was already occupied by a sid caused an OSError on attempt of creating the directory because it already existed. e.g. if there were two sids, `1` and `2`. The paths would be `00/00/000001.bcolz` and `00/00/000002.bcolz` which share the same directory `00/00`. Fixed by checking for directory existence before calling `makedirs`. Add test coverage which exercises writing of sids that are siblings in the sid directory structure.	2016-01-25 10:37:50 -05:00
Eddie Hebert	d5c3b5a15c	ENH: Add writer for minute bcolz format. Implement a writer for minute data into a format comprised of multiple ctables, one for each individual asset, with a common 'index' shared by all ctables where a given dt maps to the same array index for all equities and fields. This format is pulled from the lazy-mainline/Q2.0 branch, with some changes to the interface. Add basic retrieval of values at a given dt to reader. Not yet used by Zipline simulations, but added to support unit tests. Also, rename stubbed out us_equity_minutes to minute_bars, since the writer can be agnostic to asset type.	2016-01-21 10:54:27 -05:00
Eddie Hebert	5f81acea05	ENH: Return -1 for missing spot prices. Return -1 when there is a zero value for a spot price. Intended for use by the incoming data portal changes. When the data portal will see a -1 value, the portal will seek back a trading day until a non-negative value is returned.	2015-11-25 11:32:36 -05:00
Eddie Hebert	53dae6320c	BUG: Fix volume value returned by daily spot price Volumes were incorrectly having the thousands factor applied, however the volume is written as is (without the factor, since volume is an int, not a float value). Fix by adding a special case for volume which returns the value as is.	2015-11-25 10:19:52 -05:00
Eddie Hebert	5338c8e611	ENH: Add spot_price to BcolzDailyBarReader. Add new method to BcolzDailyBarReader, `spot_price` which returns the unadjusted price for the specified day and sid.	2015-10-10 07:19:03 -04:00
Eddie Hebert	e33f6dcdcd	MAINT: Move equity data formats out of loader. Put the logic for reading and writing the equity price and adjustment data into a module located in data, making it distinct from the pipeline loader usage of the formats. This prepares for both incoming changes of how adjustments are written, (which includes using the bcolz daily reader as an input), as well as eventually providing the readers to a DataPortal object.	2015-10-09 17:20:19 -04:00
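
Commit d819721d96 above describes naming bundle ingest directories with an isoformat timestamp in which ':' is replaced by ';', because Windows does not allow ':' in file or directory names. A minimal sketch of that idea follows. The function names `ingest_dirname` and `parse_ingest_dirname` are hypothetical, chosen for illustration, and are not taken from the repository.

```python
from datetime import datetime


def ingest_dirname(ts):
    """Hypothetical helper: build a directory name from a timestamp.

    Uses the ISO 8601 string with ':' swapped for ';' so the name
    is also valid on Windows.
    """
    return ts.isoformat().replace(":", ";")


def parse_ingest_dirname(name):
    """Hypothetical inverse: recover the timestamp from a directory name."""
    return datetime.fromisoformat(name.replace(";", ":"))


stamp = datetime(2016, 5, 5, 18, 22, 13)
name = ingest_dirname(stamp)
print(name)
```

Because the substitution is reversible, the directory listing stays human readable and can still be sorted or parsed back into timestamps.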

49 Commits
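
Commit 984e934e83 above fixes an OSError raised when two sids share a containing directory (for example `00/00/000001.bcolz` and `00/00/000002.bcolz`) by checking that the directory exists before calling `makedirs`. The sketch below illustrates that check. The helper names `sid_path` and `ensure_sid_dir` are hypothetical, and the six-digit zero-padded layout is an assumption inferred from the two example paths in the commit message.

```python
import os
import tempfile


def sid_path(rootdir, sid):
    """Hypothetical helper: map a sid to its bcolz path.

    Assumes a six-digit zero-padded sid split into two directory levels,
    matching the example paths in the commit message.
    """
    padded = format(sid, "06")
    return os.path.join(rootdir, padded[0:2], padded[2:4], padded + ".bcolz")


def ensure_sid_dir(rootdir, sid):
    """Create the containing directory only if it is not already there."""
    path = sid_path(rootdir, sid)
    parent = os.path.dirname(path)
    if not os.path.exists(parent):
        os.makedirs(parent)
    return path


root = tempfile.mkdtemp()
path_one = ensure_sid_dir(root, 1)
# The second call shares the `00/00` directory and must not raise.
path_two = ensure_sid_dir(root, 2)
```

On Python 3 the same effect is available through `os.makedirs(parent, exist_ok=True)`; the explicit existence check mirrors the wording of the commit message.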