catalyst

mirror of https://github.com/wassname/catalyst.git synced 2026-06-29 18:07:38 +08:00

Author	SHA1	Message	Date
Scott Sanderson	a29da32252	TEST: Don't assert particular numpy error. They change from version to version.	2016-05-04 19:40:50 -04:00
Scott Sanderson	b78501e54a	BUG: Fix broken isnull() on string classifiers. Adds a special case in NullFilter to handle LabelArrays correctly.	2016-05-04 17:26:27 -04:00
Scott Sanderson	5a1ed7b1d3	ENH: Make element_of work for ints too.	2016-05-04 16:31:58 -04:00
Scott Sanderson	0922714bac	DOC: Clarify test docstrings.	2016-05-04 15:54:51 -04:00
Scott Sanderson	4d42cddae4	ENH: Fail fast on outputs in CustomClassifier. We don't support multiple outputs for CustomClassifier because we use LabelArrays for string classifiers.	2016-05-04 15:54:50 -04:00
Scott Sanderson	620d7648b0	BUG: Tests/bugfixes for LabelArray slicing. - Fixes a bug where __setitem__ was not called when setting with a slice on Python 2 (__setslice__ was called instead), which caused strange behavior when setting an empty string. This is fixed by overriding __setslice__ and forwarding to __setitem__. - Fixes a bug where __getitem__ returned an instance of np.void when returning a scalar. We now correctly return an entry from our categoricals.	2016-05-04 15:54:50 -04:00
Scott Sanderson	8de45540f2	ENH: NaN semantics for LabelArray missing values.	2016-05-04 15:54:50 -04:00
Scott Sanderson	2395cbb671	ENH: Use np.void for labelarray storage. This disables most broken ufuncs	2016-05-04 15:54:50 -04:00
Scott Sanderson	7a65121e6e	BUG: contains was renamed to has_substring	2016-05-04 15:54:50 -04:00
Scott Sanderson	c40bbfae03	TEST: More tests for string predicates.	2016-05-04 15:54:50 -04:00
Scott Sanderson	bb6f908036	TEST: Add test for categorical postprocessing.	2016-05-04 15:54:50 -04:00
Scott Sanderson	5f190395ad	ENH: Add support for strings in Pipeline. - Adds a new class, ``LabelArray``, which is a subclass of np.ndarray. LabelArray is conceptually similar to pandas.Categorical, in that it stores data with many duplicate values as indices into an array of unique values. For string data with many duplicates (e.g. time-series of tickers or industry classifications), this provides multiple orders of magnitude of improvement when doing string operations, especially string comparison/matching operations. - Adds a new generic object "specialization" for `AdjustedArrayWindow`, and a corresponding ObjectOverwrite adjustment. - Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``. This method is called on the final result of any pipeline expression after screen filtering has occurred. The default implementation of ``postprocess`` is identity, but Classifier overrides it to coerce string columns into pandas.Categoricals before presenting them to the user.	2016-05-04 15:50:52 -04:00
Joe Jevnik	f3e436a1bf	Merge pull request #1173 from quantopian/quandl-wiki-loader Quandl wiki loader	2016-05-03 19:11:18 -04:00
Joe Jevnik	59c8e371a2	ENH: Updates the cli, data bundles and extensions. Adds the data bundle concept which makes it easy for users to register loading functions to build out minute and daily data along with an assets db and adjustments db. By default we have provided a `quandl` bundle which pulls from the public domain WIKI dataset. Users may register new bundles by decorating an ingest function with `zipline.data.bundles.register(<name>)`. This also provides a `yahoo_equities` function for creating an ingestion function that will load a static set of assets from yahoo. The cli is now structured as a couple of subcommands and has been changed to `python -m zipline`. The old behavior of `run_algo.py` has been moved to the `run` subcommand. This is almost entirely the same except that it now takes the name of the data bundle to use, defaulting to `quandl`. The next subcommand is `ingest` which takes the name of a data bundle to ingest. This will run the loading machinery and write the data to a specified location that `run` can find. There is also a `clean` subcommand which deletes the data that was written with `ingest`. Extensions have also been added to zipline. This is an experimental feature where users can provide an extra set of python files to run at the start of the process. These can be used to configure aspects of zipline. Right now the only thing that is supported in an extension file is the registration of a new data bundle.	2016-05-03 18:38:24 -04:00
Andrew Liang	fb6bda5840	FIX: Error message for BenchmarkAssetNotAvailableTooLate is wrong Should be '...does not exist on self.trading_days[-1]...' not self.trading_days[0]	2016-05-02 12:00:35 -04:00
Joe Jevnik	efac476976	ENH: make BcolzMinuteBarWriter.write take iterable Updates the BcolzMinuteBarWriter.write api to allow users to pass their data as a stream instead of requiring that they loop over their data externally. This matches the API presented by BcolzDailyBarWriter.	2016-04-29 16:14:48 -04:00
Andrew Liang	e73ce0bf2b	Merge pull request #1168 from quantopian/fix_crashing_benchmark FIX: Crashing on calculating benchmarking when no trading days	2016-04-29 14:59:49 -04:00
Andrew Liang	7332586abe	FIX: Crashing on calculating benchmarking when no trading days When we run a simulation that starts and ends on the same weekend, return an empty series for the benchmark so as to not crash	2016-04-29 14:30:46 -04:00
Maya Tydykov	11d666daaa	TST: add test for 13d filings dataset MAINT: add 13d filings to factors init MAINT: rename constant MAINT: add event_date_col field	2016-04-28 11:59:49 -04:00
Maya Tydykov	e726cc94c9	ENH: add 13d filings dataset to pipeline	2016-04-28 11:53:45 -04:00
Andrew Liang	d69b960c49	BUG: Don't save empty positions when user accesses a non-existent position. Previously, whenever we tried to access a missing value on the Positions dict, we returned a default Position and saved it to the dict. Instead, just return the Position.	2016-04-26 13:28:35 -04:00
Andrew Liang	5809ae17f1	DEV: Better error message for sid= in get_open_orders. Let the user know to use asset= instead.	2016-04-26 12:23:57 -04:00
Jean Bredeche	c404c60d68	BUG: don't allow ordering in before_trading_start	2016-04-26 10:56:36 -04:00
Maya Tydykov	b7765fe0d3	Merge pull request #1153 from quantopian/filter-nulls-in-expected-cols Filter nulls in expected cols	2016-04-25 16:32:45 -04:00
Maya Tydykov	0191d9d903	MAINT: move filtering for null date rows back to dataframe TST: test both next and prev event frame loading and use EventsLoader. BUG: remove extra arg MAINT: call list on zip for compatibility with python 3	2016-04-25 16:11:12 -04:00
Maya Tydykov	390295481c	TST: add test for blaze loader with null data in date col MAINT: fix blaze query	2016-04-25 11:42:10 -04:00
Maya Tydykov	e41c99d077	MAINT: add an event date col field to each loader MAINT: add event date col field and filter rows where this field is null TST: modify tests to filter nulls in event date col MAINT: calculate value repeats by vectorized computation on separate start and end dates. MAINT: pass DatetimeIndex instead of list of strings	2016-04-25 11:42:08 -04:00
Maya Tydykov	f8aa7c2ef4	TST: add test for case when null in expected column	2016-04-25 11:42:06 -04:00
Jean Bredeche	02ded435f6	DEV: Don't log an error if we can't find a matching asset/field/day triple in fetcher data	2016-04-25 09:47:18 -04:00
Eddie Hebert	66d05aa563	PERF: Improve read time for smaller num of assets. The BcolzDailyBarReader was optimized for the pipeline case of reading all assets at once. Now that the reader is also used to support daily history, the case of reading data for a small number of assets is more common, particularly in algorithms that use the history API and have a high rotation of assets (e.g. an algorithm which uses pipeline to set the active universe). Remove the bottleneck in reading a small number of assets by conditionally reading the slice for each asset from the carray, instead of reading the data for all equities and then indexing into that full array. Above a certain number of assets, it is still better to read all the data at once. On the Quantopian dataset, which holds data for about 20000 equities over the last 10 years (where not all equities trade over the full range), stored in 118 blosc blp files per column, the tipping point where the 'read all' mode wins out is between 3000-4000 assets. That number was tested by trying to exercise a worst case scenario where the equities were spread out evenly across the blp files, by stepping along a sorted list of assets that were alive over a query range which spanned 70 trading days. ``` size = 3000 sids = [assets[i] for i in range(0, len(assets), len(assets) / size)][:size] ``` Also, add parameter to WithBcolzDailyBarReader fixture which allows the test to specify what the threshold count for reading all data should be, so that the test_us_equity_pricing can be forced into either mode to make sure that both branches in logic are covered by all test cases. On local dev machine this patch improves the read time of `load_raw_array` for one asset from 100 ms to 96.5 µs (roughly a 10^3 improvement), with reading only one asset per call being an observed common case when populating the non-cached values in USEquityHistoryLoader.	2016-04-21 20:43:52 -04:00
Maya Tydykov	e5ccd814e8	Merge pull request #1143 from quantopian/add-final-val-col-to-estimates ENH: add actual value column to estimates dataset.	2016-04-21 16:23:55 -04:00
Jean Bredeche	9d1e15ddde	BUG: Fetcher wasn't working properly in `before_trading_start`. We were trying to use the previous day in before_trading_start because we were looking for the previous market minute, then normalizing it. That's no longer the case, as we want to use today's date for fetcher lookups in before_trading_start. Also refactored a bit how dataportal determines if a query should be routed to the fetcher data structures.	2016-04-21 15:09:14 -04:00
Jean Bredeche	6423a2cfbd	Merge branch 'master' into check-keyword-args	2016-04-21 12:31:45 -04:00
Maya Tydykov	bd58140b97	ENH: add actual value column to estimates dataset.	2016-04-21 11:45:00 -04:00
Jean Bredeche	c323506f40	BUG: we were improperly checking iterable kwargs in BarData	2016-04-21 11:06:46 -04:00
dmichalowicz	d9bfcaabde	ENH: Support multiple outputs for custom factors	2016-04-21 10:57:29 -04:00
Maya Tydykov	1531568899	ENH: add custom dataset for estimize MAINT: alphabetize constants MAINT: remove obsolete column TST: refactor tests to use common code MAINT: remove unneeded fields from dataset MAINT: remove obsolete earnings estimates columns and refactor	2016-04-19 11:29:03 -04:00
Andrew Liang	8aac0ab19f	BUG: Week rule plus time rule doesn't work. The next trigger for the week rule gets recalculated every time the rule is triggered.	2016-04-18 17:05:43 -04:00
Joe Jevnik	bc0b117dc9	MAINT: make the data loading apis more consistent. Changes BcolzDailyBarWriter to not be an abc, data is passed as an iterator of (sid, dataframe) pairs to the write method. Changes the AssetsDBWriter to be a single class which accepts an engine at construction time and has a `write` method for writing dataframes for the various tables. We no longer support writing the various other data types, callers should coerce their data into a dataframe themselves. See zipline.assets.synthetic for some helpers to do this. Adds many new fixtures and updates some existing fixtures to use the new ones: WithDefaultDateBounds A fixture that provides the suite a START_DATE and END_DATE. This is meant to make it easy for other fixtures to synchronize their date ranges without depending on each other in strange ways. For example, WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should both have data for the same dates, so they may depend on WithDefaultDates without forcing a dependency between them. WithTmpDir, WithInstanceTmpDir Provides the suite or individual test case a temporary directory. WithBcolzDailyBarReader Provides the suite a BcolzDailyBarReader which reads from bcolz data written to a temporary directory. The data will be read from dataframes and then converted to bcolz files with BcolzDailyBarWriter.write WithBcolzDailyBarReaderFromCSVs Provides the suite a BcolzDailyBarReader which reads from bcolz data written to a temporary directory. The data will be read from a collection of CSV files and then converted into the bcolz data through BcolzDailyBarWriter.write_csvs WithBcolzMinuteBarReader Provides the suite a BcolzMinuteBarReader which reads from bcolz data written to a temporary directory. The data will be read from dataframes and then converted to bcolz files with BcolzMinuteBarWriter.write WithAdjustmentReader Provides the suite a SQLiteAdjustmentReader which reads from an in memory sqlite database. The data will be read from dataframes and then converted into sqlite with SQLiteAdjustmentWriter.write WithDataPortal Provides each test case a DataPortal object with data from temporary resources.	2016-04-15 23:46:10 -04:00
Eddie Hebert	5f9d0a148d	BUG: Prevent out of order history arrays. Fix a bug where if history were called with assets `[1, 2]` and then subsequently, `[2, 1]`, the loader would return the cached array in order for `[1, 2]`. Instead, cache an AdjustedArray for each asset, then when a history window is requested, check if each asset has a sufficient cache, and if not then read values for the assets which are missing or need to be refreshed. An added benefit of this change is that if a subsequent call to history has a smaller number of assets than the previous, no new data needs to be read from disk. e.g. a call with assets `[1, 2, 3]` and then `[1, 2]` would use the cached values for `1` and `2` from the first call. Conversely, if the second call has more assets, then only the data for the new assets needs to be retrieved. e.g. a history with `[1, 2]`, then `[1, 2, 3]` would only need (assuming `1` and `2` have not expired) to retrieve data for `3`. Unfortunately, the benefit here is not great because `load_raw_arrays` is optimized for reading many assets, and pulls the entire daily bar dataset into memory. This change makes tuning `load_raw_arrays` for faster reads when only a few assets are requested (e.g. by slicing from the carray for each asset, instead of pulling all data into a numpy array) more beneficial than it would have been previously.	2016-04-15 22:44:00 -04:00
Andrew Liang	6d6cd58c3b	BUG: Recalculate trigger for week rule if we miss the first one If we start the simulation on a day so that we miss the trigger (the first for the sim) for that week, recalculate the trigger for next week	2016-04-15 15:09:08 -04:00
Andrew Liang	1ee3c5f049	BUG: week_end rule with offset=0 skips every other week	2016-04-15 10:17:18 -04:00
Eddie Hebert	76e14eda2f	ENH: Add expiring cache. Add a cache interface which supports expirable entries with a changeable backend for the cache into which they are entered. The default cache is a `dict` but could be swapped for `cachetools.LRUCache` or any other cache which supports `__get__`, `__set__`, and `__del__`. So that consumers can change the use of `CachedObjects` stored in a cache from: ``` self._cache = {} ... try: obj = self._cache[key] try: return obj.unwrap(dt) except Expired: pass except KeyError: pass ... self._cache[key] = CachedObject(value, new_expiration) ``` to: ``` self._cache = ExpiringCache(LRUCache(maxsize=6)) ... try: return self._cache.get(key, dt) except KeyError: # Get fresh value ... self._cache.set(key, value, new_expiration) ```	2016-04-14 16:10:32 -04:00
Jean Bredeche	63bd7589b7	BUG: support passing an empty list to `data` methods. Our type checking code was a bit too aggressive.	2016-04-14 11:11:08 -04:00
Andrew Liang	8dc3ed73ab	FIX: Check types of args passed to api methods on data	2016-04-13 09:47:07 -04:00
Jean Bredeche	fac5905c10	Merge pull request #1114 from quantopian/handle-data-optional ENH: make handle_data optional	2016-04-13 09:31:41 -04:00
Richard Frank	70befd490b	MAINT: Don't store data portal everywhere Removed lots of data portal references that participated in ref cycles and prevented deterministic cleanup of dbs.	2016-04-12 19:33:22 -04:00
Richard Frank	8b610a2ab7	TST: Cleaned up test references to adjustments db. If we don't clean them up, then Windows can't delete the temp dir with the db.	2016-04-12 19:33:22 -04:00
Richard Frank	5254b273b2	PERF: Reimplemented remember_last with a weak_lru_cache which won't leak instances whose methods have been decorated (specifically DataPortal instances) MAINT: Not using functools32 anymore	2016-04-12 19:33:21 -04:00
Richard Frank	32a400a9fb	BUG: Fixing bitness issues on 32-bit systems by being explicit with sizes	2016-04-12 17:07:50 -04:00
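
The "ENH: Add expiring cache" commit (76e14eda2f) describes a wrapper that folds the `Expired` handling into a `KeyError`. What follows is a minimal sketch of that idea, not the code from the commit: the names `ExpiringCache`, `CachedObject`, `Expired`, `unwrap`, `get` and `set` come from the commit message, while the method bodies, the choice to evict on expiry, and the use of plain integers as timestamps are assumptions made for illustration.

```python
from collections import namedtuple


class Expired(Exception):
    """Raised when a cached value is unwrapped after its expiration."""


class CachedObject(namedtuple("CachedObject", ["value", "expires"])):
    """A value paired with the last timestamp at which it is still valid."""

    def unwrap(self, dt):
        # Assumption: an entry is valid up to and including `expires`.
        if dt > self.expires:
            raise Expired(self.expires)
        return self.value


class ExpiringCache(object):
    """Wraps any dict-like backend and reports expired entries as missing."""

    def __init__(self, cache=None):
        # The commit message says the default backend is a dict.
        self._cache = cache if cache is not None else {}

    def get(self, key, dt):
        # A missing key raises KeyError from the backend lookup itself.
        try:
            return self._cache[key].unwrap(dt)
        except Expired:
            # Assumption: drop the stale entry, then look like a cache miss.
            del self._cache[key]
            raise KeyError(key)

    def set(self, key, value, expiration_dt):
        self._cache[key] = CachedObject(value, expiration_dt)


cache = ExpiringCache()
cache.set("close", 101.5, expiration_dt=10)
fresh = cache.get("close", dt=5)  # still valid at dt=5
try:
    cache.get("close", dt=11)  # past the expiration
except KeyError:
    fresh = None  # the caller would fetch a fresh value here
```

With this shape, callers only have to handle `KeyError`, which is the simplification the commit message shows in its "to:" snippet.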


955 Commits
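
The "BUG: Prevent out of order history arrays" commit (5f9d0a148d) replaces one cached array per request with one cached entry per asset. The sketch below illustrates only that ordering and reuse behaviour under simplifying assumptions: `PerAssetHistoryCache`, `read_from_disk` and the `reads` log are hypothetical names, plain lists stand in for `AdjustedArray`, and expiration is left out entirely.

```python
class PerAssetHistoryCache(object):
    """Caches history per asset, so the request order never leaks into the cache."""

    def __init__(self, read_from_disk):
        # Hypothetical callable: list of assets -> {asset: values}.
        self._read = read_from_disk
        self._by_asset = {}
        self.reads = []  # log of which assets each disk read fetched

    def history(self, assets):
        # Only assets without a cached entry trigger a read.
        missing = [a for a in assets if a not in self._by_asset]
        if missing:
            self.reads.append(missing)
            self._by_asset.update(self._read(missing))
        # Assemble the result in the order of this particular request.
        return [self._by_asset[a] for a in assets]


store = {1: [10, 11], 2: [20, 21], 3: [30, 31]}
loader = PerAssetHistoryCache(lambda assets: {a: store[a] for a in assets})

first = loader.history([1, 2])
swapped = loader.history([2, 1])    # served from cache, in the new order
wider = loader.history([1, 2, 3])   # only asset 3 is read
```

Because each asset is keyed on its own, a request for `[2, 1]` after `[1, 2]` comes back in the requested order, and a wider request reads only the assets that are new.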