catalyst

mirror of https://github.com/wassname/catalyst.git synced 2026-07-02 11:01:10 +08:00

Author	SHA1	Message	Date
Scott Sanderson	c40bbfae03	TEST: More tests for string predicates.	2016-05-04 15:54:50 -04:00
Scott Sanderson	bb6f908036	TEST: Add test for categorical postprocessing.	2016-05-04 15:54:50 -04:00
Scott Sanderson	5f190395ad	ENH: Add support for strings in Pipeline. - Adds a new class, ``LabelArray``, which is a subclass of np.ndarray. LabelArray is conceptually similar to pandas.Categorical, in that it stores data with many duplicate values as indices into an array of unique values. For string data with many duplicates (e.g. time-series of tickers or industry classifications), this provides multiple orders of magnitude of improvement when doing string operations, especially string comparison/matching operations. - Adds a new generic object "specialization" for `AdjustedArrayWindow`, and a corresponding ObjectOverwrite adjustment. - Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``. This method is called on the final result of any pipeline expression after screen filtering has occurred. The default implementation of ``postprocess`` is identity, but Classifier overrides it to coerce string columns into pandas.Categoricals before presenting them to the user.	2016-05-04 15:50:52 -04:00
Eddie Hebert	8756bf2c91	Merge pull request #1177 from quantopian/limit-minute-carrays PERF: Cap memory usage by minute bar carrays.	2016-05-04 12:47:18 -04:00
Eddie Hebert	1248dcde36	PERF: Cap memory usage by minute bar carrays. Instead of letting the cache of carrays grow unbounded, use an LRUCache to cap the number of equities for any given column. Tested with the size 1000, on an algo that was using pipeline which was using over 3000, runtimes were similar, but the memory usage was successfully capped to around 1.2GB. Also tested with an algorithm which bought and held just one equity, and no major slowdown was seen when using the LRUCache vs. a dictionary. We may want to follow this up with an extension to `carray` which is not as memory hungry per column; e.g. by not loading repeated/similar metadata or releasing the last read chunk after a certain amount of time.	2016-05-04 12:08:50 -04:00
Joe Jevnik	f3e436a1bf	Merge pull request #1173 from quantopian/quandl-wiki-loader Quandl wiki loader	2016-05-03 19:11:18 -04:00
Joe Jevnik	59c8e371a2	ENH: Updates the cli, data bundles and extensions. Adds the data bundle concept which makes it easy for users to register loading functions to build out minute and daily data along with an assets db and adjustments db. By default we have provided a `quandl` bundle which pulls from the public domain WIKI dataset. Users may register new bundles by decorating an ingest function with `zipline.data.bundles.register(<name>)`. This also provides a `yahoo_equities` function for creating an ingestion function that will load a static set of assets from yahoo. The cli is now structured as a couple of subcommands and has been changed to `python -m zipline`. The old behavior of `run_algo.py` has been moved to the `run` subcommand. This is almost entirely the same except that it now takes the name of the data bundle to use, defaulting to `quandl`. The next subcommand is `ingest` which takes the name of a data bundle to ingest. This will run the loading machinery and write the data to a specified location that `run` can find. There is also a `clean` subcommand which deletes the data that was written with `ingest`. Extensions have also been added to zipline. This is an experimental feature where users can provide an extra set of python files to run at the start of the process. These can be used to configure aspects of zipline. Right now the only thing that is supported in an extension file is the registration of a new data bundle.	2016-05-03 18:38:24 -04:00
Andrew Liang	21bc598d85	Merge pull request #1169 from quantopian/beyond_max_day FIX: Error message for BenchmarkAssetNotAvailableTooLate is wrong	2016-05-02 12:21:39 -04:00
Andrew Liang	fb6bda5840	FIX: Error message for BenchmarkAssetNotAvailableTooLate is wrong Should be '...does not exist on self.trading_days[-1]...' not self.trading_days[0]	2016-05-02 12:00:35 -04:00
Joe Jevnik	efac476976	ENH: make BcolzMinuteBarWriter.write take iterable Updates the BcolzMinuteBarWriter.write api to allow users to pass their data as a stream instead of requiring that they loop over their data externally. This matches the API presented by BcolzDailyBarWriter.	2016-04-29 16:14:48 -04:00
Andrew Liang	e73ce0bf2b	Merge pull request #1168 from quantopian/fix_crashing_benchmark FIX: Crashing on calculating benchmarking when no trading days	2016-04-29 14:59:49 -04:00
Andrew Liang	bd07e824be	FIX: Refactor to pass benchmark_asset to appropriate methods	2016-04-29 14:30:46 -04:00
Andrew Liang	7332586abe	FIX: Crashing on calculating benchmarking when no trading days When we run a simulation that starts and ends on the same weekend, return an empty series for the benchmark so as to not crash	2016-04-29 14:30:46 -04:00
David Michalowicz	5d5b072112	Merge pull request #1172 from quantopian/identification-please BUG: Don't crash on dataframes with assets in index.	2016-04-28 15:56:19 -04:00
dmichalowicz	8d1ecb508a	Use string_types	2016-04-28 15:28:19 -04:00
Scott Sanderson	85ae664d8c	BUG: Don't crash on dataframes with assets in index.	2016-04-28 15:19:57 -04:00
Maya Tydykov	73c3bd6955	Merge pull request #1106 from quantopian/13d_in_pipeline 13d in pipeline	2016-04-28 13:18:47 -04:00
Maya Tydykov	11d666daaa	TST: add test for 13d filings dataset MAINT: add 13d filings to factors init MAINT: rename constant MAINT: add event_date_col field	2016-04-28 11:59:49 -04:00
Maya Tydykov	e726cc94c9	ENH: add 13d filings dataset to pipeline	2016-04-28 11:53:45 -04:00
Andrew Liang	231c3a58b1	Merge pull request #1166 from quantopian/empty_positions BUG: Don't save empty positions when user access non-existent position	2016-04-26 16:58:11 -04:00
Andrew Liang	4ffe04e4a5	FIX: Add last_sale_date to Position init for consistency	2016-04-26 16:13:07 -04:00
Andrew Liang	d69b960c49	BUG: Don't save empty positions when user access non-existent position Previously, whenever we tried to access a missing value on the Positions dict, we returned a default Position and saved it to the dict. Instead, just return the Position.	2016-04-26 13:28:35 -04:00
Jean Bredeche	50f4917341	Merge pull request #1164 from quantopian/get_open_orders_error DEV: Better error message for sid= in get_open_orders	2016-04-26 13:01:50 -04:00
Andrew Liang	5809ae17f1	DEV: Better error message for sid= in get_open_orders Let the user know to use asset= instead	2016-04-26 12:23:57 -04:00
Jean Bredeche	789dba8eca	Merge pull request #1165 from quantopian/dont-order-in-bts BUG: don't allow ordering in before_trading_start	2016-04-26 10:57:05 -04:00
Jean Bredeche	c404c60d68	BUG: don't allow ordering in before_trading_start	2016-04-26 10:56:36 -04:00
Maya Tydykov	b7765fe0d3	Merge pull request #1153 from quantopian/filter-nulls-in-expected-cols Filter nulls in expected cols	2016-04-25 16:32:45 -04:00
Jean Bredeche	ba20235d83	Merge pull request #1162 from quantopian/handle-missing-fields DEV: Don't log an error if we can't find a matching asset/field/day triple in Fetcher	2016-04-25 16:22:17 -04:00
Maya Tydykov	0191d9d903	MAINT: move filtering for null date rows back to dataframe TST: test both next and prev event frame loading and use EventsLoader. BUG: remove extra arg MAINT: call list on zip for compatibility with python 3	2016-04-25 16:11:12 -04:00
Maya Tydykov	390295481c	TST: add test for blaze loader with null data in date col MAINT: fix blaze query	2016-04-25 11:42:10 -04:00
Maya Tydykov	8585fd5b59	MAINT: move filtering for nulls in date column to blaze loader	2016-04-25 11:42:10 -04:00
Maya Tydykov	e41c99d077	MAINT: add an event date col field to each loader MAINT: add event date col field and filter rows where this field is null TST: modify tests to filter nulls in event date col MAINT: calculate value repeats by vectorized computation on separate start and end dates. MAINT: pass DatetimeIndex instead of list of strings	2016-04-25 11:42:08 -04:00
Maya Tydykov	f8aa7c2ef4	TST: add test for case when null in expected column	2016-04-25 11:42:06 -04:00
Jean Bredeche	19128fa5a3	Merge pull request #1163 from quantopian/flake8-first BLD: run flake8 first, before tests	2016-04-25 10:50:22 -04:00
Jean Bredeche	d9d0c2f9fc	run flake8 first, before tests	2016-04-25 09:56:44 -04:00
Jean Bredeche	02ded435f6	DEV: Don't log an error if we can't find a matching asset/field/day triple in fetcher data	2016-04-25 09:47:18 -04:00
Maya Tydykov	89412616a6	MAINT: filter rows with nulls in expected columns	2016-04-22 10:38:20 -04:00
Eddie Hebert	a13e336ef5	Merge pull request #1157 from quantopian/use-carray-instead-of-read-all-on-small-size PERF: Improve read time for smaller num of assets.	2016-04-21 22:25:01 -04:00
Richard Frank	baa9850337	Merge pull request #1160 from quantopian/the-garbage-man-cant TST: What if we don't gc...	2016-04-21 20:46:21 -04:00
Eddie Hebert	66d05aa563	PERF: Improve read time for smaller num of assets. The BcolzDailyBarReader was optimized for the pipeline case of reading all assets at once. Now that the reader is also used to support daily history, the case of reading data for a small number of assets is more common, particularly in algorithms that use the history API and have a high rotation of assets (e.g. an algorithm which uses pipeline to set the active universe). Remove the bottleneck in reading a small number of assets by conditionally reading the slice for each asset from the carray, instead of reading the data for all equities and then indexing into that full array. Above a certain number of assets, it is still better to read all the data at once. On the Quantopian dataset, which holds data for about 20000 equities over the last 10 years (where not all equities trade over the full range), stored in 118 blosc blp files per column, the tipping point where the 'read all' mode wins out is between 3000 and 4000 assets. That number was tested by trying to exercise a worst case scenario where the equities were spread out evenly across the blp files, by stepping along a sorted list of assets that were alive over a query range which spanned 70 trading days. ``` size = 3000 sids = [assets[i] for i in range(0, len(assets), len(assets) // size)][:size] ``` Also, add a parameter to the WithBcolzDailyBarReader fixture which allows the test to specify what the threshold count for reading all data should be, so that test_us_equity_pricing can be forced into either mode to make sure that both branches in logic are covered by all test cases. On a local dev machine this patch improves the read time of `load_raw_array` for one asset from 100 ms to 96.5 µs (roughly a 10^3 improvement). Reading only one asset per call is an observed common case when populating the non-cached values in USEquityHistoryLoader.	2016-04-21 20:43:52 -04:00
Richard Frank	8c92f2d241	TST: What if we don't gc... Looks like we removed ref cycles elsewhere, so windows builds are passing without this.	2016-04-21 18:41:57 -04:00
Richard Frank	ef9b986b38	Merge pull request #1159 from quantopian/appveyor-conda-reunion Appveyor conda reunion	2016-04-21 17:24:40 -04:00
Richard Frank	b510703169	BLD: Using GCE to prevent the "input line is too long" error when activating a conda env.	2016-04-21 17:00:30 -04:00
Richard Frank	bffe00f931	BLD: Print the download failure reason	2016-04-21 17:00:30 -04:00
Richard Frank	baf10e3148	BLD: Don't accept the download if we tried too many times It's likely we have an incomplete file.	2016-04-21 17:00:30 -04:00
Richard Frank	9a59cd064b	BLD: Download miniconda over SSL	2016-04-21 17:00:30 -04:00
Maya Tydykov	e5ccd814e8	Merge pull request #1143 from quantopian/add-final-val-col-to-estimates ENH: add actual value column to estimates dataset.	2016-04-21 16:23:55 -04:00
Jean Bredeche	cb5ed8d1a8	Merge pull request #1158 from quantopian/yes-we-want-the-broker-order-id BUG: Restoring 'broker_order_id' to Order's dict	2016-04-21 15:20:47 -04:00
Jean Bredeche	2a981dc725	BUG: Restoring 'broker_order_id' to Order's dict More long-term fix is coming later, this restores existing downstream behavior.	2016-04-21 15:18:42 -04:00
Jean Bredeche	f06968f494	Merge pull request #1156 from quantopian/fetcher_bts BUG: Fetcher wasn't working properly in `before_trading_start`.	2016-04-21 15:09:52 -04:00
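
The memory cap described in commit 1248dcde36 replaces an unbounded dictionary of per-equity carrays with an LRU cache. The following is a minimal sketch of that idea only, not the zipline implementation: the `LRUCache` class shown here and the `open_carray` loader are hypothetical names written for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded mapping that evicts the least recently used key."""

    def __init__(self, maxsize):
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, loader):
        """Return the cached value for key, calling loader(key) on a miss."""
        if key in self._data:
            # A hit marks the key as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            # Drop the entry that has gone unused the longest.
            self._data.popitem(last=False)
        return value

    def __len__(self):
        return len(self._data)

    def __contains__(self, key):
        return key in self._data


def open_carray(sid):
    # Hypothetical placeholder for opening one equity's on-disk column.
    return "carray-for-sid-%d" % sid


cache = LRUCache(maxsize=2)
cache.get(1, open_carray)
cache.get(2, open_carray)
cache.get(1, open_carray)  # sid 1 becomes the most recently used entry
cache.get(3, open_carray)  # the cache is full, so sid 2 is evicted
```

With a cap in place, an algorithm that touches many equities pays a reload cost for evicted entries instead of holding every column in memory at once, which is the trade-off the commit message reports measuring.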


3765 Commits
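
Commit 66d05aa563 describes choosing between two read strategies based on how many assets are requested: one slice read per asset for small requests, or a single bulk read followed by in-memory indexing for large ones. The sketch below illustrates that branching under stated assumptions. `CountingColumn`, `load_slices`, and `read_all_threshold` are hypothetical names, and a Python list stands in for the on-disk carray.

```python
class CountingColumn:
    """Hypothetical stand-in for an on-disk column; records every read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = []

    def __getitem__(self, key):
        self.reads.append(key)
        return self._values[key]


def load_slices(column, first_rows, last_rows, sids, read_all_threshold):
    """Return {sid: values}, reading per asset or all at once.

    first_rows and last_rows map each sid to its inclusive row range.
    """
    if len(sids) >= read_all_threshold:
        # Many assets: one bulk read, then index into the in-memory copy.
        source = column[:]
    else:
        # Few assets: read only the slice belonging to each asset.
        source = column
    return {
        sid: source[first_rows[sid]:last_rows[sid] + 1]
        for sid in sids
    }


first_rows = {1: 0, 2: 4, 3: 7}
last_rows = {1: 3, 2: 6, 3: 9}

few_column = CountingColumn(range(10))
few = load_slices(few_column, first_rows, last_rows, [1, 2], 3)

many_column = CountingColumn(range(10))
many = load_slices(many_column, first_rows, last_rows, [1, 2, 3], 3)
```

In this toy setup the two-asset request issues two slice reads while the three-asset request issues a single bulk read. The commit message places the real tipping point between 3000 and 4000 assets on the dataset it was measured against.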