catalyst

mirror of https://github.com/wassname/catalyst.git synced 2026-06-30 23:26:38 +08:00

Author	SHA1	Message	Date
Scott Sanderson	6e8a4b8144	ENH: Improvements to rank(). - Add an `ascending=True` keyword to `rank()`. - Add `top(N)` and `bottom(N)` methods to Factor. These return Filters that pass the top N and bottom N elements, respectively, each day. - Add a slightly faster path for rank(method='ordinal'). I had originally thought the fast path was 2-3x faster because I had my benchmark data axes flipped. The actual speedup is only 5-10%, which means it probably wasn't worth the effort to Cythonize, but since we have a slightly faster version now, we might as well use it. - Refactor test_filter and test_factor to make it easier to implement and test transformations on factors. These tests now subclass BaseFFCTestCase, which lets a test pass a dict of terms and an "initial_workspace" whose values SimpleFFCEngine uses directly, so tests no longer need to manage the inputs and outputs of each term manually.	2015-08-31 00:32:33 -04:00
Scott Sanderson	a04dcfa6b8	TEST: Rename test.	2015-08-29 23:55:59 -04:00
Scott Sanderson	90e81d0df0	MAINT: Add TermGraph class. Use a subclass of networkx.DiGraph to encapsulate the state of our dependency graph.	2015-08-29 23:55:59 -04:00
Scott Sanderson	780263da06	ENH: Return asset-indexed DataFrame for data.factors. This makes ordering with the returned assets much easier, and there's no performance degradation for non-broadcasting operations on the Index. Timings (IPython %timeit, best of 3; asset-indexed `df` vs. int-indexed `df_int`). Setup: `from random import sample`; `finder = AssetFinder(create_table=False, assets.db')`; `assets = load_8000_assets(finder)`; `AAPL = finder.retrieve_asset(24)`; `RANDOM_ASSETS = sample(assets, 500)`; `df = DataFrame(index=assets, data=np.random.randn(len(assets), 4), columns=['a', 'b', 'c', 'd'])`; `df_int = DataFrame(index=map(int, assets), data=np.random.randn(len(assets), 4), columns=['a', 'b', 'c', 'd'])`. Results (df vs. df_int): `.loc[24]` 45.3 µs vs. 44.7 µs; `.loc[AAPL]` 45.1 µs vs. 44.8 µs; `.loc[RANDOM_ASSETS]` 1.53 ms vs. 2.18 ms; `.sum()` 56 µs vs. 55.7 µs; `.index == 3` 253 µs vs. 6.76 µs; `.iloc[:50]` 44.3 µs vs. 44 µs.	2015-08-26 18:33:54 -04:00
Scott Sanderson	f7039d6f52	ENH: Make data available in before_trading_start.	2015-08-21 12:37:17 -04:00
Richard Frank	30847a10a7	BUG: The interface of load_adjusted_array is to return a list of arrays, but MultiColumnLoader was returning a list of lists of arrays in some cases.	2015-08-19 10:12:19 -04:00
Scott Sanderson	b89fc0c028	BUG: Fix error from RequiredWindowLengthMixin. WindowLengthNotSpecified expects an argument.	2015-08-04 01:41:03 -04:00
Scott Sanderson	7bb20eb297	MAINT: Check dates before computing factor_matrix. In SimpleFFCEngine.factor_matrix, barf with a useful error if end_date <= start_date.	2015-08-03 12:06:24 -04:00
Scott Sanderson	5da03d2df5	BUG: Make NumExprFilter return an ndarray. - Previously it was returning a DataFrame because of how we applied an & with a DataFrame mask. The error was hidden by the fact that `np.testing.assert_array_equal` coerces its inputs to arrays before comparing. - Added `zp.utils.test_utils.check_arrays`, which checks that both inputs have the same type before calling `np.testing.assert_array_equal`.	2015-08-03 11:59:11 -04:00
Scott Sanderson	ef4f642e62	ENH: Compute engine architecture for FFC API. This patch lays the groundwork for a compute engine designed to facilitate construction of factor-based universe screening and portfolio allocation. It contains: - A new module, `zipline.modelling`, containing entities that can be used to express computations as dependency graphs. Each node in such a graph is an instance of the base `Term` class, defined in `zipline.modelling.term`. Dependency graphs are executed by instances of `FFCEngine`, defined in `zipline.modelling.engine`. - A new module, `zipline.data.ffc`, containing loaders and dataset definitions for inputs to the modelling API. - New `TradingAlgorithm` API methods: `add_factor` and `add_filter`. These methods can only be called from `initialize`, and they tell the algorithm to compute the given terms each day. Computed factor results are made available through a new attribute of the `data` object in `before_trading_start` and `handle_data`. Computed filter results control which assets are available in the factor matrix on each day.	2015-07-29 12:30:46 -04:00

10 Commits
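
The `ascending` keyword and the `top(N)`/`bottom(N)` filters from commit 6e8a4b8144 can be illustrated with a minimal NumPy sketch. It assumes a 2-D float array with one row per day and one column per asset, and no NaNs. The function names `ordinal_rank`, `top`, and `bottom` here are hypothetical stand-ins; this is not the Cython fast path or the Factor API from the commit, only what ordinal ranking and top/bottom-N masks compute.

```python
import numpy as np


def ordinal_rank(data, ascending=True):
    """Rank each row independently, 1-based, ties broken by column order.

    Sketch only: assumes rows are days, columns are assets, float data, no NaNs.
    """
    values = data if ascending else -data
    # A stable sort keeps tied values in column order, as method='ordinal' does.
    order = np.argsort(values, axis=1, kind='mergesort')
    ranks = np.empty_like(order)
    rows = np.arange(data.shape[0])[:, None]
    # Scatter ranks 1..n_columns back to each element's original column.
    ranks[rows, order] = np.arange(1, data.shape[1] + 1)
    return ranks


def top(data, n):
    """Boolean mask that passes the n largest values in each row."""
    return ordinal_rank(data, ascending=False) <= n


def bottom(data, n):
    """Boolean mask that passes the n smallest values in each row."""
    return ordinal_rank(data, ascending=True) <= n


data = np.array([[3.0, 1.0, 2.0],
                 [10.0, 30.0, 20.0]])
ranks = ordinal_rank(data)  # ranks per row: 3, 1, 2 and 1, 3, 2
largest = top(data, 1)      # passes column 0 on day 0, column 1 on day 1
```

Expressing `top` and `bottom` as "rank <= n" on the appropriately ordered ranks keeps both filters as thin wrappers over a single ranking routine, which is also why an `ascending` flag on the ranker is useful.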
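
Commits ef4f642e62, 90e81d0df0, and 6e8a4b8144 describe terms as nodes in a dependency graph (wrapped by `TermGraph`, a `networkx.DiGraph` subclass) that an engine executes, optionally seeded with an "initial_workspace". The sketch below shows that execution idea using the standard-library `graphlib` module (Python 3.9+) in place of networkx. The `execute` function and the dict-based graph layout are assumptions for illustration, not zipline's `FFCEngine` or `TermGraph` API.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execute(graph, compute, initial_workspace=None):
    """Evaluate every term in dependency order (hypothetical sketch).

    graph: dict mapping each term name to a tuple of the names it depends on.
    compute: dict mapping each computed term to a function that receives its
        dependencies' values, in the order listed in `graph`.
    initial_workspace: precomputed values (e.g. loaded inputs); these are used
        as-is and never recomputed.
    """
    workspace = dict(initial_workspace or {})
    # static_order() yields each node only after all of its dependencies.
    for term in TopologicalSorter(graph).static_order():
        if term in workspace:
            continue
        inputs = [workspace[dep] for dep in graph.get(term, ())]
        workspace[term] = compute[term](*inputs)
    return workspace


graph = {
    'dollar_volume': ('close', 'volume'),
    'is_liquid': ('dollar_volume',),
}
compute = {
    'dollar_volume': lambda close, volume: close * volume,
    'is_liquid': lambda dollar_volume: dollar_volume > 1000.0,
}
result = execute(graph, compute, {'close': 10.0, 'volume': 200.0})
```

Seeding the workspace lets a test inject values for any term, input or intermediate, and skip computing it, which mirrors the testing convenience that commit 6e8a4b8144 attributes to `initial_workspace`.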
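
Commit 5da03d2df5 notes that `np.testing.assert_array_equal` coerces its inputs to arrays, so a DataFrame compared against an ndarray can pass unnoticed. A helper in the spirit of `check_arrays` is sketched below; it is an assumed reconstruction from the commit message, not the code in `zp.utils.test_utils`. To stay dependency-free, the demo uses a plain list rather than a DataFrame to trigger the same coercion.

```python
import numpy as np
from numpy.testing import assert_array_equal


def check_arrays(x, y, err_msg=''):
    """Compare like assert_array_equal, but fail first if the types differ.

    assert_array_equal coerces both inputs to arrays, so on its own it would
    accept, e.g., a list or DataFrame where an ndarray was expected.
    """
    if type(x) is not type(y):
        raise AssertionError(
            'type mismatch: {} != {}'.format(type(x).__name__, type(y).__name__)
        )
    assert_array_equal(x, y, err_msg=err_msg)


expected = np.array([True, False, True])
assert_array_equal([True, False, True], expected)      # passes: list is coerced
check_arrays(np.array([True, False, True]), expected)  # passes: same type
```

Raising `AssertionError` explicitly, instead of using an `assert` statement, keeps the type check active even when Python runs with `-O`.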
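
Commit 30847a10a7 states the contract that a loader returns a flat list of arrays. The commit does not show how MultiColumnLoader combines its sub-loaders, so the sketch below is only an assumed illustration of how nesting can creep in: a combiner that `append`s each sub-loader's list produces a list of lists, while `extend` preserves the flat contract. `load_columns` and `multi_column_load` are hypothetical names.

```python
import numpy as np


def load_columns(columns):
    """Hypothetical sub-loader: returns one array per requested column."""
    return [np.full(3, float(len(name))) for name in columns]


def multi_column_load(groups):
    """Combine sub-loader results into one flat list of arrays.

    groups: list of (loader, columns) pairs; each loader returns a list of arrays.
    """
    arrays = []
    for loader, columns in groups:
        # extend, not append: append would yield a list of lists of arrays.
        arrays.extend(loader(columns))
    return arrays


result = multi_column_load([
    (load_columns, ['close', 'volume']),
    (load_columns, ['open']),
])
```

Here `result` holds three arrays, one per requested column, in request order.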
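
Commit ef4f642e62 says computed filter results control which assets appear in the factor matrix on each day. The commit does not describe the engine's data layout, so this sketch assumes a dense days-by-assets factor array and a boolean filter mask of the same shape. It returns, for each day, only the assets that pass. `screen` is a hypothetical helper, not a zipline function.

```python
import numpy as np


def screen(factor, mask, assets):
    """For each day, keep only the assets that pass the filter (sketch).

    factor: 2-D array of factor values, one row per day, one column per asset.
    mask: boolean array of the same shape, True where the filter passes.
    assets: sequence of asset identifiers labeling the columns.
    Returns a list with one {asset: value} dict per day.
    """
    factor = np.asarray(factor)
    mask = np.asarray(mask, dtype=bool)
    if factor.shape != mask.shape:
        raise ValueError('factor and mask must have the same shape')
    return [
        {asset: value for asset, value, keep in zip(assets, row, keep_row) if keep}
        for row, keep_row in zip(factor, mask)
    ]


daily = screen(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[True, False, True], [False, True, True]],
    ['A', 'B', 'C'],
)
```

Because the asset set can differ from day to day, a per-day mapping (or a long-format table) is a natural output shape, as opposed to a fixed-width matrix padded with missing values.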