- Add an `ascending=True` keyword to `rank()`.
- Add `top(N)` and `bottom(N)` methods to Factor. These return Filters
that pass, respectively, the N assets with the largest and the N assets
with the smallest factor values each day.
- Add a slightly faster path for `rank(method='ordinal')`. I had
originally thought the fast path was 2-3x faster, but only because I had
my benchmark data axes flipped. The actual speedup is just 5-10%, so it
probably wasn't worth the effort to Cythonize; still, now that we have
the faster version, we might as well use it.
- Refactor test_filter and test_factor to make it easier to implement
and test transformations on factors. These tests now subclass
`BaseFFCTestCase`, which lets a test pass a dict of terms plus an
"initial_workspace" mapping terms to precomputed values. `SimpleFFCEngine`
uses those values directly, so tests no longer have to manage the inputs
and outputs of each term by hand.
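To make the semantics of `rank(ascending=...)`, ordinal ranking, and `top(N)`/`bottom(N)` concrete, here is a minimal NumPy sketch over a 2D array shaped (days, assets). The names `rank_rows`, `top` and `bottom` are hypothetical plain functions, not the Factor methods. The sketch assumes ties are broken by column order (a stable sort, which is how SciPy's `rankdata(method='ordinal')` behaves) and deliberately ignores NaN handling.

```python
import numpy as np


def rank_rows(data, ascending=True):
    """Ordinal (1-based) rank of each value within its row.

    Hypothetical helper: `data` is a 2D float array shaped (days, assets).
    Ties are broken by column position because the sort is stable.
    NaN handling is omitted from this sketch.
    """
    values = data if ascending else -data
    order = np.argsort(values, axis=1, kind="mergesort")  # stable sort
    ranks = np.empty_like(order)
    rows = np.arange(data.shape[0])[:, np.newaxis]
    # Scatter 1..n_assets into the column positions given by the sort order.
    ranks[rows, order] = np.arange(1, data.shape[1] + 1)
    return ranks


def top(data, n):
    """Boolean mask passing the n largest values in each row (hypothetical)."""
    return rank_rows(data, ascending=False) <= n


def bottom(data, n):
    """Boolean mask passing the n smallest values in each row (hypothetical)."""
    return rank_rows(data, ascending=True) <= n


data = np.array([[3.0, 1.0, 2.0],
                 [10.0, 30.0, 20.0]])
ascending_ranks = rank_rows(data)
top_one = top(data, 1)
bottom_one = bottom(data, 1)
```

Negating the data for descending order keeps the same stable tie-breaking (earlier columns win); that is a choice made for this sketch, not a documented property of the real implementation.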
This makes ordering with the returned assets much easier, and there's no
performance degradation for non-broadcasting operations on the Index.
Timings
-------
    from random import sample

    import numpy as np
    from pandas import DataFrame
    from zipline.assets import AssetFinder

    finder = AssetFinder(create_table=False, db_path='assets.db')
    assets = load_8000_assets(finder)  # helper from the benchmark session, not shown here
    AAPL = finder.retrieve_asset(24)
    RANDOM_ASSETS = sample(assets, 500)

    df = DataFrame(
        index=assets,
        data=np.random.randn(len(assets), 4),
        columns=['a', 'b', 'c', 'd'],
    )
    df_int = DataFrame(
        index=[int(a) for a in assets],
        data=np.random.randn(len(assets), 4),
        columns=['a', 'b', 'c', 'd'],
    )

    %timeit df.loc[24]
    10000 loops, best of 3: 45.3 µs per loop
    %timeit df_int.loc[24]
    10000 loops, best of 3: 44.7 µs per loop

    %timeit df.loc[AAPL]
    10000 loops, best of 3: 45.1 µs per loop
    %timeit df_int.loc[AAPL]
    10000 loops, best of 3: 44.8 µs per loop

    %timeit df.loc[RANDOM_ASSETS]
    1000 loops, best of 3: 1.53 ms per loop
    %timeit df_int.loc[RANDOM_ASSETS]
    100 loops, best of 3: 2.18 ms per loop

    %timeit df.sum()
    10000 loops, best of 3: 56 µs per loop
    %timeit df_int.sum()
    10000 loops, best of 3: 55.7 µs per loop

    %timeit df.index == 3
    1000 loops, best of 3: 253 µs per loop
    %timeit df_int.index == 3
    100000 loops, best of 3: 6.76 µs per loop

    %timeit df.iloc[:50]
    10000 loops, best of 3: 44.3 µs per loop
    %timeit df_int.iloc[:50]
    10000 loops, best of 3: 44 µs per loop
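One plausible reading of these numbers: single-label lookups go through a hash table, so an asset key costs about the same as an int key as long as assets hash and compare equal to their integer sids, while `df.index == 3` must call a Python-level `__eq__` once per element of an object-dtype index. The sketch below illustrates both effects with a hypothetical `Sid` class standing in for an asset. It is an assumption about the mechanism, not a description of zipline's `Asset` class or of pandas internals.

```python
import numpy as np


class Sid(object):
    """Hypothetical stand-in for an asset keyed by an integer sid.

    Assumption: it hashes and compares equal to its integer sid, so int
    keys and Sid keys are interchangeable in hash-based lookups.
    """

    def __init__(self, sid):
        self.sid = sid

    def __int__(self):
        return self.sid

    def __hash__(self):
        return hash(self.sid)

    def __eq__(self, other):
        if isinstance(other, Sid):
            return self.sid == other.sid
        if isinstance(other, int):
            return self.sid == other
        return NotImplemented


assets = [Sid(i) for i in range(8000)]

# Hash-based lookup (the kind used for a single label): one hash plus one
# equality check, whether the key is an int or a Sid.
positions = {asset: i for i, asset in enumerate(assets)}
pos_by_int = positions[24]
pos_by_sid = positions[Sid(24)]

# Broadcasting comparison: an object array calls Sid.__eq__ once per element
# in Python, while an int64 array compares in a single vectorized loop.
object_index = np.array(assets, dtype=object)
int_index = np.array([int(a) for a in assets], dtype=np.int64)
object_mask = object_index == 3
int_mask = int_index == 3
```

Both comparisons produce the same mask; only the per-element cost differs, which is consistent with the gap between the two `index == 3` timings.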
- Previously it was returning a DataFrame because of how we applied `&`
with a DataFrame mask. The bug went unnoticed because
`np.testing.assert_array_equal` coerces its inputs to arrays before
comparing them.
- Added `zp.utils.test_utils.check_arrays`, which checks that the inputs
have the same type before calling `np.testing.assert_array_equal`.
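A minimal sketch of such a helper, assuming it only needs to compare container types before delegating to NumPy; the real `check_arrays` may check more, or check differently.

```python
import numpy as np


def check_arrays(x, y, err_msg="", verbose=True):
    """Sketch: fail on a type mismatch, then compare values.

    np.testing.assert_array_equal converts both inputs to arrays first, so
    it cannot tell an ndarray from a list (or a DataFrame) holding the same
    values. Checking the exact type first closes that gap.
    """
    if type(x) is not type(y):
        raise AssertionError(
            "Type mismatch: %s != %s" % (type(x).__name__, type(y).__name__)
        )
    np.testing.assert_array_equal(x, y, err_msg=err_msg, verbose=verbose)


expected = np.array([True, False, True])
# Passes even though one side is a list: the list is coerced to an array.
np.testing.assert_array_equal(expected.tolist(), expected)
# Passes: same type and same values.
check_arrays(expected, expected.copy())
```

With this helper, `check_arrays(expected.tolist(), expected)` raises `AssertionError` instead of silently passing.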
This patch lays the groundwork for a compute engine designed to make it
easier to build factor-based universe screens and portfolio allocation
logic. It contains:
- A new module, `zipline.modelling`, containing entities that can be used
to express computations as dependency graphs. Each node in such a graph
is an instance of the base `Term` class, defined in
`zipline.modelling.term`. Dependency graphs are executed by instances
of `FFCEngine`, defined in `zipline.modelling.engine`.
- A new module, `zipline.data.ffc`, containing loaders and dataset
definitions for inputs to the modelling API.
- New `TradingAlgorithm` API methods, `add_factor` and `add_filter`.
These methods can only be called from `initialize`, and they tell the
algorithm which terms to compute each day. Computed factor results are
made available through a new attribute of the `data` object in
`before_trading_start` and `handle_data`. Computed filter results
control which assets are available in the factor matrix on each day.
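The idea of executing a dependency graph of terms, with some results supplied up front in an "initial workspace" (as in the test refactor described earlier), can be sketched in a few lines of plain Python. `Term` and `run_graph` below are hypothetical toy names, not the `zipline.modelling` classes; the sketch uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+) so that every term runs after its inputs.

```python
from graphlib import TopologicalSorter


class Term(object):
    """Hypothetical graph node (not zipline's Term class).

    `inputs` are other Terms; `compute` receives their results in order.
    A Term with no `compute` is a leaf whose value must be supplied.
    """

    def __init__(self, name, inputs=(), compute=None):
        self.name = name
        self.inputs = tuple(inputs)
        self.compute = compute

    def __repr__(self):
        return "Term(%r)" % self.name


def run_graph(outputs, initial_workspace=None):
    """Evaluate `outputs` and everything they depend on.

    Terms already present in `initial_workspace` are treated as given and
    are never computed, which is also a convenient way to inject test data.
    """
    workspace = dict(initial_workspace or {})
    graph = {}
    stack = list(outputs)
    while stack:
        term = stack.pop()
        if term in graph:
            continue
        # Pre-supplied terms are leaves: their inputs need not be visited.
        deps = () if term in workspace else term.inputs
        graph[term] = deps
        stack.extend(deps)
    # static_order() yields each term only after all of its inputs.
    for term in TopologicalSorter(graph).static_order():
        if term in workspace:
            continue
        if term.compute is None:
            raise KeyError("no value supplied for leaf %r" % term)
        args = [workspace[dep] for dep in term.inputs]
        workspace[term] = term.compute(*args)
    return {term: workspace[term] for term in outputs}


prices = Term("prices")  # leaf: must be supplied in the workspace
doubled = Term("doubled", [prices], lambda p: [2 * x for x in p])
total = Term("total", [doubled, prices], lambda d, p: sum(d) + sum(p))

full = run_graph([total], initial_workspace={prices: [1.0, 2.0, 3.0]})
# Supplying `doubled` up front means its compute function never runs.
override = run_graph(
    [total],
    initial_workspace={prices: [1.0, 2.0, 3.0], doubled: [0.0, 0.0, 0.0]},
)
```

Here `full[total]` is 18.0 (12.0 from `doubled` plus 6.0 from `prices`), while `override[total]` is 6.0 because the injected `doubled` values replace the computed ones. This is what makes the workspace style convenient for testing one term in isolation.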