The AssetDBWriter class and its subclasses will
ultimately be responsible for creating the SQLite
database tables and writing data to these tables.
In the longer term AssetDBWriter and AssetFinder will
be decoupled, sharing only an SQLite connection.
However, for backward compatibility reasons this has
not yet been fully implemented.
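The intended split can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual zipline classes: SketchAssetWriter, SketchAssetFinder, and the two-column assets table are hypothetical names chosen for the example. The only thing the two objects share is the sqlite3 connection.

```python
import sqlite3


class SketchAssetWriter:
    """Hypothetical writer: owns table creation and all inserts."""

    def __init__(self, conn):
        self.conn = conn

    def init_db(self):
        # The table layout is an assumption made for this example.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS assets ("
            "sid INTEGER PRIMARY KEY, symbol TEXT NOT NULL)"
        )

    def write(self, rows):
        # Using the connection as a context manager commits on success.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO assets (sid, symbol) VALUES (?, ?)",
                rows,
            )


class SketchAssetFinder:
    """Hypothetical reader: only queries, never creates tables."""

    def __init__(self, conn):
        self.conn = conn

    def symbol_for(self, sid):
        row = self.conn.execute(
            "SELECT symbol FROM assets WHERE sid = ?", (sid,)
        ).fetchone()
        return row[0] if row else None


conn = sqlite3.connect(":memory:")
writer = SketchAssetWriter(conn)
writer.init_db()
writer.write([(24, "AAPL"), (5061, "MSFT")])
finder = SketchAssetFinder(conn)
```

Because the finder never touches the schema, either side could be replaced without changing the other, as long as both agree on the table layout.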
Modify tests since AssetFinder no longer has a
metadata_cache attribute.
Previously, symbols were resolved to sids as of the simulation end
date. This commit lets the user specify the date used for
resolution through a new set_symbol_lookup_date() API method.
If the user does not call this method, the lookup date
defaults to the simulation end date.
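The fallback behaviour described above can be sketched like this. Only the method name set_symbol_lookup_date() comes from this commit. The SketchSymbolResolver class, its ownership table, and the ticker "XYZ" are hypothetical and exist only to show why the lookup date matters when a symbol has mapped to different sids over time.

```python
from datetime import date


class SketchSymbolResolver:
    """Hypothetical resolver; not zipline's actual implementation."""

    def __init__(self, ownership, sim_end):
        # ownership maps symbol -> list of (start_date, end_date, sid)
        self.ownership = ownership
        self.sim_end = sim_end
        self.lookup_date = None  # unset until the user chooses a date

    def set_symbol_lookup_date(self, dt):
        self.lookup_date = dt

    def symbol(self, name):
        # Fall back to the simulation end date when no date was set.
        as_of = self.lookup_date if self.lookup_date is not None else self.sim_end
        for start, end, sid in self.ownership[name]:
            if start <= as_of <= end:
                return sid
        raise LookupError("no sid for %r as of %s" % (name, as_of))


# "XYZ" is a made-up ticker that changed hands between two sids.
ownership = {
    "XYZ": [
        (date(2010, 1, 1), date(2012, 12, 31), 1),
        (date(2013, 1, 1), date(2015, 12, 31), 2),
    ],
}
resolver = SketchSymbolResolver(ownership, sim_end=date(2014, 6, 30))

default_sid = resolver.symbol("XYZ")  # resolved as of the simulation end
resolver.set_symbol_lookup_date(date(2011, 5, 2))
early_sid = resolver.symbol("XYZ")    # resolved as of the chosen date
```

Here the default lookup resolves to the later owner of the ticker, while setting an earlier lookup date resolves to the earlier one.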
- Add an `ascending=True` keyword to `rank()`.
- Add `top(N)` and `bottom(N)` methods to Factor. These return Filters
that pass the top and bottom N elements each day.
- Add a slightly faster path for `rank(method='ordinal')`. I had
originally thought the fast path was 2-3x faster because I had my
benchmark data axes flipped. The actual speedup is only 5-10%, which
means it probably wasn't worth the effort to Cythonize...but we have a
slightly faster version now so we might as well use it.
- Refactor test_filter and test_factor to make it easier to implement
and test transformations on factors. These tests now subclass
BaseFFCTestCase, which accepts a dict of terms and an
"initial_workspace" whose values SimpleFFCEngine uses directly, so
tests no longer need to manually manage the inputs and
outputs of each term.
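As a rough illustration of what the rank, top, and bottom items compute for a single day, here is a pure-Python sketch. It is not the Cython fast path or the Factor API itself: ordinal_rank, top, and bottom are hypothetical free functions, and missing-value handling is ignored.

```python
def ordinal_rank(values, ascending=True):
    """Rank one day's values from 1 to N; ties keep their input order."""
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not ascending,
    )
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def top(values, n):
    """Boolean mask that passes the n largest values."""
    return [r <= n for r in ordinal_rank(values, ascending=False)]


def bottom(values, n):
    """Boolean mask that passes the n smallest values."""
    return [r <= n for r in ordinal_rank(values, ascending=True)]


day = [3.0, 1.0, 4.0, 2.0]
ascending_ranks = ordinal_rank(day)                     # [3, 1, 4, 2]
descending_ranks = ordinal_rank(day, ascending=False)   # [2, 4, 1, 3]
top_two = top(day, 2)                                   # [True, False, True, False]
bottom_two = bottom(day, 2)                             # [False, True, False, True]
```

Expressing top and bottom as a comparison against the rank is one simple way to get a filter from a factor. Whether the real methods are built this way is an assumption.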
This makes ordering with the returned assets much easier, and there's no
performance degradation for non-broadcasting operations on the Index.
Timings
-------
import numpy as np
from pandas import DataFrame
from random import sample

finder = AssetFinder(create_table=False, db_path='assets.db')
assets = load_8000_assets(finder)
AAPL = finder.retrieve_asset(24)
RANDOM_ASSETS = sample(assets, 500)

# Frame indexed by Asset objects.
df = DataFrame(
    index=assets,
    data=np.random.randn(len(assets), 4),
    columns=['a', 'b', 'c', 'd'],
)
# Same shape of data, indexed by plain integer sids.
df_int = DataFrame(
    index=list(map(int, assets)),
    data=np.random.randn(len(assets), 4),
    columns=['a', 'b', 'c', 'd'],
)

%timeit df.loc[24]
%timeit df_int.loc[24]
10000 loops, best of 3: 45.3 µs per loop
10000 loops, best of 3: 44.7 µs per loop

%timeit df.loc[AAPL]
%timeit df_int.loc[AAPL]
10000 loops, best of 3: 45.1 µs per loop
10000 loops, best of 3: 44.8 µs per loop

%timeit df.loc[RANDOM_ASSETS]
%timeit df_int.loc[RANDOM_ASSETS]
1000 loops, best of 3: 1.53 ms per loop
100 loops, best of 3: 2.18 ms per loop

%timeit df.sum()
%timeit df_int.sum()
10000 loops, best of 3: 56 µs per loop
10000 loops, best of 3: 55.7 µs per loop

%timeit df.index == 3
%timeit df_int.index == 3
1000 loops, best of 3: 253 µs per loop
100000 loops, best of 3: 6.76 µs per loop

%timeit df.iloc[:50]
%timeit df_int.iloc[:50]
10000 loops, best of 3: 44.3 µs per loop
10000 loops, best of 3: 44 µs per loop
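
The first two timing pairs look up the same row by integer sid and by asset, against both an asset-keyed and an integer-keyed index. One way such interchangeable lookups can work is for the asset to hash and compare like its sid. The sketch below assumes that mechanism and demonstrates it with plain dicts. SketchAsset is a hypothetical stand-in, not zipline's Asset class.

```python
class SketchAsset:
    """Hypothetical asset that behaves like its integer sid as a key."""

    def __init__(self, sid, symbol):
        self.sid = sid
        self.symbol = symbol

    def __int__(self):
        return self.sid

    def __hash__(self):
        # Hash like the sid so an int and an asset land in the same bucket.
        return hash(self.sid)

    def __eq__(self, other):
        if isinstance(other, SketchAsset):
            return self.sid == other.sid
        if isinstance(other, int):
            return self.sid == other
        return NotImplemented

    def __repr__(self):
        return "SketchAsset(%d, %r)" % (self.sid, self.symbol)


AAPL = SketchAsset(24, "AAPL")
MSFT = SketchAsset(5061, "MSFT")

# Keyed by asset objects, looked up by integer sid.
by_asset = {AAPL: "row for AAPL", MSFT: "row for MSFT"}
found_by_int = by_asset[24]

# Keyed by integer sids, looked up by asset object.
by_sid = {int(a): "row for %s" % a.symbol for a in (AAPL, MSFT)}
found_by_asset = by_sid[AAPL]
```

The elementwise comparison `df.index == 3` is the one case in the timings where the object-keyed index is clearly slower. That would be consistent with each comparison going through a Python-level `__eq__` call, though the commit does not state the cause.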