
621 Commits

Author SHA1 Message Date
Stewart Douglas 1c512c5478 TST: Update test_algorithm.py to incorporate TradingEnvironment.write_data 2015-09-10 11:53:25 -04:00
Stewart Douglas 501fd58fdf ENH: Replace update_asset_finder with write_data
The write_data method invokes the relevant AssetDBWriter subclass
to write data to the database. update_asset_finder is no longer
a relevant method since the AssetFinder is strictly a reader class.
2015-09-10 11:53:24 -04:00
Stewart Douglas 97e980751f MAINT: Integrate asset writer changes into TradingEnvironment 2015-09-10 11:53:23 -04:00
Stewart Douglas abcb6704e8 TST: Modify tests given new AssetFinder behavior 2015-09-10 11:53:22 -04:00
Stewart Douglas 746f70a133 ENH: Create AssetDBWriter class
The AssetDBWriter class and its subclasses will
ultimately be responsible for creating the SQLite
database tables and writing data to these tables.

In the longer term AssetDBWriter and AssetFinder will
be decoupled, sharing only an SQLite connection.
However, for backward compatibility reasons this has
not yet been fully implemented.

Modify tests since AssetFinder no longer has a
metadata_cache attribute.
2015-09-10 11:53:22 -04:00
Stewart Douglas 45adc57267 TST: Simulate running an algo using scripts/run_algo.py
Previously we had no test coverage of the parse_args() or
run_pipeline() functions invoked when running scripts/run_algo.py.
2015-09-10 11:47:45 -04:00
Stewart Douglas 283c959cc4 MAINT: Move analyze methods into algorithm files 2015-09-10 11:36:14 -04:00
Stewart Douglas e507ee097d TST: Add coverage for set_symbol_lookup_date method 2015-09-08 11:01:04 -04:00
Stewart Douglas 1e31866471 MAINT: FutureChain should only accept Timestamp 2015-09-08 11:01:04 -04:00
Jonathan Kamens 510dc2ae7b TST: test_finance.py can't handle being parallelized 2015-09-08 08:49:11 -04:00
jfkirk cf41373f8f BUG: Symbol look-up now uses the sim_params.period_end as a look-up date 2015-09-01 12:39:03 -04:00
Scott Sanderson 6e8a4b8144 ENH: Improvements to rank().
- Add an `ascending=True` keyword to `rank()`.

- Add `top(N)` and `bottom(N)` methods to Factor.  These return Filters
  that pass the top and bottom N elements each day.

- Add a slightly faster path for rank(method='ordinal').  I had
  originally thought the fast path was 2-3x faster because I had my
  benchmark data axes flipped.  The actual speedup is only 5-10%, which
  means it probably wasn't worth the effort to Cythonize...but we have a
  slightly faster version now so we might as well use it.

- Refactor test_filter and test_factor to make it easier to implement
  and test transformations on factors.  These tests now subclass
  BaseFFCTestCase, which provides facilities for passing a dict of terms
  and an "initial_workspace", the values for which are used by
  SimpleFFCEngine rather than needing to manually manage the inputs and
  outputs of each term.
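
The ranking behavior described above can be sketched in plain Python. This
is only an illustration, not the Factor/Filter implementation: the function
names `ordinal_rank`, `top`, and `bottom` are hypothetical, and breaking ties
by position is an assumption.

```python
def ordinal_rank(values, ascending=True):
    """Return 1-based ordinal ranks for `values`.

    Ties are broken by original position (an assumption of this sketch).
    """
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not ascending,
    )
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def top(values, n):
    """Boolean mask that passes the n largest values."""
    return [r <= n for r in ordinal_rank(values, ascending=False)]


def bottom(values, n):
    """Boolean mask that passes the n smallest values."""
    return [r <= n for r in ordinal_rank(values, ascending=True)]


values = [3.0, 1.0, 2.0, 5.0]
ascending_ranks = ordinal_rank(values)
descending_ranks = ordinal_rank(values, ascending=False)
```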
2015-08-31 00:32:33 -04:00
Scott Sanderson a04dcfa6b8 TEST: Rename test. 2015-08-29 23:55:59 -04:00
Scott Sanderson 90e81d0df0 MAINT: Add TermGraph class.
Use a subclass of networkx.DiGraph to encapsulate the state of our
dependency graph.
2015-08-29 23:55:59 -04:00
Scott Sanderson 780263da06 ENH: Return asset-indexed DataFrame for data.factors.
This makes ordering with the returned assets much easier, and there's no
performance degradation for non-broadcasting operations on the Index.

Timings
-------

    from random import sample
    finder = AssetFinder(db_path='assets.db', create_table=False)
    assets = load_8000_assets(finder)
    AAPL = finder.retrieve_asset(24)
    RANDOM_ASSETS = sample(assets, 500)
    df = DataFrame(
        index=assets,
        data=np.random.randn(len(assets), 4),
        columns=['a', 'b', 'c', 'd'],
    )
    df_int = DataFrame(
        index=map(int, assets),
        data=np.random.randn(len(assets), 4),
        columns=['a', 'b', 'c', 'd'],
    )

    %timeit df.loc[24]
    %timeit df_int.loc[24]

    10000 loops, best of 3: 45.3 µs per loop
    10000 loops, best of 3: 44.7 µs per loop

    %timeit df.loc[AAPL]
    %timeit df_int.loc[AAPL]

    10000 loops, best of 3: 45.1 µs per loop
    10000 loops, best of 3: 44.8 µs per loop

    %timeit df.loc[RANDOM_ASSETS]
    %timeit df_int.loc[RANDOM_ASSETS]

    1000 loops, best of 3: 1.53 ms per loop
    100 loops, best of 3: 2.18 ms per loop

    %timeit df.sum()
    %timeit df_int.sum()

    10000 loops, best of 3: 56 µs per loop
    10000 loops, best of 3: 55.7 µs per loop

    %timeit df.index == 3
    %timeit df_int.index == 3

    1000 loops, best of 3: 253 µs per loop
    100000 loops, best of 3: 6.76 µs per loop

    %timeit df.iloc[:50]
    %timeit df_int.iloc[:50]

    10000 loops, best of 3: 44.3 µs per loop
    10000 loops, best of 3: 44 µs per loop
2015-08-26 18:33:54 -04:00
Jonathan Kamens 2521263c06 TST: Prevent some test cases from being split 2015-08-25 11:56:36 -04:00
Scott Sanderson f7039d6f52 ENH: Make data available in before_trading_start. 2015-08-21 12:37:17 -04:00
Richard Frank 30847a10a7 BUG: Interface of load_adjusted_array is to return a list of arrays
but MultiColumnLoader was returning a list of lists of arrays in some
cases.
2015-08-19 10:12:19 -04:00
Scott Sanderson 4b7cef8703 TEST: Clarify test in asset finder.
Fix comment copypasta and add a check for the third sid that should be
found.
2015-08-13 11:46:19 -04:00
Andrew Daniels 48c609debc BUG: Improves lookup_future_chain to handle NaT date args
If lookup_future_chain was given an as_of_date or knowledge_date of pandas.NaT, the query it formed was not the one intended. Now as_of_date, if not NaT, is used for knowledge_date, and if both are NaT, no date filtering is done in the query.
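
A rough sketch of the date-defaulting rule described above. It uses `None`
as a stand-in for pandas.NaT so that it runs without pandas, and the helper
names `resolve_lookup_dates` and `needs_date_filter` are hypothetical. It
reads the rule as applying when knowledge_date is missing; the case where
only as_of_date is missing is not covered by the description, so the sketch
passes it through unchanged.

```python
from datetime import date


def resolve_lookup_dates(as_of_date, knowledge_date):
    """Fall back to as_of_date when knowledge_date is missing.

    None stands in for pandas.NaT (an assumption of this sketch).
    """
    if knowledge_date is None and as_of_date is not None:
        knowledge_date = as_of_date
    return as_of_date, knowledge_date


def needs_date_filter(as_of_date, knowledge_date):
    """Date filtering is skipped only when both dates are missing."""
    as_of_date, knowledge_date = resolve_lookup_dates(as_of_date, knowledge_date)
    return as_of_date is not None or knowledge_date is not None


day = date(2015, 8, 5)
```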
2015-08-05 10:50:14 -04:00
Scott Sanderson b89fc0c028 BUG: Fix error from RequiredWindowLengthMixin.
WindowLengthNotSpecified expects an argument.
2015-08-04 01:41:03 -04:00
Scott Sanderson 7bb20eb297 MAINT: Check dates before computing factor_matrix.
In SimpleFFCEngine.factor_matrix barf with a useful error if end_date <=
start_date.
2015-08-03 12:06:24 -04:00
Scott Sanderson 5da03d2df5 BUG: Make NumExprFilter return ndarray.
- Previously it was returning a DataFrame because of how we applied an &
  with a DataFrame mask.  The error was masked by the fact that
  `np.testing.assert_array_equal` coerces inputs to arrays before comparing.

- Added `zp.utils.test_utils.check_arrays`, which checks type equality
  before calling `np.testing.assert_array_equal`.
2015-08-03 11:59:11 -04:00
jfkirk 67c56f768b ENH: Adds auto-closing feature and implements for Futures 2015-07-31 10:38:44 -04:00
Scott Sanderson f13e9fd125 TEST: Add test asserting dynamic api_methods. 2015-07-29 12:30:46 -04:00
Scott Sanderson ef4f642e62 ENH: Compute engine architecture for FFC API.
This patch lays the groundwork for a compute engine designed to
facilitate construction of factor-based universe screening and portfolio
allocation.  It contains:

A new module, `zipline.modelling`, containing entities that can be used
to express computations as dependency graphs.  Each node in such a graph
is an instance of the base `Term` class, defined in
`zipline.modelling.term`.  Dependency graphs are executed by instances
of `FFCEngine`, defined in `zipline.modelling.engine`.
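
To illustrate the idea of executing terms as a dependency graph, here is a
minimal sketch using the standard library's `graphlib` (Python 3.9+). It is
not the `zipline.modelling` code: the `Term` class below is a simplified
stand-in, and `run_graph` and the `compute` callables are hypothetical.

```python
from graphlib import TopologicalSorter


class Term:
    """Simplified stand-in for a node in the dependency graph."""

    def __init__(self, name, inputs=(), compute=None):
        self.name = name
        self.inputs = tuple(inputs)
        self.compute = compute


def run_graph(terms, initial_workspace):
    """Compute every term after its inputs, reusing precomputed values."""
    by_name = {t.name: t for t in terms}
    # Map each term to the names of the terms it depends on.
    graph = {t.name: {i.name for i in t.inputs} for t in terms}
    workspace = dict(initial_workspace)
    for name in TopologicalSorter(graph).static_order():
        if name in workspace:
            continue  # Already supplied by the caller.
        term = by_name[name]
        args = [workspace[i.name] for i in term.inputs]
        workspace[name] = term.compute(*args)
    return workspace


close = Term("close")
mean = Term("mean", inputs=[close], compute=lambda c: sum(c) / len(c))
spread = Term(
    "spread",
    inputs=[close, mean],
    compute=lambda c, m: [x - m for x in c],
)
result = run_graph([close, mean, spread], {"close": [1.0, 2.0, 3.0]})
```

Raw inputs such as `close` are supplied through the initial workspace, so
only derived terms are computed.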

A new module, `zipline.data.ffc`, containing loaders and dataset
definitions for inputs to the modelling API.

New `TradingAlgorithm` api methods: `add_factor` and `add_filter`.
These methods can only be called from `initialize`, and are used to
inform the algorithm that each day it should compute the given terms.
Computed factor results are made available through a new attribute of
the `data` object in `before_trading_start` and `handle_data`.  Computed
filter results control which assets are available in the factor matrix
on each day.
2015-07-29 12:30:46 -04:00
jfkirk 8d5bfd3c91 BUG: Aligns performance packet generation between minute and daily modes 2015-07-21 13:25:39 -04:00
Eddie Hebert ace2b5c9e9 PERF: Improve risk metrics update speed.
Remove the DataFrame of headline risk metrics, in favor of a numpy array
for each metric, like the underlying vectors.
2015-07-15 15:36:35 -04:00
Eddie Hebert 27ab36deb2 MAINT: Remove references to minute risk.
The minutely calculation of risk metrics was removed in a previous
patch; remove the vestigial references.

Remove a test which tested the behavior of updating the second minute of
a day.

Remove the logic that changed the datetime index of the risk metrics
depending on emission rate; now only trading_days are needed.

Remove `returns_frequency` parameter since both minute and daily
data frequency always use daily returns.
2015-07-15 15:36:35 -04:00
Eddie Hebert 36319122cc PERF: Change asset finder to be backed by sqlite3.
Attack the startup bottleneck of creating the asset finder's caches for a
large universe, which took between 1 and 2 seconds on development and
production machines.

Instead, allow the AssetFinder to be passed a sqlite3 file that has
already been populated and then hydrate asset objects only when an
equity is referenced for the first time.

To create the aforementioned sqlite3 file, create an AssetFinder with a
db_path and `create_table` set to True. If `create_table` is set to
False, the prepopulated data in the sqlite file found at db_path will be
used.

Default behavior is to use an in-memory database.
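
The hydrate-on-first-reference idea can be sketched with the standard
`sqlite3` module. This is an illustration only, not the AssetFinder code: the
`LazyAssetStore` class, the `equities` table layout, and the sample rows are
all hypothetical.

```python
import sqlite3


class LazyAssetStore:
    """Builds asset objects only when a sid is first requested."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}
        self.hydrations = 0  # Counts database reads, for demonstration.

    def retrieve_asset(self, sid):
        if sid not in self._cache:
            row = self.conn.execute(
                "SELECT sid, symbol FROM equities WHERE sid = ?", (sid,)
            ).fetchone()
            if row is None:
                raise KeyError(sid)
            self._cache[sid] = {"sid": row[0], "symbol": row[1]}
            self.hydrations += 1
        return self._cache[sid]


# An in-memory database stands in for a prepopulated sqlite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE equities (sid INTEGER PRIMARY KEY, symbol TEXT)")
conn.executemany(
    "INSERT INTO equities VALUES (?, ?)", [(24, "AAPL"), (1, "XYZ")]
)
store = LazyAssetStore(conn)
first = store.retrieve_asset(24)
second = store.retrieve_asset(24)  # Served from the cache.
```

Nothing is read at construction time, so startup cost does not grow with the
size of the universe.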

Behavior that changes:

- Fuzzy lookup now only works on one character. That character needs to
  be specified at write/metadata consumption time, since the fuzzy
  lookup key is created by dropping the character from each symbol.

- Overwriting partially written metadata is no longer
  supported; i.e., some unit tests allowed for inserting just the
  identifier and then later updating the symbol, end_date, etc.

  Instead of building an upsert behavior at this time, this patch
  changes the unit tests so that the data for each asset is only
  inserted once.
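
A small sketch of the fuzzy-key idea from the first item above: the key is
the symbol with the fuzzy character dropped, computed once at write time.
The helper names and the sample symbols are hypothetical.

```python
def make_fuzzy_key(symbol, fuzzy_char):
    """Drop the fuzzy character from the symbol to form the lookup key."""
    return symbol.replace(fuzzy_char, "")


def build_fuzzy_index(symbols, fuzzy_char):
    """Precompute fuzzy keys at write time, as the text above describes."""
    index = {}
    for symbol in symbols:
        index.setdefault(make_fuzzy_key(symbol, fuzzy_char), []).append(symbol)
    return index


def fuzzy_lookup(index, query, fuzzy_char):
    """Find stored symbols that match the query once the character is dropped."""
    return index.get(make_fuzzy_key(query, fuzzy_char), [])


index = build_fuzzy_index(["BRK_A", "AAPL"], "_")
matches = fuzzy_lookup(index, "BRKA", "_")
```

Because the index is keyed on a single dropped character, a lookup with a
different fuzzy character would need the keys to be rebuilt.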

Other notes:

- populate_cache is now removed, since there is no longer a two step
  process of inserting metadata and then realizing that metadata into
  assets. _spawn_asset is rolled into insert_metadata, so that a call to
  insert_metadata both converts the metadata and makes it available in
  the data store.
2015-07-14 09:54:38 -04:00
Andrew Daniels 2ab9f8a63c ENH: Futures API 2015-07-13 09:50:36 -04:00
Eddie Hebert ad4126bf58 STY: Remove unused import.
Mea culpa.
2015-07-10 17:03:44 -04:00
Eddie Hebert 2616a12551 TST: Remove test for lookup of future contract by symbol.
The lookup of a future contract by individual symbol constrains the
incoming changes to how the asset finder stores data.

(i.e. the asset finder is changing so that there are separate tables for
both futures and equities.)

Since this lookup is not yet fully supported, we can add it back in on
top of the new asset finder.
2015-07-10 14:59:33 -04:00
jfkirk efa6d8dbce ENH: Adds a perf tracker method to handle SIDs leaving the universe 2015-07-09 17:03:21 -04:00
Andrew Daniels 7cde3939bf BUG: Determine valid future contracts with notice date
Since most brokers will cease accepting trades by the notice date, contracts should not be considered valid after the notice date. This commit adjusts the lookup_future_chain method to consider all contracts with notice dates on or before the current date invalid.
2015-07-09 15:23:29 -04:00
Eddie Hebert 9688989eba MAINT: Use symbol lookup directly from algorithm.
Instead of using the generic lookup, use the asset finder symbol method
directly when `symbol` is used in an algorithm.
2015-07-08 14:41:02 -04:00
Eddie Hebert f46fef1755 TST: Remove asset finder test for NASDAQ collisions.
The asset finder retrieved from the test environment is empty, so the
test does not end up testing anything, since the test cases loop over
the empty list of sids in the asset finder.

Remove to possibly be added back in and re-implemented after a larger
refactoring of the module.
2015-07-06 10:52:36 -04:00
Andrew Daniels 977c6cfcde MAINT: Consolidates and improves future lookup methods
Removes unused future lookup methods and consolidates everything into lookup_future_chain. Since the FutureChain object will have to hold a root symbol and dates, it should be responsible for cleaning the user input, so this is removed from the lookup method.

Adds knowledge date to future lookups. This makes our definition of valid contracts more flexible. We know about a contract if it starts trading by the knowledge date, and a contract is expired if it expires by the as_of_date.

Also fixes a bug with computing future chains, where contracts were not included in the chain on their expiration date.
2015-07-02 10:34:32 -04:00
jfkirk 1ec70b2a26 ENH: Removes use of lookup_generic in DataFrame index mapping 2015-07-01 13:43:31 -04:00
jfkirk 2421753509 TST: Fixes broken tests for DataFrameSource 2015-07-01 13:43:31 -04:00
jfkirk 258b5ea2ca API: DataFrame/Panel sources expect integer sids, not identifiers
This commit modifies the DataFrameSource and DataPanelSource to accept only Int64Indexes on the incoming data and moves the burden of mapping user identifiers to TradingAlgorithm.run().
2015-07-01 13:43:31 -04:00
jfkirk a4ce9712b8 DEP: Removes sids field from SimulationParameters 2015-07-01 13:43:31 -04:00
jfkirk 31f24a238a DEP: Removes unnecessary identifier_cache from asset_finder
The identifier cache's usage was nearly identical to using lookup_generic, so this commit removes identifier-keyed caching and modifies anything that uses it.
2015-07-01 13:43:31 -04:00
Eddie Hebert 7a1a6ddb37 PERF: Reduce time spent indexing in risk cumulative update.
Instead of using the pandas.Series datetime index for every single
vector, get the index at the beginning of the update loop based on the
dt and then use that index to set the values.

Also, since the dt lookup is no longer needed, store the values as numpy
arrays, which are more lightweight.

Locally, this patch cuts out about 60% of the time spent in the update
method.
2015-07-01 10:52:02 -04:00
Andrew Daniels 759f346c93 BUG: Fixes issues with AssetFinder future lookups
Contracts must have been trading at the as_of_date to be considered valid, and a contract's position in the chain is now zero-indexed.
2015-06-29 09:51:50 -04:00
Andrew Daniels cc77a52322 ENH: Adds future chain cache and future lookups to AssetFinder 2015-06-25 10:18:18 -04:00
Andrew Daniels 46e7b06991 ENH: Adds root_symbol attribute to Future class
Also update AssetFinder to handle root_symbol in metadata
2015-06-25 10:18:18 -04:00
Andrew Daniels 1ae6037a81 TST: Correct and clean up mock futures data for clarity 2015-06-25 10:18:18 -04:00
jfkirk 9291a89599 BUG: Prevents payout of dividend on final trading close 2015-06-24 21:45:55 -04:00
Scott Sanderson a2008f644a TEST: Remove unused test class. 2015-06-24 16:11:14 -04:00