Commit Graph

3787 Commits

Author SHA1 Message Date
Scott Sanderson 2ceeac1237 BUG: Use compat unicode. 2016-05-04 19:58:55 -04:00
Scott Sanderson a29da32252 TEST: Don't assert particular numpy error.
They change from version to version.
2016-05-04 19:40:50 -04:00
Scott Sanderson bd49647ce0 BUG: Fix failure on pandas >= 0.17. 2016-05-04 19:38:28 -04:00
Scott Sanderson 7a4e9fd61a ENH: Make None the default for string columns. 2016-05-04 19:10:19 -04:00
Scott Sanderson b78501e54a BUG: Fix broken isnull() on string classifiers.
Adds a special case in NullFilter to handle LabelArrays correctly.
2016-05-04 17:26:27 -04:00
Scott Sanderson 317ecc8aa8 DOC: Add whatsnew. 2016-05-04 16:31:58 -04:00
Scott Sanderson 5a1ed7b1d3 ENH: Make element_of work for ints too. 2016-05-04 16:31:58 -04:00
Scott Sanderson 4357673221 MAINT: Add unicode to __all__. 2016-05-04 15:56:09 -04:00
Scott Sanderson 0922714bac DOC: Clarify test docstrings. 2016-05-04 15:54:51 -04:00
Scott Sanderson ce4378416a MAINT: Remove lazy imports of Latest.
They're no longer needed to break import cycles.
2016-05-04 15:54:51 -04:00
Scott Sanderson 17b402666c DOC: Fixup docstring. 2016-05-04 15:54:51 -04:00
Scott Sanderson 4d42cddae4 ENH: Fail fast on outputs in CustomClassifier.
We don't support multiple outputs for CustomClassifier because we use
LabelArrays for string classifiers.
2016-05-04 15:54:50 -04:00
Scott Sanderson 620d7648b0 BUG: Tests/bugfixes for LabelArray slicing.
- Fixes a bug where __setitem__ was not called when setting with a slice
  on Python 2 (__setslice__ was called instead), which caused strange
  behavior when setting an empty string.  This is fixed by overriding
  __setslice__ and forwarding to __setitem__.

- Fixes a bug where __getitem__ returned an np.void instance when
  indexing a single element.  We now correctly return the corresponding
  entry from our categories.
2016-05-04 15:54:50 -04:00
Scott Sanderson 4dbc7eac56 MAINT: Remove byteswap and newbyteorder from LabelArray. 2016-05-04 15:54:50 -04:00
Scott Sanderson 8de45540f2 ENH: NaN semantics for LabelArray missing values. 2016-05-04 15:54:50 -04:00
Scott Sanderson 2395cbb671 ENH: Use np.void for labelarray storage.
This disables most ufuncs that would otherwise give broken results.
2016-05-04 15:54:50 -04:00
Scott Sanderson 7a65121e6e BUG: contains was renamed to has_substring 2016-05-04 15:54:50 -04:00
Scott Sanderson 5cd7d79818 MAINT: Restore support for bytes/unicode AdjustedArrays. 2016-05-04 15:54:50 -04:00
Scott Sanderson 6b1f0caafc DOC: Clean up comment on `postprocess`. 2016-05-04 15:54:50 -04:00
Scott Sanderson 47e9b107ec DOC: Clean up docstring cruft. 2016-05-04 15:54:50 -04:00
Scott Sanderson 23324b4218 DOC: Add docstring for LabelArray. 2016-05-04 15:54:50 -04:00
Scott Sanderson 1a2ed2724b BUG: Pass correct class to super call. 2016-05-04 15:54:50 -04:00
Scott Sanderson c40bbfae03 TEST: More tests for string predicates. 2016-05-04 15:54:50 -04:00
Scott Sanderson bb6f908036 TEST: Add test for categorical postprocessing. 2016-05-04 15:54:50 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides an improvement
  of multiple orders of magnitude when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
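A minimal sketch of the categorical encoding idea described in the LabelArray commit above, written with the standard library only. It is not zipline's LabelArray, which is an np.ndarray subclass; the helper names `encode`, `equals_mask` and `substring_mask` are hypothetical. It shows why comparisons get cheaper: each string test runs once per unique label, and the per-element work is an integer comparison or lookup.

```python
# Hypothetical sketch of categorical (label) encoding; not zipline's LabelArray.

def encode(values):
    """Return (categories, codes) such that categories[codes[i]] == values[i]."""
    categories = sorted(set(values))
    index = {label: code for code, label in enumerate(categories)}
    codes = [index[v] for v in values]
    return categories, codes


def equals_mask(categories, codes, target):
    """Elementwise ``== target``, computed by comparing integer codes.

    The string comparison happens once, against the unique categories,
    instead of once per element.
    """
    try:
        target_code = categories.index(target)
    except ValueError:
        # The label never occurs, so no element can match.
        return [False] * len(codes)
    return [code == target_code for code in codes]


def substring_mask(categories, codes, substring):
    """Elementwise substring test, evaluated once per unique category."""
    matches = [substring in label for label in categories]
    return [matches[code] for code in codes]
```

For example, `encode(["AAPL", "MSFT", "AAPL", "IBM", "AAPL"])` yields the categories `["AAPL", "IBM", "MSFT"]` and the codes `[0, 2, 0, 1, 0]`. With many repeated tickers, the unique list stays small while the codes array grows with the data.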
Eddie Hebert 8756bf2c91 Merge pull request #1177 from quantopian/limit-minute-carrays
PERF: Cap memory usage by minute bar carrays.
2016-05-04 12:47:18 -04:00
Eddie Hebert 1248dcde36 PERF: Cap memory usage by minute bar carrays.
Instead of letting the cache of carrays grow unbounded, use an LRUCache
to cap the number of equities for any given column.

Tested with a cache size of 1000 on an algorithm that used pipeline and
was using over 3000: runtimes were similar, but memory usage was
successfully capped at around 1.2GB.

Also tested with an algorithm that bought and held just one equity; no
major slowdown was seen when using the LRUCache vs. a dictionary.

We may want to follow this up with an extension to `carray` which is not
as memory hungry per column; e.g. by not loading repeated/similar
metadata or releasing the last read chunk after a certain amount of
time.
2016-05-04 12:08:50 -04:00
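The commit above does not say which LRUCache class it uses, so the following is only a sketch of the general technique, built on `collections.OrderedDict`. The `load` callable is a hypothetical stand-in for whatever opens a bcolz carray for one equity. Once the cache holds more than `maxsize` entries, the least recently used one is dropped, which bounds memory at the cost of reloading evicted entries.

```python
from collections import OrderedDict


class BoundedLRUCache(object):
    """Map keys (e.g. equity sids) to loaded values, evicting the least
    recently used entry once more than ``maxsize`` entries are held."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, load):
        """Return the cached value for ``key``, calling ``load(key)`` on a miss."""
        if key in self._data:
            # Hit: mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        value = load(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            # popitem(last=False) removes the oldest, i.e. least recently used, entry.
            self._data.popitem(last=False)
        return value

    def __len__(self):
        return len(self._data)
```

A plain dict corresponds to an unbounded `maxsize`. The buy-and-hold test in the commit fits this design: with one equity, every lookup after the first is a hit, so eviction never happens.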
Joe Jevnik f3e436a1bf Merge pull request #1173 from quantopian/quandl-wiki-loader
Quandl wiki loader
2016-05-03 19:11:18 -04:00
Joe Jevnik 59c8e371a2 ENH: Updates the cli, data bundles and extensions.
Adds the data bundle concept, which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This also provides a
`yahoo_equities` function for creating an ingestion function that will
load a static set of assets from yahoo.

The CLI is now structured as a set of subcommands and is invoked with
`python -m zipline`. The old behavior of `run_algo.py` has been moved
to the `run` subcommand. This is almost entirely the same, except that
it now takes the name of the data bundle to use, defaulting to
`quandl`.

The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.

There is also a `clean` subcommand which deletes the data that was
written with `ingest`.

Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of Python files to run at
the start of the process. These can be used to configure aspects of
zipline. Right now the only thing that is supported in an extension file
is the registration of a new data bundle.
2016-05-03 18:38:24 -04:00
Andrew Liang 21bc598d85 Merge pull request #1169 from quantopian/beyond_max_day
FIX: Error message for BenchmarkAssetNotAvailableTooLate is wrong
2016-05-02 12:21:39 -04:00
Andrew Liang fb6bda5840 FIX: Error message for BenchmarkAssetNotAvailableTooLate is wrong
It should say '...does not exist on self.trading_days[-1]...', not
self.trading_days[0].
2016-05-02 12:00:35 -04:00
Joe Jevnik efac476976 ENH: make BcolzMinuteBarWriter.write take iterable
Updates the BcolzMinuteBarWriter.write api to allow users to pass their
data as a stream instead of requiring that they loop over their data
externally. This matches the API presented by BcolzDailyBarWriter.
2016-04-29 16:14:48 -04:00
Andrew Liang e73ce0bf2b Merge pull request #1168 from quantopian/fix_crashing_benchmark
FIX: Crashing on calculating benchmarking when no trading days
2016-04-29 14:59:49 -04:00
Andrew Liang bd07e824be FIX: Refactor to pass benchmark_asset to appropriate methods 2016-04-29 14:30:46 -04:00
Andrew Liang 7332586abe FIX: Crashing on calculating benchmarking when no trading days
When a simulation starts and ends on the same weekend, and so contains
no trading days, return an empty series for the benchmark instead of
crashing.
2016-04-29 14:30:46 -04:00
David Michalowicz 5d5b072112 Merge pull request #1172 from quantopian/identification-please
BUG: Don't crash on dataframes with assets in index.
2016-04-28 15:56:19 -04:00
dmichalowicz 8d1ecb508a Use string_types 2016-04-28 15:28:19 -04:00
Scott Sanderson 85ae664d8c BUG: Don't crash on dataframes with assets in index. 2016-04-28 15:19:57 -04:00
Maya Tydykov 73c3bd6955 Merge pull request #1106 from quantopian/13d_in_pipeline
13d in pipeline
2016-04-28 13:18:47 -04:00
Maya Tydykov 11d666daaa TST: add test for 13d filings dataset
MAINT: add 13d filings to factors init

MAINT: rename constant

MAINT: add event_date_col field
2016-04-28 11:59:49 -04:00
Maya Tydykov e726cc94c9 ENH: add 13d filings dataset to pipeline 2016-04-28 11:53:45 -04:00
Andrew Liang 231c3a58b1 Merge pull request #1166 from quantopian/empty_positions
BUG: Don't save empty positions when user access non-existent position
2016-04-26 16:58:11 -04:00
Andrew Liang 4ffe04e4a5 FIX: Add last_sale_date to Position init for consistency 2016-04-26 16:13:07 -04:00
Andrew Liang d69b960c49 BUG: Don't save empty positions when user access non-existent position
Previously, whenever we tried to access a missing key on the Positions
dict, we returned a default Position and saved it to the dict. Instead,
just return the default Position without saving it.
2016-04-26 13:28:35 -04:00
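One way to get the "return a default but don't store it" behavior described in the commit above is the `dict.__missing__` hook, which `dict.__getitem__` calls on a miss in subclasses. This is a sketch under that assumption, not necessarily how zipline implements it. `Position` here is a hypothetical minimal stand-in for zipline's class.

```python
class Position(object):
    """Hypothetical minimal stand-in for zipline's Position."""

    def __init__(self, asset, amount=0):
        self.asset = asset
        self.amount = amount


class Positions(dict):
    """Map asset -> Position. A lookup of a missing asset returns an empty
    Position without inserting it into the dict."""

    def __missing__(self, asset):
        # Called by dict.__getitem__ on a miss. Returning a value without
        # assigning self[asset] leaves the dict unchanged.
        return Position(asset)
```

`collections.defaultdict` would not fit here, because it stores the value it creates on first access, which is the behavior the commit removes.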
Jean Bredeche 50f4917341 Merge pull request #1164 from quantopian/get_open_orders_error
DEV: Better error message for sid= in get_open_orders
2016-04-26 13:01:50 -04:00
Andrew Liang 5809ae17f1 DEV: Better error message for sid= in get_open_orders
Let the user know to use asset= instead.
2016-04-26 12:23:57 -04:00
Jean Bredeche 789dba8eca Merge pull request #1165 from quantopian/dont-order-in-bts
BUG: don't allow ordering in before_trading_start
2016-04-26 10:57:05 -04:00
Jean Bredeche c404c60d68 BUG: don't allow ordering in before_trading_start 2016-04-26 10:56:36 -04:00
Maya Tydykov b7765fe0d3 Merge pull request #1153 from quantopian/filter-nulls-in-expected-cols
Filter nulls in expected cols
2016-04-25 16:32:45 -04:00
Jean Bredeche ba20235d83 Merge pull request #1162 from quantopian/handle-missing-fields
DEV: Don't log an error if we can't find a matching asset/field/day triple in Fetcher
2016-04-25 16:22:17 -04:00