3730 Commits

Author SHA1 Message Date
Maya Tydykov f8aa7c2ef4 TST: add test for case when null in expected column 2016-04-25 11:42:06 -04:00
Maya Tydykov 89412616a6 MAINT: filter rows with nulls in expected columns 2016-04-22 10:38:20 -04:00
Eddie Hebert a13e336ef5 Merge pull request #1157 from quantopian/use-carray-instead-of-read-all-on-small-size
PERF: Improve read time for smaller num of assets.
2016-04-21 22:25:01 -04:00
Richard Frank baa9850337 Merge pull request #1160 from quantopian/the-garbage-man-cant
TST: What if we don't gc...
2016-04-21 20:46:21 -04:00
Eddie Hebert 66d05aa563 PERF: Improve read time for smaller num of assets.
The BcolzDailyBarReader was optimized for the pipeline case of reading
all assets at once.

Now that the reader is also used to support daily history, the case of
reading data for a small number of assets is more common, particularly
in algorithms that use the history API and have a high rotation of
assets (e.g. an algorithm which uses pipeline to set the active
universe).

Remove the bottleneck in reading a small number of assets by
conditionally reading the slice for each asset from the carray, instead
of reading the data for all equities and then indexing into that full
array. Above a certain number of assets, it is still better to read all the
data at once. On the Quantopian dataset, which holds data for about 20000
equities over the last 10 years (where not all equities trade
over the full range), stored in 118 blosc blp files per column, the
tipping point where the 'read all' mode wins out is between 3000 and 4000
assets.

That number was tested by trying to exercise a worst case scenario where
the equities were spread out evenly across the blp files, by stepping
along a sorted list of assets that were alive over a query range which
spanned 70 trading days.
```
size = 3000
# Step evenly through the sorted assets; `//` keeps the step an integer.
step = len(assets) // size
sids = [assets[i] for i in range(0, len(assets), step)][:size]
```

Also, add parameter to WithBcolzDailyBarReader fixture which allows the
test to specify what the threshold count for reading all data should be,
so that the test_us_equity_pricing can be forced into either mode to
make sure that both branches in logic are covered by all test cases.
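
As a rough illustration of the conditional read described above, here is a
minimal sketch. It is not the reader's actual code: `load_column`, `column`,
`ranges`, and `read_all_threshold` are hypothetical names, and a plain list
stands in for the carray.

```python
def load_column(column, ranges, sids, read_all_threshold=3000):
    """Return {sid: values} for the requested sids.

    `column` stands in for one on-disk column in which each asset's rows
    occupy a contiguous [start, stop) range given by `ranges[sid]`.
    All names here are illustrative assumptions.
    """
    if len(sids) >= read_all_threshold:
        # Many assets: materialize the whole column once, then index into it.
        full = column[:]
        return {sid: full[ranges[sid][0]:ranges[sid][1]] for sid in sids}
    # Few assets: read only the slice belonging to each requested asset.
    return {sid: column[ranges[sid][0]:ranges[sid][1]] for sid in sids}


column = list(range(10))
ranges = {1: (0, 4), 2: (4, 7), 3: (7, 10)}

# A low threshold lets a test force either branch, in the spirit of the
# fixture parameter mentioned above.
few = load_column(column, ranges, [2], read_all_threshold=2)
many = load_column(column, ranges, [1, 2, 3], read_all_threshold=2)
```

Both branches return the same values for a given asset; only the amount of
data read differs.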

On a local dev machine this patch improves the read time of `load_raw_arrays`
for one asset from 100 ms to 96.5 µs (roughly a 10^3 improvement). Reading
only one asset per call is an observed common case when populating the
non-cached values in USEquityHistoryLoader.
2016-04-21 20:43:52 -04:00
Richard Frank 8c92f2d241 TST: What if we don't gc...
Looks like we removed ref cycles elsewhere, so windows builds are
passing without this.
2016-04-21 18:41:57 -04:00
Richard Frank ef9b986b38 Merge pull request #1159 from quantopian/appveyor-conda-reunion
Appveyor conda reunion
2016-04-21 17:24:40 -04:00
Richard Frank b510703169 BLD: Using GCE to prevent the "input line is too long" error
when activating a conda env.
2016-04-21 17:00:30 -04:00
Richard Frank bffe00f931 BLD: Print the download failure reason 2016-04-21 17:00:30 -04:00
Richard Frank baf10e3148 BLD: Don't accept the download if we tried too many times
It's likely we have an incomplete file.
2016-04-21 17:00:30 -04:00
Richard Frank 9a59cd064b BLD: Download miniconda over SSL 2016-04-21 17:00:30 -04:00
Maya Tydykov e5ccd814e8 Merge pull request #1143 from quantopian/add-final-val-col-to-estimates
ENH: add actual value column to estimates dataset.
2016-04-21 16:23:55 -04:00
Jean Bredeche cb5ed8d1a8 Merge pull request #1158 from quantopian/yes-we-want-the-broker-order-id
BUG: Restoring 'broker_order_id' to Order's dict
2016-04-21 15:20:47 -04:00
Jean Bredeche 2a981dc725 BUG: Restoring 'broker_order_id' to Order's dict
A more long-term fix is coming later; this restores existing downstream
behavior.
2016-04-21 15:18:42 -04:00
Jean Bredeche f06968f494 Merge pull request #1156 from quantopian/fetcher_bts
BUG: Fetcher wasn't working properly in `before_trading_start`.
2016-04-21 15:09:52 -04:00
Jean Bredeche 9d1e15ddde BUG: Fetcher wasn't working properly in before_trading_start.
We were trying to use the previous day in before_trading_start because
we were looking for the previous market minute, then normalizing it.  That's
no longer the case, as we want to use today's date for fetcher lookups
in before_trading_start.

Also refactored a bit how dataportal determines if a query should be
routed to the fetcher data structures.
2016-04-21 15:09:14 -04:00
David Michalowicz 48f9e6f1c9 Merge pull request #1148 from quantopian/migrate-whatsnew
DOC: Move latest notes to 1.0.0.txt
2016-04-21 15:07:01 -04:00
dmichalowicz 84e5c32cde DOC: Move latest notes to 1.0.0.txt 2016-04-21 12:42:22 -04:00
Jean Bredeche cb42875697 Merge pull request #1152 from quantopian/check-keyword-args
BUG: we were improperly checking iterable kwargs in BarData
2016-04-21 12:32:44 -04:00
Jean Bredeche 6423a2cfbd Merge branch 'master' into check-keyword-args 2016-04-21 12:31:45 -04:00
David Michalowicz 28ffee248f Merge pull request #1119 from quantopian/multiple-factor-outputs
Support multiple outputs for custom factors
2016-04-21 11:51:47 -04:00
Maya Tydykov bd58140b97 ENH: add actual value column to estimates dataset. 2016-04-21 11:45:00 -04:00
Jean Bredeche c323506f40 BUG: we were improperly checking iterable kwargs in BarData 2016-04-21 11:06:46 -04:00
dmichalowicz d9bfcaabde ENH: Support multiple outputs for custom factors 2016-04-21 10:57:29 -04:00
Jean Bredeche 2826226431 Merge pull request #1151 from quantopian/broker-order-id
BUG: need `broker_order_id` for downstream code
2016-04-21 09:04:13 -04:00
Jean Bredeche 898942a940 BUG: need broker_order_id for downstream code 2016-04-21 08:22:21 -04:00
Jean Bredeche 2179553034 Merge pull request #1150 from quantopian/vegas-baby
Use __slots__ in Order object to save memory
2016-04-20 21:28:13 -04:00
Jean Bredeche a1f19dca54 Use __slots__ to save memory 2016-04-20 16:59:55 -04:00
John Ricklefs 842c47b328 ENH: Make arena configurable for SimulationParameters (#1144) 2016-04-19 23:39:27 -04:00
Maya Tydykov 0e3db1f44a Merge pull request #1135 from quantopian/custom-estimates-in-pipeline
Custom estimates in pipeline
2016-04-19 15:01:27 -04:00
Maya Tydykov 1531568899 ENH: add custom dataset for estimize
MAINT: alphabetize constants

MAINT: remove obsolete column

TST: refactor tests to use common code

MAINT: remove unneeded fields from dataset

MAINT: remove obsolete earnings estimates columns and refactor
2016-04-19 11:29:03 -04:00
Jean Bredeche a5f7fc7d6d Merge pull request #1141 from quantopian/rule_interactions
BUG: Week rule plus time rule doesn't work
2016-04-18 20:01:26 -04:00
Andrew Liang 8aac0ab19f BUG: Week rule plus time rule doesn't work
The next trigger for the week rule gets recalculated every time
the rule is triggered
2016-04-18 17:05:43 -04:00
Jean Bredeche 620d2a8b22 Merge pull request #1137 from quantopian/normalize-later
PERF: do work later, when needed.
2016-04-16 22:08:50 -04:00
Jean Bredeche 5d3dcc3df4 PERF: do work later, when needed. 2016-04-16 21:39:55 -04:00
Joe Jevnik bc0b117dc9 MAINT: make the data loading apis more consistent.
Changes BcolzDailyBarWriter to not be an abc, data is passed as an
iterator of (sid, dataframe) pairs to the write method.

Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types; callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.

Adds many new fixtures and updates some existing fixtures to use the new
ones:

WithDefaultDateBounds
  A fixture that provides the suite a START_DATE and END_DATE. This is
  meant to make it easy for other fixtures to synchronize their date
  ranges without depending on each other in strange ways. For example,
  WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
  both have data for the same dates, so they may both depend on
  WithDefaultDateBounds without forcing a dependency between them.

WithTmpDir, WithInstanceTmpDir
  Provides the suite or individual test case a temporary directory.

WithBcolzDailyBarReader
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzDailyBarWriter.write

WithBcolzDailyBarReaderFromCSVs
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from a
  collection of CSV files and then converted into the bcolz data through
  BcolzDailyBarWriter.write_csvs

WithBcolzMinuteBarReader
  Provides the suite a BcolzMinuteBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzMinuteBarWriter.write

WithAdjustmentReader
  Provides the suite a SQLiteAdjustmentReader which reads from an in
  memory sqlite database. The data will be read from dataframes and then
  converted into sqlite with SQLiteAdjustmentWriter.write

WithDataPortal
  Provides each test case a DataPortal object with data from temporary
  resources.
2016-04-15 23:46:10 -04:00
Eddie Hebert 8c64cc80ec Merge pull request #1136 from quantopian/by-sid-and-equity-cache
BUG: Prevent out of order history arrays.
2016-04-15 23:04:08 -04:00
Eddie Hebert 5f9d0a148d BUG: Prevent out of order history arrays.
Fix a bug where if history were called with assets `[1, 2]` and then
subsequently with `[2, 1]`, the loader would return the cached array in
the order for `[1, 2]`.

Instead cache an AdjustedArray for each asset, then when a history
window is requested, check if each asset has a sufficient cache, and if
not then read values for the assets which are missing or need to be
refreshed.

An added benefit of this change is that if a subsequent call to history
has a smaller number of assets than the previous, no new data needs to
be read from disk. e.g. a call with assets `[1, 2, 3]` and then `[1, 2]`
would use the cached values for `1` and `2` from the first call.
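
A minimal sketch of the per-asset caching idea, under simplifying
assumptions: expiration is left out, and `PerAssetWindowCache` and
`read_from_disk` are hypothetical names rather than the loader's real API.

```python
class PerAssetWindowCache:
    """Toy per-asset cache; `read_from_disk` is a hypothetical callable
    that returns one list of values per requested asset."""

    def __init__(self, read_from_disk):
        self._read = read_from_disk
        self._cache = {}
        self.reads = []  # records which assets were read, for illustration

    def history(self, assets):
        # Only assets without a cached entry need to be read.
        missing = [a for a in assets if a not in self._cache]
        if missing:
            self.reads.append(list(missing))
            for asset, values in zip(missing, self._read(missing)):
                self._cache[asset] = values
        # Assemble in the order requested, not the order first cached.
        return [self._cache[a] for a in assets]


cache = PerAssetWindowCache(
    lambda sids: [[sid * 10, sid * 10 + 1] for sid in sids]
)
first = cache.history([1, 2, 3])
second = cache.history([2, 1])  # reversed subset: no new read is needed
```

Because each asset is cached separately, the result order follows the
request, and the second call is served entirely from the cache.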

Conversely, if the second call has more assets, then only the data for
the new assets needs to be retrieved. e.g. a history with `[1, 2]`, then
`[1, 2, 3]` would only need (assuming `1` and `2` have not expired) to
retrieve data for `3`. Unfortunately, the benefit here is not great
because `load_raw_arrays` is optimized for reading many assets, and
pulls the entire daily bar dataset into memory. This change makes tuning
`load_raw_arrays` for faster reads when only a few assets are requested
(e.g. by slicing from the carray for each asset, instead of pulling all
data into a numpy array) more beneficial than it would have been
previously.
2016-04-15 22:44:00 -04:00
Andrew Liang 85a2f6fe00 Merge pull request #1134 from quantopian/week_start
BUG: Recalculate trigger for week rule if we miss the first one
2016-04-15 15:24:51 -04:00
Andrew Liang 6d6cd58c3b BUG: Recalculate trigger for week rule if we miss the first one
If we start the simulation on a day so that we miss the trigger
(the first for the sim) for that week, recalculate the trigger
for next week
2016-04-15 15:09:08 -04:00
Andrew Liang b7d9723a54 Merge pull request #1131 from quantopian/week_schedule
BUG: week_end rule with offset=0 skips every other week
2016-04-15 10:32:40 -04:00
Andrew Liang 1ee3c5f049 BUG: week_end rule with offset=0 skips every other week 2016-04-15 10:17:18 -04:00
Jean Bredeche c9c956124a Merge pull request #1133 from quantopian/fix-us-equity-loader
Limit leak in us equity loader on stock rotation
2016-04-15 09:32:06 -04:00
Eddie Hebert e1b376a49b BUG: Add limit to memory growth on sliding windows
Add a cap of 5 sliding windows (one per OHLCV column) to the history
loader's cache of sliding windows.

This prevents unbounded growth on algorithms that call history with a
highly varied list of equities.
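
To illustrate how a size cap bounds growth, here is a small
least-recently-used cache built on the standard library. It is a stand-in
for illustration only; the patch itself uses `cachetools`, and
`BoundedWindowCache` and the key layout below are assumptions.

```python
from collections import OrderedDict


class BoundedWindowCache:
    """Keeps at most `maxsize` entries, evicting the least recently used."""

    def __init__(self, maxsize=5):
        self._maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        value = self._data[key]       # raises KeyError on a miss
        self._data.move_to_end(key)   # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._maxsize:
            self._data.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._data)


cache = BoundedWindowCache(maxsize=5)
for day in range(100):
    # Each "day" requests a different set of equities for one column.
    cache.set(("close", frozenset({day, day + 1})), "window-%d" % day)
```

However varied the requested equity lists are, the cache never holds more
than `maxsize` windows.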

A follow-up will split the cache up by column and by sid, so that the
loader does not re-prefetch sids which have already been read with
sufficient data; however, this patch is enough to fix the issue where an
algo with high rotation can add a megabyte of memory per day, e.g.
algorithms which rotate on a 5% dollar volume pipeline. With this cap,
memory consumption for those algorithms tends to plateau.

This patch adds a new dependency on the `cachetools` library.
2016-04-14 22:20:02 -04:00
Richard Frank ddb3113d6b BLD: Added cachetools conda recipe 2016-04-14 22:20:02 -04:00
Eddie Hebert 9e2c5d9505 Merge pull request #1130 from quantopian/expiring-cache
ENH: Add expiring cache.
2016-04-14 22:17:25 -04:00
Eddie Hebert ee26b57517 DOC: Add whatsnew entry for ExpiringCache. 2016-04-14 16:10:32 -04:00
Eddie Hebert 76e14eda2f ENH: Add expiring cache.
Add a cache interface which supports expirable entries with a changeable
backend for the cache into which they are entered.

The default cache is a `dict` but could be swapped for
`cachetools.LRUCache` or any other cache which supports `__getitem__`,
`__setitem__`, and `__delitem__`.

This lets consumers change their use of `CachedObject`s stored in a
cache from:

```
self._cache = {}

...

try:
    obj = self._cache[key]
    try:
        return obj.unwrap(dt)
    except Expired:
        pass
except KeyError:
    pass

...

self._cache[key] = CachedObject(value, new_expiration)
```

to:

```
self._cache = ExpiringCache(LRUCache(maxsize=6))

...

try:
    return self._cache.get(key, dt)
except KeyError:
    # Get fresh value
    ...

    self._cache.set(key, value, new_expiration)
```
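
For readers who want to see the shape of such a wrapper, here is a minimal
sketch consistent with the usage above. It is not the code from this
commit: the expiry comparison (`dt > expires`) and the use of plain integers
as timestamps are assumptions made for the example.

```python
class Expired(Exception):
    """Raised when a cached value is read after its expiration."""


class CachedObject:
    def __init__(self, value, expires):
        self._value = value
        self._expires = expires

    def unwrap(self, dt):
        # Assumption: a value is still valid at exactly its expiration time.
        if dt > self._expires:
            raise Expired(self._expires)
        return self._value


class ExpiringCache:
    def __init__(self, cache=None):
        # Any mapping works as the backend; a dict is the default.
        self._cache = cache if cache is not None else {}

    def get(self, key, dt):
        try:
            return self._cache[key].unwrap(dt)
        except Expired:
            # Drop the stale entry and report it as a plain miss.
            del self._cache[key]
            raise KeyError(key)

    def set(self, key, value, expiration_dt):
        self._cache[key] = CachedObject(value, expiration_dt)


cache = ExpiringCache()
cache.set("AAPL", 101.5, expiration_dt=10)
fresh = cache.get("AAPL", dt=5)
try:
    cache.get("AAPL", dt=11)
    stale_hit = True
except KeyError:
    stale_hit = False
```

Folding expiry into `KeyError` is what lets callers handle "missing" and
"expired" with a single `except` clause.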
2016-04-14 16:10:32 -04:00
Jean Bredeche 86667685d7 Merge pull request #1129 from quantopian/empty-lists
BUG: support passing an empty list to `data` methods.
2016-04-14 11:35:42 -04:00
Jean Bredeche 63bd7589b7 BUG: support passing an empty list to data methods.
Our type checking code was a bit too aggressive.
2016-04-14 11:11:08 -04:00