937 Commits

Author SHA1 Message Date
Maya Tydykov 11d666daaa TST: add test for 13d filings dataset
MAINT: add 13d filings to factors init

MAINT: rename constant

MAINT: add event_date_col field
2016-04-28 11:59:49 -04:00
Maya Tydykov e726cc94c9 ENH: add 13d filings dataset to pipeline 2016-04-28 11:53:45 -04:00
Andrew Liang d69b960c49 BUG: Don't save empty positions when user accesses non-existent position
Previously, whenever we tried to access a missing value on the Positions
dict, we returned a default Position and saved it to the dict. Instead,
just return the default Position without saving it.
2016-04-26 13:28:35 -04:00
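A minimal sketch of the new lookup behaviour, assuming a `dict` subclass that uses the `__missing__` hook. `Position` and `Positions` below are simplified, hypothetical stand-ins rather than zipline's actual classes:

```python
class Position(object):
    """Hypothetical, simplified position: a sid and a share amount."""

    def __init__(self, sid, amount=0):
        self.sid = sid
        self.amount = amount


class Positions(dict):
    """Mapping of sid -> Position that never stores defaults on lookup."""

    def __missing__(self, sid):
        # dict.__getitem__ calls this only for absent keys. Return an empty
        # Position without inserting it, so reads never grow the dict.
        return Position(sid)


positions = Positions()
empty = positions[24]          # empty.amount == 0, and 24 is still absent
positions[1] = Position(1, 100)  # explicit assignment still stores as usual
```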
Andrew Liang 5809ae17f1 DEV: Better error message for sid= in get_open_orders
Let the user know to use asset= instead.
2016-04-26 12:23:57 -04:00
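A hypothetical sketch of such a check. The function body, the `**kwargs` catch-all and the placeholder return values are assumptions for illustration, not zipline's actual signature:

```python
def get_open_orders(asset=None, **kwargs):
    """Hypothetical sketch: reject the old ``sid=`` keyword with a hint."""
    if "sid" in kwargs:
        raise TypeError(
            "get_open_orders() got an unexpected keyword argument 'sid'; "
            "use asset= instead"
        )
    if kwargs:
        raise TypeError(
            "get_open_orders() got unexpected keyword arguments: %s"
            % ", ".join(sorted(kwargs))
        )
    # Placeholder results: the real order lookup is out of scope here.
    return {} if asset is None else []
```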
Jean Bredeche c404c60d68 BUG: don't allow ordering in before_trading_start 2016-04-26 10:56:36 -04:00
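One way to enforce this, sketched under the assumption that the algorithm records which phase it is in with a flag. `Algorithm`, `OrderNotAllowed` and `run_before_trading_start` are hypothetical names:

```python
class OrderNotAllowed(Exception):
    """Hypothetical error raised for an order placed in the wrong phase."""


class Algorithm(object):
    def __init__(self):
        self.in_before_trading_start = False
        self.orders = []

    def run_before_trading_start(self, user_func):
        # Flag the phase so order() can refuse to run inside it, and reset
        # the flag even if the user function raises.
        self.in_before_trading_start = True
        try:
            user_func(self)
        finally:
            self.in_before_trading_start = False

    def order(self, asset, amount):
        if self.in_before_trading_start:
            raise OrderNotAllowed(
                "order() is not allowed in before_trading_start"
            )
        self.orders.append((asset, amount))
```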
Maya Tydykov b7765fe0d3 Merge pull request #1153 from quantopian/filter-nulls-in-expected-cols
Filter nulls in expected cols
2016-04-25 16:32:45 -04:00
Maya Tydykov 0191d9d903 MAINT: move filtering for null date rows back to dataframe
TST: test both next and prev event frame loading and use EventsLoader.

BUG: remove extra arg

MAINT: call list on zip for compatibility with python 3
2016-04-25 16:11:12 -04:00
Maya Tydykov 390295481c TST: add test for blaze loader with null data in date col
MAINT: fix blaze query
2016-04-25 11:42:10 -04:00
Maya Tydykov e41c99d077 MAINT: add an event date col field to each loader
MAINT: add event date col field and filter rows where this field is null

TST: modify tests to filter nulls in event date col

MAINT: calculate value repeats by vectorized computation on separate start and end dates.

MAINT: pass DatetimeIndex instead of list of strings
2016-04-25 11:42:08 -04:00
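The loaders operate on DataFrames; as a stdlib-only sketch, with rows as dicts and `None` standing in for a null, dropping rows whose event date column is empty could look like this (`drop_null_event_dates` and the column names are hypothetical):

```python
from datetime import date


def drop_null_event_dates(rows, event_date_col):
    """Return only the rows whose event date column holds a value."""
    # A row without an event date can't be placed on the event timeline.
    return [row for row in rows if row.get(event_date_col) is not None]


rows = [
    {"sid": 1, "event_date": date(2016, 4, 20)},
    {"sid": 2, "event_date": None},
    {"sid": 3},  # a missing column is treated as null too
]
kept = drop_null_event_dates(rows, "event_date")
```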
Maya Tydykov f8aa7c2ef4 TST: add test for case when null in expected column 2016-04-25 11:42:06 -04:00
Jean Bredeche 02ded435f6 DEV: Don't log an error if we can't find a matching asset/field/day triple in fetcher data 2016-04-25 09:47:18 -04:00
Eddie Hebert 66d05aa563 PERF: Improve read time for smaller num of assets.
The BcolzDailyBarReader was optimized for the pipeline case of reading
all assets at once.

Now that the reader is also used to support daily history, the case of
reading data for a small number of assets is more common, particularly
in algorithms that use the history API and have a high rotation of
assets (e.g. an algorithm which uses pipeline to set the active
universe).

Remove the bottleneck in reading a small number of assets by
conditionally reading the slice for each asset from the carray, instead
of reading the data for all equities and then indexing into that full
array. Above a certain number of assets, it is still better to read all
the data at once. On the Quantopian dataset, which holds about the last
10 years of equity data for roughly 20000 equities (not all of which
trade over the full range), stored in 118 blosc blp files per column,
the tipping point where the 'read all' mode wins out is between 3000 and
4000 assets.

That number was tested by trying to exercise a worst case scenario where
the equities were spread out evenly across the blp files, by stepping
along a sorted list of assets that were alive over a query range which
spanned 70 trading days.
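```
size = 3000
# `assets` is the sorted list of assets alive over the query range. Step
# through it evenly so the chosen sids are spread across all blp files;
# `//` keeps the step an integer on Python 3.
step = max(len(assets) // size, 1)
sids = [assets[i] for i in range(0, len(assets), step)][:size]
```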

Also, add parameter to WithBcolzDailyBarReader fixture which allows the
test to specify what the threshold count for reading all data should be,
so that the test_us_equity_pricing can be forced into either mode to
make sure that both branches in logic are covered by all test cases.

On a local dev machine, this patch improves the read time of
`load_raw_arrays` for one asset from 100 ms to 96.5 µs (roughly a 1000x
improvement). Reading only one asset per call is an observed common case
when populating the non-cached values in USEquityHistoryLoader.
2016-04-21 20:43:52 -04:00
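The two read strategies can be sketched with a plain list standing in for a bcolz carray. The flat layout (every asset's rows stored back to back, located through a per-sid first-row offset), the function name and the `read_all_threshold` parameter are assumptions for illustration:

```python
def read_window(carray, first_row, sids, start, end, read_all_threshold=3000):
    """Return one list of values per sid for rows [start, end) of each asset.

    carray: flat sequence holding every asset's rows back to back
        (a hypothetical stand-in for a bcolz carray).
    first_row: dict mapping sid -> offset of that asset's first row.
    """
    if len(sids) >= read_all_threshold:
        # 'Read all' mode: materialize the whole column once, then index.
        data = list(carray)
        return [data[first_row[s] + start:first_row[s] + end] for s in sids]
    # Per-asset mode: slice only the rows needed for each requested asset.
    return [list(carray[first_row[s] + start:first_row[s] + end]) for s in sids]
```

Both branches return the same values, which is what lets a test fixture force either mode simply by changing the threshold.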
Maya Tydykov e5ccd814e8 Merge pull request #1143 from quantopian/add-final-val-col-to-estimates
ENH: add actual value column to estimates dataset.
2016-04-21 16:23:55 -04:00
Jean Bredeche 9d1e15ddde BUG: Fetcher wasn't working properly in before_trading_start.
We were trying to use the previous day in before_trading_start because
we were looking for the previous market minute, then normalizing it.  That's
no longer the case, as we want to use today's date for fetcher lookups
in before_trading_start.

Also refactored how the DataPortal determines whether a query should be
routed to the fetcher data structures.
2016-04-21 15:09:14 -04:00
Jean Bredeche 6423a2cfbd Merge branch 'master' into check-keyword-args 2016-04-21 12:31:45 -04:00
Maya Tydykov bd58140b97 ENH: add actual value column to estimates dataset. 2016-04-21 11:45:00 -04:00
Jean Bredeche c323506f40 BUG: we were improperly checking iterable kwargs in BarData 2016-04-21 11:06:46 -04:00
dmichalowicz d9bfcaabde ENH: Support multiple outputs for custom factors 2016-04-21 10:57:29 -04:00
Maya Tydykov 1531568899 ENH: add custom dataset for estimize
MAINT: alphabetize constants

MAINT: remove obsolete column

TST: refactor tests to use common code

MAINT: remove unneeded fields from dataset

MAINT: remove obsolete earnings estimates columns and refactor
2016-04-19 11:29:03 -04:00
Andrew Liang 8aac0ab19f BUG: Week rule plus time rule doesn't work
The next trigger for the week rule gets recalculated every time
the rule is triggered.
2016-04-18 17:05:43 -04:00
Joe Jevnik bc0b117dc9 MAINT: make the data loading apis more consistent.
Changes BcolzDailyBarWriter to not be an ABC; data is now passed as an
iterator of (sid, dataframe) pairs to the write method.

Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types; callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.

Adds many new fixtures and updates some existing fixtures to use the new
ones:

WithDefaultDateBounds
  A fixture that provides the suite a START_DATE and END_DATE. This is
  meant to make it easy for other fixtures to synchronize their date
  ranges without depending on each other in strange ways. For example,
  WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
  both have data for the same dates, so they may both depend on
  WithDefaultDateBounds without forcing a dependency between them.

WithTmpDir, WithInstanceTmpDir
  Provides the suite or individual test case a temporary directory.

WithBcolzDailyBarReader
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzDailyBarWriter.write

WithBcolzDailyBarReaderFromCSVs
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from a
  collection of CSV files and then converted into the bcolz data through
  BcolzDailyBarWriter.write_csvs

WithBcolzMinuteBarReader
  Provides the suite a BcolzMinuteBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzMinuteBarWriter.write

WithAdjustmentReader
  Provides the suite a SQLiteAdjustmentReader which reads from an
  in-memory SQLite database. The data will be read from dataframes and then
  converted into sqlite with SQLiteAdjustmentWriter.write

WithDataPortal
  Provides each test case a DataPortal object with data from temporary
  resources.
2016-04-15 23:46:10 -04:00
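A minimal sketch of the shared-bounds idea using plain mixin classes. The two reader fixtures here are hypothetical stand-ins that only report their date range, and the dates are arbitrary:

```python
import unittest
from datetime import date


class WithDefaultDateBounds(object):
    """Provide one shared START_DATE / END_DATE for the whole suite."""
    START_DATE = date(2016, 1, 4)
    END_DATE = date(2016, 4, 15)


class WithDailyBars(WithDefaultDateBounds):
    """Hypothetical stand-in for WithBcolzDailyBarReader."""

    @classmethod
    def daily_bar_dates(cls):
        return (cls.START_DATE, cls.END_DATE)


class WithMinuteBars(WithDefaultDateBounds):
    """Hypothetical stand-in for WithBcolzMinuteBarReader."""

    @classmethod
    def minute_bar_dates(cls):
        return (cls.START_DATE, cls.END_DATE)


class ExampleTestCase(WithDailyBars, WithMinuteBars, unittest.TestCase):
    # Overriding the bounds once moves both fixtures together, even though
    # neither fixture depends on the other.
    START_DATE = date(2016, 2, 1)

    def test_ranges_match(self):
        self.assertEqual(self.daily_bar_dates(), self.minute_bar_dates())
```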
Eddie Hebert 5f9d0a148d BUG: Prevent out of order history arrays.
Fix a bug where if history were called with assets `[1, 2]` and then
subsequently, `[2, 1]`, the loader would return the cached array in
order for `[1, 2]`.

Instead, cache an AdjustedArray for each asset. When a history window is
requested, check whether each asset has a sufficient cached window, and
if not, read values for the assets which are missing or need to be
refreshed.

An added benefit of this change is that if a subsequent call to history
has a smaller number of assets than the previous, no new data needs to
be read from disk. e.g. a call with assets `[1, 2, 3]` and then `[1, 2]`
would use the cached values for `1` and `2` from the first call.

Conversely, if the second call has more assets, then only the data for
the new assets needs to be retrieved. e.g. a history with `[1, 2]`, then
`[1, 2, 3]` would only need (assuming `1` and `2` have not expired) to
retrieve data for `3`. Unfortunately, the benefit here is not great
because `load_raw_arrays` is optimized for reading many assets, and
pulls the entire daily bar dataset into memory. This change makes it more
beneficial than it would have been previously to tune `load_raw_arrays`
for faster reads when only a few assets are requested (e.g. by slicing
from the carray for each asset, instead of pulling all data into a numpy
array).
2016-04-15 22:44:00 -04:00
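A simplified sketch of per-asset caching. It omits the expiration and sufficiency checks the real loader performs, and `PerAssetWindowCache` and its `load` callback are hypothetical names:

```python
class PerAssetWindowCache(object):
    """Cache one window per asset and answer in the requested order."""

    def __init__(self, load):
        # load(assets) -> dict of asset -> window, reading only those assets.
        self._load = load
        self._cache = {}

    def history(self, assets):
        missing = [a for a in assets if a not in self._cache]
        if missing:
            # Only assets without a cached window hit the loader.
            self._cache.update(self._load(missing))
        # Build the result in the order requested, not the order cached.
        return [self._cache[a] for a in assets]
```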
Andrew Liang 6d6cd58c3b BUG: Recalculate trigger for week rule if we miss the first one
If the simulation starts on a day after that week's trigger (which
would have been the first trigger of the simulation), recalculate the
trigger for the following week.
2016-04-15 15:09:08 -04:00
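A sketch of the intended trigger schedule for a week-start rule, assuming the calendar includes trading days before the simulation start so the first trading day of a partial first week is known. `week_start_triggers` is a hypothetical helper, not the zipline rule class:

```python
from datetime import date
from itertools import groupby


def week_start_triggers(calendar, sim_start, days_offset=0):
    """Return the dates on which a week-start rule should fire.

    calendar: sorted trading days, including days before sim_start.
    """
    triggers = []
    # Group consecutive trading days by (ISO year, ISO week number).
    for _, days in groupby(calendar, key=lambda d: d.isocalendar()[:2]):
        days = list(days)
        if len(days) <= days_offset:
            continue  # week too short to reach the offset
        target = days[days_offset]
        if target < sim_start:
            continue  # missed this week's trigger: wait for next week's
        triggers.append(target)
    return triggers
```

For a simulation starting on Wednesday 2016-04-06 with `days_offset=0`, the week of 2016-04-04 is skipped and the first trigger is 2016-04-11.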
Andrew Liang 1ee3c5f049 BUG: week_end rule with offset=0 skips every other week 2016-04-15 10:17:18 -04:00
Eddie Hebert 76e14eda2f ENH: Add expiring cache.
Add a cache interface which supports expirable entries with a changeable
backend for the cache into which they are entered.

The default cache is a `dict` but could be swapped for
`cachetools.LRUCache` or any other cache which supports `__getitem__`,
`__setitem__`, and `__delitem__`.

This lets consumers change their use of `CachedObject`s stored in a
cache from:

```
self._cache = {}

...

try:
    obj = self._cache[key]
    try:
        return obj.unwrap(dt)
    except Expired:
        pass
except KeyError:
    pass

...

self._cache[key] = CachedObject(value, new_expiration)
```

to:

```
self._cache = ExpiringCache(LRUCache(maxsize=6))

...

try:
    return self._cache.get(key, dt)
except KeyError:
    # Get fresh value
    ...

    self._cache.set(key, value, new_expiration)
```
2016-04-14 16:10:32 -04:00
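A self-contained sketch of the pieces used above. The class names come from the commit message, but the bodies are an illustrative reconstruction, not the zipline source; `dt` can be any comparable value:

```python
class Expired(Exception):
    """Raised when a cached value is read after its expiration."""


class CachedObject(object):
    """A value paired with the last dt at which it is still valid."""

    def __init__(self, value, expires):
        self._value = value
        self._expires = expires

    def unwrap(self, dt):
        if dt > self._expires:
            raise Expired(self._expires)
        return self._value


class ExpiringCache(object):
    """Wrap any mapping (dict, cachetools.LRUCache, ...) with expiry."""

    def __init__(self, cache=None):
        # Test against None: an empty mapping is falsy, so `cache or {}`
        # would silently discard an empty LRUCache passed in by the caller.
        self._cache = cache if cache is not None else {}

    def get(self, key, dt):
        """Return the value for key, raising KeyError if missing or expired."""
        try:
            return self._cache[key].unwrap(dt)
        except Expired:
            # Evict the stale entry, then report it as missing.
            del self._cache[key]
            raise KeyError(key)

    def set(self, key, value, expiration_dt):
        self._cache[key] = CachedObject(value, expiration_dt)
```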
Jean Bredeche 63bd7589b7 BUG: support passing an empty list to data methods.
Our type checking code was a bit too aggressive.
2016-04-14 11:11:08 -04:00
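A hypothetical sketch of a looser check that accepts an empty list. The function name and the choice to reject strings are assumptions made for this example:

```python
def check_assets_arg(assets):
    """Normalize a single asset or an iterable of assets to a list.

    Any non-string iterable is accepted, including an empty list. In this
    sketch strings are rejected, since they are iterable but are not a
    collection of assets.
    """
    if isinstance(assets, str):
        raise TypeError("expected an asset or an iterable of assets, got str")
    try:
        iter(assets)
    except TypeError:
        return [assets]  # a single, non-iterable asset
    return list(assets)
```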
Andrew Liang 8dc3ed73ab FIX: Check types of args passed to api methods on data 2016-04-13 09:47:07 -04:00
Jean Bredeche fac5905c10 Merge pull request #1114 from quantopian/handle-data-optional
ENH: make handle_data optional
2016-04-13 09:31:41 -04:00
Richard Frank 70befd490b MAINT: Don't store data portal everywhere
Removed lots of data portal references that participated in ref cycles
and prevented deterministic cleanup of dbs.
2016-04-12 19:33:22 -04:00
Richard Frank 8b610a2ab7 TST: Cleaned up test references to adjustments db
If we don't clean them up, then Windows can't delete
the temp dir containing the db.
2016-04-12 19:33:22 -04:00
Richard Frank 5254b273b2 PERF: Reimplemented remember_last with a weak_lru_cache
which won't leak instances whose methods have been decorated

(specifically DataPortal instances)

MAINT: Not using functools32 anymore
2016-04-12 19:33:21 -04:00
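The commit uses a `weak_lru_cache`; as a simpler stand-in that shows the same idea, this hypothetical `remember_last` keeps each instance's last call in a `weakref.WeakKeyDictionary`, so the cache never keeps an instance alive:

```python
import functools
import weakref


def remember_last(method):
    """Cache the most recent (positional args -> result) per instance.

    Per-instance entries live in a WeakKeyDictionary, so an entry vanishes
    once nothing else references the instance.
    """
    last = weakref.WeakKeyDictionary()

    @functools.wraps(method)
    def wrapper(self, *args):
        entry = last.get(self)
        if entry is not None and entry[0] == args:
            return entry[1]
        result = method(self, *args)
        last[self] = (args, result)
        return result

    wrapper._last = last  # exposed only so the behaviour can be inspected
    return wrapper
```

One caveat of this sketch: if a cached result holds a reference back to its instance, the entry keeps that instance alive.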
Richard Frank 32a400a9fb BUG: Fixing bitness issues on 32-bit systems
by being explicit with sizes
2016-04-12 17:07:50 -04:00
Eddie Hebert 8313c8c36c Merge pull request #1125 from quantopian/enforce-sorted-on-minute-bars
BUG: Enforce sorted order on minutes to delete.
2016-04-12 15:18:30 -04:00
Eddie Hebert d27f85e16b BUG: Enforce sorted order on minutes to delete.
The intervals are returned as a set, so order is not guaranteed,
which becomes exposed when reading windows which span multiple years.

The deletion of values from the regular sized minute array assumes that
the intervals are sorted, so that they can be reversed and deleted from
the back of the array.
2016-04-12 14:16:10 -04:00
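A list-based sketch of why the order matters. `delete_intervals` is a hypothetical helper and assumes non-overlapping, half-open intervals:

```python
def delete_intervals(values, intervals):
    """Delete each [start, stop) interval from ``values`` in place.

    Deleting from the back keeps earlier indices valid, but only if the
    intervals are processed in descending order, so sort them first: a set
    gives no ordering guarantee.
    """
    for start, stop in sorted(intervals, reverse=True):
        del values[start:stop]
    return values
```

Processing the same intervals in ascending order would shift the later interval's indices after the first deletion and remove the wrong values.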
Jean Bredeche bd5e2b183d BUG: Properly log partially filled sell orders. 2016-04-12 13:57:50 -04:00
Jean Bredeche f6902f0368 BUG: bar_data.history too limiting on iterable types
In before_trading_start, history needs to call
DataPortal.get_adjustments, and that method wasn’t correctly checking
for iterables.
2016-04-11 14:02:27 -04:00
Scott Sanderson 4449f289c2 TEST: Test that the mask is what we expect. 2016-04-07 17:29:47 -04:00
dmichalowicz 8db59b387b TST: Overhaul test case 2016-04-07 17:29:47 -04:00
dmichalowicz 5bae74adda ENH: Allow passing a mask when creating a factor 2016-04-07 17:29:47 -04:00
Eddie Hebert 0a3a2f3653 BUG: Ensure matched input length to minute writer.
When the lengths of dts and cols are mismatched, the writer behaves in
unintended ways. For example, if a consumer passed dts with the no-trade
minutes removed, but passed regular-sized column arrays (one value per
market minute of the day) with `0`s on the minutes without trades, the
no-trade values from cols are written to slots in the output where a
trade is intended.

Protect against this misuse by checking that all lengths are equal when
using the `write_cols` method.

Make a separate `_write_cols` method for use by both `write_cols` and
`write`, since the `write` method which takes a DataFrame has the
matched input length enforced by the DataFrame.
2016-04-07 13:53:59 -04:00
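A sketch of the split between the validating public method and the shared private one. The signatures and the placeholder body of `_write_cols` are assumptions, not the writer's real API:

```python
def _write_cols(sid, dts, cols):
    """Shared writer body; callers must already guarantee matched lengths."""
    # Placeholder: pair each dt with its value instead of writing to disk.
    return {name: list(zip(dts, values)) for name, values in cols.items()}


def write_cols(sid, dts, cols):
    """Public entry point: validate that every column lines up with dts."""
    mismatched = {
        name: len(values) for name, values in cols.items()
        if len(values) != len(dts)
    }
    if mismatched:
        raise ValueError(
            "Length of dts (%d) does not match cols: %s"
            % (len(dts), mismatched)
        )
    return _write_cols(sid, dts, cols)
```

A DataFrame-based `write` would call `_write_cols` directly, since a DataFrame already guarantees that its index and columns have equal lengths.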
Jean Bredeche 4203c54417 ENH: make handle_data optional 2016-04-07 09:50:09 -04:00
Andrew Liang a8491879ce FIX: time_rules should trigger only at dt specified
Previously, time_rules triggered on every minute once the specified dt had passed.
2016-04-05 17:51:10 -04:00
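The difference between the old and new checks, sketched with hypothetical helper names and minute-resolution datetimes:

```python
from datetime import datetime


def old_should_trigger(dt, target):
    # Old behaviour: every minute at or after the target fires the rule.
    return dt >= target


def should_trigger(dt, target):
    # Fixed behaviour: only the specified minute fires the rule.
    return dt == target


target = datetime(2016, 4, 5, 10, 0)
later = datetime(2016, 4, 5, 10, 1)
```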
Jean Bredeche dc01c45dc4 DEV: Apply adjustments for portfolio and account in BTS
completely copied from https://github.com/quantopian/zipline/pull/1104/

All credit goes to Andrew Liang (@lianga888)
2016-04-05 11:37:34 -04:00
Eddie Hebert 16fd6681a6 ENH: Rewrite of Zipline to use lazy access pattern
More documentation to follow in release notes.

Based on the lazy-mainline branch; see that branch for more details.

Also-By: Jean Bredeche <jean@quantopian.com>
Also-By: Andrew Liang <aliang@quantopian.com>
Also-By: Abhijeet Kalyan <akalyan@quantopian.com>
2016-04-04 16:12:58 -04:00
Eddie Hebert be08a77d76 BUG: Prevent writing int max instead of nan.
np.ndarray.astype cannot be relied upon to convert NaNs to 0.

Fix by calling nan_to_num on the float arrays before converting to
uint32.
2016-03-30 14:35:06 -04:00
Maya Tydykov e8185a1512 MAINT: reorganize - move testing mixin to fixtures
BUG: correctly create asset finder

MAINT: rename fixture

STY: fixes for flake8

STY: add space around assignment

MAINT: add var back to constructor

MAINT: remove unused import

MAINT: compare var with None directly

MAINT: fix merge errors
2016-03-29 13:15:16 -04:00
Maya Tydykov 06dd6e958d TST: refactor tests to use fixtures
MAINT: use np.array

MAINT: return cols rather than modifying attribute
2016-03-29 13:12:50 -04:00
Maya Tydykov 8a28e82d32 ENH: add dividends to pipeline
MAINT: remove record date - not needed.

MAINT: restructure dividends dataset.

MAINT: restructure dividends factors.

WIP: update dividends tests.

MAINT: correct the way to get the 'next' event frame.
2016-03-29 13:12:50 -04:00
Scott Sanderson 9a04621781 ENH: Add __eq__ and __ne__ to Classifier. 2016-03-28 15:46:28 -04:00
Scott Sanderson 0ebb72fe0d TEST: Explicitly use int64 everywhere.
Otherwise these tests will fail on 32-bit systems.
2016-03-28 12:21:58 -04:00