966 Commits

Author SHA1 Message Date
Joe Jevnik d888c4faaa DOC: update docs for api functions 2016-05-06 15:25:30 -04:00
Joe Jevnik 0562179060 Merge pull request #1178 from quantopian/quantopian-quandl
ENH: Adds quantopian-quandl bundle as new default.
2016-05-06 12:53:07 -04:00
Scott Sanderson f618cc94a2 Merge pull request #1187 from quantopian/test-daily-bar-reader-a-bit
BUG: Fix multiple bugs in PanelDailyBarReader.
2016-05-06 11:42:59 -04:00
Scott Sanderson 3395b33f1e BUG: Fix multiple bugs in PanelDailyBarReader.
- Return a value from `verify_all_indices_unique` so that `panel` isn't
  unconditionally `None` in `PanelDailyBarReader`.

- Fix a bug where we always set the volume of every asset to `1e9`.

- Add minimal suite of tests for get_spot_value, which catch both of the
  above.

NOTE: There are still several issues with `PanelDailyBarReader`.  The
docstring for `get_spot_value` claims that it will return -1 on days
where an asset didn't trade, which isn't the case.  It also claims that
it will raise `NoDataOnDate` when a request is made outside the panel
range, but it just raises a KeyError.  We also still have no coverage
for `load_raw_arrays`, so it's likely that there are more bugs lurking.
2016-05-06 10:59:14 -04:00
Andrew Liang 7641247b41 BUG: DAY_END action not emitted during minute emission
Refactor AlgorithmSimulator so that DAY_END is emitted for both
minute and daily emission, and that handling of end-of-minute
and end-of-day are separated
2016-05-06 10:25:44 -04:00
Jean Bredeche a068eb374a Merge pull request #1182 from quantopian/no-more-dups
DEV: Ensure there are no duplicates in the data passed into TradingAlgorithm.run
2016-05-06 09:55:23 -04:00
Joe Jevnik d819721d96 ENH: use more human readable format for bundle ingest directories
We are now using isoformats with ':' replaced with ';'. We cannot use a
normal isoformat because Windows does not allow files or directories
with ':' in the name.
2016-05-05 18:22:13 -04:00
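
Note on d819721d96: the naming scheme described above can be sketched as follows. This is a minimal illustration, not the code from the commit; the helper names `to_ingest_dirname` and `from_ingest_dirname` are hypothetical.

```python
from datetime import datetime, timezone


def to_ingest_dirname(ts):
    """Format a timestamp as an ISO 8601 string that is a valid Windows name."""
    # Windows forbids ':' in file and directory names, so swap it for ';'.
    return ts.isoformat().replace(':', ';')


def from_ingest_dirname(name):
    """Invert to_ingest_dirname (hypothetical helper, not part of zipline)."""
    return datetime.fromisoformat(name.replace(';', ':'))


ts = datetime(2016, 5, 5, 18, 22, 13, tzinfo=timezone.utc)
name = to_ingest_dirname(ts)
```

Because ';' never appears in a normal isoformat string, the replacement is reversible and the directory names still sort chronologically.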
Joe Jevnik 89542e33bd ENH: Adds quantopian-quandl bundle as new default.
This data bundle will use the quantopian mirror of the quandl WIKI data
instead of downloading from quandl directly. This dramatically improves
the speed because we do not pay the rate limiting for quandl and we can
send the data in the format zipline expects.
2016-05-05 18:22:13 -04:00
Scott Sanderson bd0f138081 TEST/MAINT: Refactor unique axis verification.
Break it into a standalone function that handles any pandas type.
2016-05-05 14:20:47 -04:00
Jean Bredeche 3f1b0f79f2 DEV: Ensure there are no duplicates in the data passed into TradingAlgorithm.run 2016-05-05 11:54:39 -04:00
Scott Sanderson e0aeda4c3e BUG: Fix bytes/unicode issues in py3. 2016-05-05 01:46:35 -04:00
Scott Sanderson a29da32252 TEST: Don't assert particular numpy error.
They change from version to version.
2016-05-04 19:40:50 -04:00
Scott Sanderson b78501e54a BUG: Fix broken isnull() on string classifiers.
Adds a special case in NullFilter to handle LabelArrays correctly.
2016-05-04 17:26:27 -04:00
Scott Sanderson 5a1ed7b1d3 ENH: Make element_of work for ints too. 2016-05-04 16:31:58 -04:00
Scott Sanderson 0922714bac DOC: Clarify test docstrings. 2016-05-04 15:54:51 -04:00
Scott Sanderson 4d42cddae4 ENH: Fail fast on outputs in CustomClassifier.
We don't support multiple outputs for CustomClassifier because we use
LabelArrays for string classifiers.
2016-05-04 15:54:50 -04:00
Scott Sanderson 620d7648b0 BUG: Tests/bugfixes for LabelArray slicing.
- Fixes a bug where __setitem__ was not called when setting with a slice
  on Python 2 (__setslice__ was called instead), which caused strange
  behavior when setting an empty string.  This is fixed by overriding
  __setslice__ and forwarding to __setitem__.

- Fixes a bug where __getitem__ returned an instance of np.void when
  returning a scalar.  We now correctly return an entry from our
  categoricals.
2016-05-04 15:54:50 -04:00
Scott Sanderson 8de45540f2 ENH: NaN semantics for LabelArray missing values. 2016-05-04 15:54:50 -04:00
Scott Sanderson 2395cbb671 ENH: Use np.void for labelarray storage.
This disables most broken ufuncs
2016-05-04 15:54:50 -04:00
Scott Sanderson 7a65121e6e BUG: contains was renamed to has_substring 2016-05-04 15:54:50 -04:00
Scott Sanderson c40bbfae03 TEST: More tests for string predicates. 2016-05-04 15:54:50 -04:00
Scott Sanderson bb6f908036 TEST: Add test for categorical postprocessing. 2016-05-04 15:54:50 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
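
Note on 5f190395ad: the storage idea behind ``LabelArray`` (duplicate-heavy string data kept as integer indices into an array of unique values) can be sketched in plain Python. This is only an illustration of the technique under that assumption; it is not the ndarray subclass from the commit, and `encode_labels` and `equals_label` are hypothetical names.

```python
def encode_labels(values):
    """Return (categories, codes) such that values[i] == categories[codes[i]]."""
    categories = sorted(set(values))
    index_of = {label: i for i, label in enumerate(categories)}
    codes = [index_of[v] for v in values]
    return categories, codes


def equals_label(categories, codes, label):
    """Elementwise `== label`: one string lookup, then integer comparisons."""
    try:
        target = categories.index(label)
    except ValueError:
        # The label never occurs, so no element can match.
        return [False] * len(codes)
    return [code == target for code in codes]


tickers = ['AAPL', 'MSFT', 'AAPL', 'IBM', 'MSFT']
categories, codes = encode_labels(tickers)
mask = equals_label(categories, codes, 'MSFT')
```

The string comparison happens once against the unique values rather than once per element, which is where the speedup for matching operations comes from.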
Joe Jevnik f3e436a1bf Merge pull request #1173 from quantopian/quandl-wiki-loader
Quandl wiki loader
2016-05-03 19:11:18 -04:00
Joe Jevnik 59c8e371a2 ENH: Updates the cli, data bundles and extensions.
Adds the data bundle concept which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This also provides a
`yahoo_equities` function for creating an ingestion function that will
load a static set of assets from yahoo.

The cli is now structured as a couple of subcommands and has been
changed to `python -m zipline`. The old behavior of `run_algo.py` has
been moved to the `run` subcommand. This is almost entirely the same
except that it now takes the name of the data bundle to use, defaulting
to `quandl`.

The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.

There is also a `clean` subcommand which deletes the data that was
written with `ingest`.

Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of python files to run at
the start of the process. These can be used to configure aspects of
zipline. Right now the only thing that is supported in an extension file
is the registration of a new data bundle.
2016-05-03 18:38:24 -04:00
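
Note on 59c8e371a2: registering an ingest function by decorating it with a name is a standard registry pattern. The sketch below shows that pattern in isolation. It is not zipline's implementation: the `bundles` dict, the `ingest` helper and the single `output_dir` argument are assumptions made for illustration, and the real ingest function signature is not given in the commit message.

```python
bundles = {}  # hypothetical registry: bundle name -> ingest function


def register(name):
    """Return a decorator that records an ingest function under `name`."""
    def decorator(ingest_func):
        if name in bundles:
            raise ValueError('bundle %r is already registered' % name)
        bundles[name] = ingest_func
        return ingest_func
    return decorator


@register('my-bundle')
def my_ingest(output_dir):
    # A real ingest function would write bar, asset and adjustment data here.
    return 'ingested into %s' % output_dir


def ingest(name, output_dir):
    """Look up a registered bundle by name and run its ingest function."""
    return bundles[name](output_dir)


result = ingest('my-bundle', '/tmp/my-bundle')
```

With this shape, an extension file only needs to be imported for its bundles to become available by name.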
Andrew Liang fb6bda5840 FIX: Error message for BenchmarkAssetNotAvailableTooLate is wrong
Should be '...does not exist on self.trading_days[-1]...' not
self.trading_days[0]
2016-05-02 12:00:35 -04:00
Joe Jevnik efac476976 ENH: make BcolzMinuteBarWriter.write take iterable
Updates the BcolzMinuteBarWriter.write api to allow users to pass their
data as a stream instead of requiring that they loop over their data
externally. This matches the API presented by BcolzDailyBarWriter.
2016-04-29 16:14:48 -04:00
Andrew Liang e73ce0bf2b Merge pull request #1168 from quantopian/fix_crashing_benchmark
FIX: Crashing on calculating benchmarking when no trading days
2016-04-29 14:59:49 -04:00
Andrew Liang 7332586abe FIX: Crashing on calculating benchmarking when no trading days
When we run a simulation that starts and ends on the same weekend,
return an empty series for the benchmark so as to not crash
2016-04-29 14:30:46 -04:00
Maya Tydykov 11d666daaa TST: add test for 13d filings dataset
MAINT: add 13d filings to factors init

MAINT: rename constant

MAINT: add event_date_col field
2016-04-28 11:59:49 -04:00
Maya Tydykov e726cc94c9 ENH: add 13d filings dataset to pipeline 2016-04-28 11:53:45 -04:00
Andrew Liang d69b960c49 BUG: Don't save empty positions when user access non-existent position
Previously, whenever we tried to access a missing value on the Positions
dict, we returned a default Position and saved it to the dict. Instead,
just return the Position without saving it.
2016-04-26 13:28:35 -04:00
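
Note on d69b960c49: the behaviour described above can be sketched with `dict.__missing__`. The `Position` and `Positions` classes here are simplified stand-ins, not the zipline classes.

```python
class Position(object):
    """Simplified stand-in for a position: an asset and an amount held."""

    def __init__(self, asset):
        self.asset = asset
        self.amount = 0


class Positions(dict):
    def __missing__(self, key):
        # Return a default Position WITHOUT storing it, so that merely
        # looking up an unknown asset does not create an empty entry.
        return Position(key)


positions = Positions()
default = positions['AAPL']
```

After the lookup, `positions` is still empty; only an explicit assignment adds an entry.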
Andrew Liang 5809ae17f1 DEV: Better error message for sid= in get_open_orders
Let the user know to use asset= instead
2016-04-26 12:23:57 -04:00
Jean Bredeche c404c60d68 BUG: don't allow ordering in before_trading_start 2016-04-26 10:56:36 -04:00
Maya Tydykov b7765fe0d3 Merge pull request #1153 from quantopian/filter-nulls-in-expected-cols
Filter nulls in expected cols
2016-04-25 16:32:45 -04:00
Maya Tydykov 0191d9d903 MAINT: move filtering for null date rows back to dataframe
TST: test both next and prev event frame loading and use EventsLoader.

BUG: remove extra arg

MAINT: call list on zip for compatibility with python 3
2016-04-25 16:11:12 -04:00
Maya Tydykov 390295481c TST: add test for blaze loader with null data in date col
MAINT: fix blaze query
2016-04-25 11:42:10 -04:00
Maya Tydykov e41c99d077 MAINT: add an event date col field to each loader
MAINT: add event date col field and filter rows where this field is null

TST: modify tests to filter nulls in event date col

MAINT: calculate value repeats by vectorized computation on separate start and end dates.

MAINT: pass DatetimeIndex instead of list of strings
2016-04-25 11:42:08 -04:00
Maya Tydykov f8aa7c2ef4 TST: add test for case when null in expected column 2016-04-25 11:42:06 -04:00
Jean Bredeche 02ded435f6 DEV: Don't log an error if we can't find a matching asset/field/day triple in fetcher data 2016-04-25 09:47:18 -04:00
Eddie Hebert 66d05aa563 PERF: Improve read time for smaller num of assets.
The BcolzDailyBarReader was optimized for the pipeline case of reading
all assets at once.

Now that the reader is also used to support daily history, the case of
reading data for a small number of assets is more common, particularly
in algorithms that use the history API and have a high rotation of
assets (e.g. an algorithm which uses pipeline to set the active
universe).

Remove the bottleneck in reading a small number of assets by
conditionally reading the slice for each asset from the carray, instead
of reading the data for all equities and then indexing into that full
array. Above a certain number of assets, it is still better to read all the
data at once. On the Quantopian dataset, which holds data for about 20000
equities over the last 10 years (where not all equities trade
over the full range), stored in 118 blosc blp files per column, the
tipping point where the 'read all' mode wins out is between 3000 and 4000
assets.

That number was tested by trying to exercise a worst case scenario where
the equities were spread out evenly across the blp files, by stepping
along a sorted list of assets that were alive over a query range which
spanned 70 trading days.
```
size = 3000
# Step evenly through the sorted assets; `//` keeps the step an integer.
sids = [assets[i] for i in range(0, len(assets), len(assets) // size)][:size]
```

Also, add parameter to WithBcolzDailyBarReader fixture which allows the
test to specify what the threshold count for reading all data should be,
so that the test_us_equity_pricing can be forced into either mode to
make sure that both branches in logic are covered by all test cases.

On a local dev machine this patch improves the read time of `load_raw_arrays`
for one asset from 100 ms to 96.5 µs (roughly a 10^3 improvement). Reading
only one asset per call is an observed common case when populating the
non-cached values in USEquityHistoryLoader.
2016-04-21 20:43:52 -04:00
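
Note on 66d05aa563: the two read modes described above can be sketched as follows. This is an illustration of the branching only, not the reader's code. A plain list stands in for the on-disk carray, and `read_column`, `first_rows`, `last_rows` and `read_all_threshold` are hypothetical names.

```python
def read_column(column, first_rows, last_rows, sids, read_all_threshold):
    """Return {sid: values} for the requested sids.

    `column` stands in for an on-disk array holding every asset's rows back
    to back; `first_rows` and `last_rows` map sid -> inclusive row offsets.
    """
    if len(sids) > read_all_threshold:
        # 'Read all' mode: one bulk read, then index into the in-memory copy.
        data = column[:]
        return {sid: data[first_rows[sid]:last_rows[sid] + 1] for sid in sids}
    # Small request: read only the slice belonging to each asset.
    return {sid: column[first_rows[sid]:last_rows[sid] + 1] for sid in sids}


column = list(range(10))
first_rows = {1: 0, 2: 4}
last_rows = {1: 3, 2: 9}

per_asset = read_column(column, first_rows, last_rows, [1], read_all_threshold=3000)
read_all = read_column(column, first_rows, last_rows, [1, 2], read_all_threshold=0)
```

Both branches return the same values; they differ only in how much is read from storage, which is why a fixture parameter that forces either mode lets the same tests cover both.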
Maya Tydykov e5ccd814e8 Merge pull request #1143 from quantopian/add-final-val-col-to-estimates
ENH: add actual value column to estimates dataset.
2016-04-21 16:23:55 -04:00
Jean Bredeche 9d1e15ddde BUG: Fetcher wasn't working properly in before_trading_start.
We were trying to use the previous day in before_trading_start because
we were looking for the previous market minute, then normalizing it.  That's
no longer the case, as we want to use today's date for fetcher lookups
in before_trading_start.

Also refactored a bit how dataportal determines if a query should be
routed to the fetcher data structures.
2016-04-21 15:09:14 -04:00
Jean Bredeche 6423a2cfbd Merge branch 'master' into check-keyword-args 2016-04-21 12:31:45 -04:00
Maya Tydykov bd58140b97 ENH: add actual value column to estimates dataset. 2016-04-21 11:45:00 -04:00
Jean Bredeche c323506f40 BUG: we were improperly checking iterable kwargs in BarData 2016-04-21 11:06:46 -04:00
dmichalowicz d9bfcaabde ENH: Support multiple outputs for custom factors 2016-04-21 10:57:29 -04:00
Maya Tydykov 1531568899 ENH: add custom dataset for estimize
MAINT: alphabetize constants

MAINT: remove obsolete column

TST: refactor tests to use common code

MAINT: remove unneeded fields from dataset

MAINT: remove obsolete earnings estimates columns and refactor
2016-04-19 11:29:03 -04:00
Andrew Liang 8aac0ab19f BUG: Week rule plus time rule doesn't work
The next trigger for the week rule gets recalculated every time
the rule is triggered
2016-04-18 17:05:43 -04:00
Joe Jevnik bc0b117dc9 MAINT: make the data loading apis more consistent.
Changes BcolzDailyBarWriter to not be an abc. Data is passed as an
iterator of (sid, dataframe) pairs to the write method.

Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types; callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.

Adds many new fixtures and updates some existing fixtures to use the new
ones:

WithDefaultDateBounds
  A fixture that provides the suite a START_DATE and END_DATE. This is
  meant to make it easy for other fixtures to synchronize their date
  ranges without depending on each other in strange ways. For example,
  WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
  both have data for the same dates, so they may both depend on
  WithDefaultDateBounds without forcing a dependency between them.

WithTmpDir, WithInstanceTmpDir
  Provides the suite or individual test case a temporary directory.

WithBcolzDailyBarReader
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzDailyBarWriter.write

WithBcolzDailyBarReaderFromCSVs
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from a
  collection of CSV files and then converted into the bcolz data through
  BcolzDailyBarWriter.write_csvs

WithBcolzMinuteBarReader
  Provides the suite a BcolzMinuteBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzMinuteBarWriter.write

WithAdjustmentReader
  Provides the suite a SQLiteAdjustmentReader which reads from an in
  memory sqlite database. The data will be read from dataframes and then
  converted into sqlite with SQLiteAdjustmentWriter.write

WithDataPortal
  Provides each test case a DataPortal object with data from temporary
  resources.
2016-04-15 23:46:10 -04:00