Commit Graph

30 Commits

Author SHA1 Message Date
Scott Sanderson b0a93d57a0 DOC: Clarify expect_bounded docstring. 2016-09-02 13:33:55 -04:00
Scott Sanderson 8b2446aec6 ENH: Dont allow length=1 regressions/correlations.
They're not meaningful, and they cause warnings from numpy.

Implemented in terms of a new preprocessor, `expect_bounded`, which
takes a tuple of `upper_bound` and `lower_bound`.
2016-09-02 12:49:09 -04:00
Scott Sanderson 115f055c83 MAINT: Clean up downsampling boilerplate.
Consolidate docs and mixin applications into one place.
2016-08-17 16:52:09 -04:00
Joe Jevnik bfa3e6f153 TST: 32b compat doctests 2016-06-21 15:07:03 -04:00
Joe Jevnik efd7bf72c3 TST: py3 compat doctests 2016-06-21 15:07:03 -04:00
Joe Jevnik 5925107052 TST: fix doctests to actually run 2016-06-21 15:07:03 -04:00
Scott Sanderson 3395b33f1e BUG: Fix multiple bugs in PanelDailyBarReader.
- Return a value from `verify_all_indices_unique` so that `panel` isn't
  unconditionally `None` in `PanelDailyBarReader`.

- Fix a bug where we always set the volume of every asset to `1e9`.

- Add minimal suite of tests for get_spot_value, which catch both of the
  above.

NOTE: There are still several issues with `PanelDailyBarReader`.  The
docstring for `get_spot_value` claims that it will return -1 on days
where an asset didn't trade, which isn't the case.  It also claims that
it will raise `NoDataOnDate` when a request is made outside the panel
range, but it just raises a KeyError.  We also still have no coverage
for `load_raw_arrays`, so it's likely that there are more bugs lurking.
2016-05-06 10:59:14 -04:00
Jean Bredeche a068eb374a Merge pull request #1182 from quantopian/no-more-dups
DEV: Ensure there are no duplicates in the data passed into TradingAlgorithm.run
2016-05-06 09:55:23 -04:00
Scott Sanderson bd0f138081 TEST/MAINT: Refactor unique axis verification.
Break it into a standalone function that handles any pandas type.
2016-05-05 14:20:47 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
Joe Jevnik 59c8e371a2 ENH: Updates the cli, data bundles and extensions.
Adds the data bundle concept which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This also provides a
`yahoo_equities` function for creating an ingestion function that will
load a static set of assets from yahoo.

The cli is now structured as a couple of subcommands and has been
changed to `python -m zipline`. The old behavior of `run_algo.py` has
been moved to the `run` subcommand. This is almost entirely the same
except that it now takes the name of the data bundle to use, defaulting
to `quandl`.

The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.

There is also a `clean` subcommand which deletes the data that was
written with `ingest`.

Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of python files to run at
the start of the process. These can be used to configure aspects of
zipline. Right now the only thing that is supported in an extension file
is the registration of a new data bundle.
2016-05-03 18:38:24 -04:00
Andrew Liang 5809ae17f1 DEV: Better error message for sid= in get_open_orders
Let the user to know to use asset= instead
2016-04-26 12:23:57 -04:00
Joe Jevnik bc0b117dc9 MAINT: make the data loading apis more consistent.
Changes BcolzDailyBarWriter to not be an abc, data is passed as an
iterator of (sid, dataframe) pairs to the write method.

Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types, callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.

Adds many new fixtures and updates some existing fixtures to use the new
ones:

WithDefaultDateBounds
  A fixture that provides the suite a START_DATE and END_DATE. This is
  meant to make it easy for other fixtures to synchronize their date
  ranges without depending on eachother in strange ways. For example,
  WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
  both have data for the same dates, so they may use depend on
  WithDefaultDates without forcing a dependency between them.

WithTmpDir, WithInstanceTmpDir
  Provides the suite or individual test case a temporary directory.

WithBcolzDailyBarReader
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzDailyBarWriter.write

WithBcolzDailyBarReaderFromCSVs
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from a
  collection of CSV files and then converted into the bcolz data through
  BcolzDailyBarWriter.write_csvs

WithBcolzMinuteBarReader
  Provides the suite a BcolzMinuteBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzMinuteBarWriter.write

WithAdjustmentReader
  Provides the suite a SQLiteAdjustmentReader which reads from an in
  memory sqlite database. The data will be read from dataframes and then
  converted into sqlite with SQLiteAdjustmentWriter.write

WithDataPortal
  Provides each test case a DataPortal object with data from temporary
  resources.
2016-04-15 23:46:10 -04:00
Scott Sanderson 1f237d43a3 MAINT: Make preprocessor factories closures. 2016-03-25 15:11:18 -04:00
Scott Sanderson 1245552340 DEV: Add expect_dimensions preprocessor. 2016-03-25 15:11:18 -04:00
Scott Sanderson b85eb36da8 TEST: Add test for demean example. 2016-03-25 15:11:18 -04:00
Joe Jevnik a3dbf7590e TST: doctest failure 2016-01-13 16:36:20 -05:00
Joe Jevnik 6280614a69 DOC: whatsnew 2016-01-13 16:36:20 -05:00
Joe Jevnik 5351b60a4c ENH: adds optionally for preprocessors 2016-01-13 15:26:37 -05:00
Joe Jevnik 5a235bdaef ENH: allows users to specify the cutoff time for data query in blaze
loaders

This allows people to set their cutoff time to the time they will
actually execute 'before_trading_start'. Currently this is just passed
to the constructor of the loader; however, I would like to make this
managed by the algorithm simulation runner. This would help keep all of
the loaders in sync and lock 'before_trading_start's execution to the
time the data is queried for.
2016-01-13 15:26:13 -05:00
Scott Sanderson b6175de5f1 DOC/TEST: Add doctest and docs for coerce kwargs. 2016-01-12 17:51:13 -05:00
Scott Sanderson 43b6344d5f ENH: Add `coerce` preprocessor. 2016-01-12 17:36:36 -05:00
Scott Sanderson 4469fbef76 MAINT: Just the value if dtype doesn't have a name. 2015-12-10 17:34:38 -05:00
llllllllll 48536add73 TST: fix doctests 2015-12-09 11:22:13 -05:00
Scott Sanderson 8220d1ee86 ENH: Adds support for different typed adjusted arrays and adds an
EarningsCalendar loader.

- Moves most of AdjustedArray back into Python. The window iterator is
  the only part that's performance-intensive.

- Adds a bootleg templating system for creating specialized versions of
  AdjustedArrayWindow for each concrete type we care about.

- Adds support for differently dtyped terms in pipeline. This allows us
  to use datetime64s which are needed in the EarningsCalendar.

- Adds EarningsCalendar dataset for the next and previous earnings
  announcements in pipeline.

- Adds in memory loader for EarningsCalendar.

- Adds blaze loader for EarningsCalendar.
2015-12-08 20:24:06 -05:00
llllllllll 4238391f6f DOC: docstring cleanup 2015-10-19 16:35:03 -04:00
llllllllll 3fb91e4d39 MAINT: cleanup doctests 2015-10-19 16:35:03 -04:00
llllllllll e9ec709453 MAINT: expect_value doctest and lambda over toolz 2015-10-19 16:35:03 -04:00
llllllllll 22ffe3fe49 ENH: adds expect_element preprocessor 2015-10-19 16:35:03 -04:00
Stewart Douglas 4e2039c9b0 ENH: Coerce user input with API method decorator
Previously we have capitalized input strings at different levels in
our code: in the user-facing API methods and in the asset finder.
This commit moves input string capitalization exclusively to the API
method to which the string was supplied. Specifically, the string is
capitalized by a preprocess API method decorator. The preprocess
decorator passes the input string to the newly defined
ensure_upper_case() method, which returns a TypeError if the argument
supplied is not a string.

ensure_upper_case() is defined in a new file, zipline/utils/input_validation.py.
The existing expect_types() method is also moved there.

Various tests in tests/test_assets.py are modified to account for the
fact that the asset finder method lookup_symol() no longer capitalizes
its supplied argument.
2015-10-08 15:41:33 -04:00