39 Commits

Author SHA1 Message Date
Conner Fromknecht 99efa7a9f3 Fixed catalyst tests except example tests 2017-06-19 14:43:10 -07:00
Ana Ruelas 092951470a DOC: Fix invalid sphinx sections 2017-06-05 15:52:57 -04:00
Scott Sanderson 22df0a9cb9 MAINT/STY: Upgrade flake8 and fix new failures. 2017-05-15 11:45:04 -04:00
Joe Jevnik 0123bb8a97 ENH: prune the graph based on the initial workspace 2016-10-28 15:04:18 -04:00
Scott Sanderson a8b67d352e MAINT: Refactor in prep for downsampled terms.
- Split out extra_rows handling into an `ExecutionPlan` subclass.
  `ExecutionPlan` now requires the dates and calendar against which a
  set of terms will be computed, and now defers to a term's
  `compute_extra_rows` method when deciding how many extra rows are
  required to compute for that term. This will allow downsampled terms
  to request enough extra rows to guarantee that we can maintain consistent
  calculation dates.

  As a consequence of the above, `TermGraph` now only deals with logical
  dependencies, not with metadata surrounding extra row calculations.
  This means that TermGraph can be used to generate dependency
  visualizations in interactive contexts where we don't yet have a
  calendar or start/end dates.

- Refactored test_{filter,factor,classifier} to use check_terms instead
  of run_graph.  This makes it easier to make changes to TermGraph,
  since the testing interface is now to simply provide a dict of terms.

- Refactored BasePipelineTestCase to use fixtures to create an asset
  finder.  This fixes a potential leak of the test's asset db, which was
  not being explicitly cleaned up.

- Refactored test_technical to use BasePipelineTestCase.

- Added a new special term, `InputDates()`, which can be used to request
  date labels for inputs.  Like `AssetExists`, `InputDates` is provided
  in the initial workspace by default.

- Added a default (failing) `_compute` method to `AssetExists` which
  provides a more useful error than AttributeError.
2016-08-17 16:52:09 -04:00
Jean Bredeche e6af4e4f1b ENH: made exchange a required parameter to Asset and its subclasses
This required updating a lot of tests.
2016-08-02 23:21:39 -04:00
Scott Sanderson 49bb8264dc ENH: Finish adding groupby to rank/top/bottom.
- Added test coverage for grouped and masked top/bottom.

- Added test coverage for grouped rank on datetime factors.

- Fixed an issue where grouped rank would fail on datetime inputs
  because unary-negative isn't defined for datetimes.  We now instead
  directly invoke a function from rank.pyx that does the normalizations
  as needed.

- Fixed an issue where GroupedRowTransform assumed that it produced the
  same dtype as its input.  This isn't true for rank() of a
  datetime-dtype factor.  GroupedRowTransform now takes a required dtype
  parameter.

- Similarly, fixed an issue where GroupedRowTransform assumed that its
  missing_value was the same as its parent's, which isn't true for
  rank() of a datetime-dtype factor.  GroupedRowTransform now takes a
  required missing_value parameter.

- Fixed an issue where Factor.demean() and Factor.zscore() weren't
  properly cached because their static_identity included a closure that
  was dynamically generated on each invocation.  They both now always
  use a function defined at module scope.
2016-07-26 02:57:35 -04:00
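The caching fix in 49bb8264dc can be illustrated with a small sketch. It does not use zipline's actual `static_identity` machinery. The names `make_demean`, `module_level_demean`, `cache`, and `get_or_compute` are hypothetical, and the dict stands in for any identity-keyed term cache. The point is that a closure created on each call is a new object, so it never matches an earlier cache key, while a module-scope function always does.

```python
def make_demean():
    # A new function object is created on every call, so two results of
    # make_demean() never compare equal, even though they behave the same.
    def demean(row):
        mean = sum(row) / len(row)
        return [x - mean for x in row]
    return demean


def module_level_demean(row):
    # Defined once at module scope: every reference is the same object.
    mean = sum(row) / len(row)
    return [x - mean for x in row]


# Hypothetical stand-in for a cache keyed on a term's identity.
cache = {}


def get_or_compute(transform, label):
    key = (transform, label)
    if key not in cache:
        cache[key] = object()  # placeholder for an expensive result
    return cache[key]


# Fresh closures produce distinct keys, so the second lookup misses.
closure_hit = (
    get_or_compute(make_demean(), "demean")
    is get_or_compute(make_demean(), "demean")
)

# The module-scope function produces the same key both times.
module_hit = (
    get_or_compute(module_level_demean, "demean")
    is get_or_compute(module_level_demean, "demean")
)
```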
Joe Jevnik 958d455a7a ENH: Support default params for terms 2016-07-12 18:49:24 -04:00
dmichalowicz 393f82e81e ENH: Add single-column input/output capabilities to pipeline terms 2016-06-23 10:24:09 -04:00
Joe Jevnik 5925107052 TST: fix doctests to actually run 2016-06-21 15:07:03 -04:00
dmichalowicz 86486803b6 BUG: custom factor outputs naming collisions 2016-05-25 15:41:16 -04:00
Scott Sanderson 65de1215e0 Merge pull request #1204 from quantopian/tell-me-what-my-choices-were
Tell me what my choices were
2016-05-19 18:52:04 -04:00
dmichalowicz 1ec0bced6d ENH: Add builtin factors for correlation and regression 2016-05-18 15:11:12 -04:00
Scott Sanderson 4a513360b6 ENH: Include choices in no-output-found errormsg. 2016-05-17 17:51:24 -04:00
Joe Jevnik 784d5f4a16 Merge pull request #1199 from quantopian/boybands-factor
BollingerBands factor
2016-05-13 15:35:10 -04:00
Joe Jevnik 78db90a858 STY: flake8 2016-05-12 17:01:17 -04:00
Joe Jevnik f494d6f0d1 BUG: Fix check that pipeline argument is hashable.
Adds test coverage for the cases where it is not hashable.
2016-05-11 21:37:12 -04:00
Scott Sanderson 8b1136d9d5 ENH: Validate missing_values at term construction.
Finds bugs in several bad tests that were constructing invalid terms.
2016-05-10 19:43:56 -04:00
Scott Sanderson 4d42cddae4 ENH: Fail fast on outputs in CustomClassifier.
We don't support multiple outputs for CustomClassifier because we use
LabelArrays for string classifiers.
2016-05-04 15:54:50 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
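The storage idea behind ``LabelArray`` in 5f190395ad (values stored as indices into an array of unique values) can be sketched in plain NumPy. This is not the ``LabelArray`` implementation. The sample data and the names `categories`, `codes`, and `eq` are assumptions made for illustration. It shows why string comparison becomes cheap: the target string is looked up once, and the elementwise work is integer comparison.

```python
import numpy as np

# Hypothetical sample data: a column of tickers with many repeats.
tickers = np.array(["AAPL", "MSFT", "AAPL", "IBM", "MSFT", "AAPL"])

# Encode once: `categories` holds the sorted unique strings and `codes`
# holds one integer per element, indexing into `categories`.
categories, codes = np.unique(tickers, return_inverse=True)
codes = codes.astype(np.int32)


def eq(categories, codes, value):
    """Elementwise `== value`, done as integer comparison on the codes."""
    matches = np.flatnonzero(categories == value)
    if len(matches) == 0:
        # The value is not among the categories, so nothing can match.
        return np.zeros(codes.shape, dtype=bool)
    return codes == matches[0]


is_aapl = eq(categories, codes, "AAPL")

# Decoding is a single fancy-indexing operation.
decoded = categories[codes]
```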
dmichalowicz d9bfcaabde ENH: Support multiple outputs for custom factors 2016-04-21 10:57:29 -04:00
Scott Sanderson 3c53b4944b TEST: Test not calling super()._validate. 2016-03-19 19:09:16 -04:00
Scott Sanderson 53d3b0855b ENH: Add support for Classifiers.
Classifiers are computations that represent grouping keys. They can be
used in conjunction with normalization functions like ``zscore`` or
``demean`` to perform normalizations over subsets of a dataset.

Notable changes:

- Added ``demean()`` and ``zscore()`` methods to ``Factor``.

- Added classifier versions of ``Latest`` and ``CustomTermMixin``.
  The .latest attribute of int64 dataset columns now produces a
  classifier by default.

- Added ``Everything``, a classifier that maps all data to the same
  value.

- Added ``zipline.lib.normalize``, which implements a naive, pure-Python
  grouped normalize function.  This will likely be moved to Cython in a
  subsequent PR.
2016-03-19 17:04:28 -04:00
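As a rough illustration of the grouped normalization described in 53d3b0855b, the sketch below demeans and z-scores one row of values within groups given by integer labels. It is not the code in ``zipline.lib.normalize``. The function names and the sample data are assumptions, and NaN handling and zero-variance groups are deliberately ignored.

```python
import numpy as np


def grouped_demean(values, labels):
    """Subtract each group's mean from the members of that group."""
    out = np.empty_like(values, dtype=float)
    for label in np.unique(labels):
        in_group = labels == label
        out[in_group] = values[in_group] - values[in_group].mean()
    return out


def grouped_zscore(values, labels):
    """Demean within each group, then divide by that group's std."""
    out = np.empty_like(values, dtype=float)
    for label in np.unique(labels):
        in_group = labels == label
        group = values[in_group]
        out[in_group] = (group - group.mean()) / group.std()
    return out


# Hypothetical data: five assets on one date, split into two groups.
values = np.array([1.0, 3.0, 10.0, 20.0, 30.0])
labels = np.array([0, 0, 1, 1, 1])

demeaned = grouped_demean(values, labels)
zscored = grouped_zscore(values, labels)
```

Passing a label array with a single value for every element reduces both functions to ordinary, ungrouped normalization, which is the role the commit describes for an ``Everything`` classifier.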
Scott Sanderson 535d05e714 MAINT: Remove notion of "atomic" pipeline terms.
Replace it by distinguishing between "Loadable" and "Computable".

This is useful because it's now possible to write computable terms that
don't require any inputs (e.g. an `Always` filter or an `Everything`
classifier).
2016-03-08 13:49:45 -05:00
Scott Sanderson d889f8b08b BUG: Don't use deprecated attribute of exception. 2016-02-16 13:43:25 -05:00
Scott Sanderson 0115cdc46c MAINT: Fail fast on unsupported dtypes. 2016-02-12 21:23:47 -05:00
Scott Sanderson 09be7acaa8 TEST: Test forwarding of missing_value. 2016-02-12 21:23:47 -05:00
Scott Sanderson c105735574 DEV: Add support for specifying missing_value.
Consequently, enable support for `int`-dtyped Factors and BoundColumns.
2016-02-12 21:23:47 -05:00
Scott Sanderson a96dd70634 MAINT: Rename ConstantLoader to PrecomputedLoader. 2016-02-12 21:21:19 -05:00
Scott Sanderson 0c15f50231 TEST: Add dedicated testing dataset. 2016-02-12 21:20:18 -05:00
Scott Sanderson 28fdecc98b ENH: Make .latest return a Filter on bool columns. 2016-02-12 21:20:18 -05:00
Scott Sanderson 5f49fa22cb MAINT: Upgrade numpy and fix warnings.
Mostly fixes ambiguous calls to numpy.full, and uses NaT values with
explicit units.
2016-02-11 18:46:39 -05:00
Joe Jevnik 68cf236944 TST: Add test case for adding columns in subclass 2015-12-29 10:12:39 -05:00
llllllllll 32baac4e4b ENH: Make datasets have subclass relationships 2015-12-22 12:25:30 -05:00
Scott Sanderson 2235a53581 ENH: Add EWMA and DollarVolume factors. 2015-12-11 22:13:27 -05:00
Scott Sanderson 8220d1ee86 ENH: Adds support for different typed adjusted arrays and adds an
EarningsCalendar loader.

- Moves most of AdjustedArray back into Python. The window iterator is
  the only part that's performance-intensive.

- Adds a bootleg templating system for creating specialized versions of
  AdjustedArrayWindow for each concrete type we care about.

- Adds support for differently dtyped terms in pipeline. This allows us
  to use datetime64s which are needed in the EarningsCalendar.

- Adds EarningsCalendar dataset for the next and previous earnings
  announcements in pipeline.

- Adds in memory loader for EarningsCalendar.

- Adds blaze loader for EarningsCalendar.
2015-12-08 20:24:06 -05:00
Richard Frank 2dabda6b76 MAINT: Reworked Term atomicity 2015-10-12 16:11:19 -04:00
Richard Frank e880fa3e34 PERF: Batch load atomic terms by dataset
Added CompositeTerm and now we dispatch more generally on atomic
2015-10-12 10:48:28 -04:00
Scott Sanderson f82a01841b MAINT: Rename ALL the things.
zipline.modelling.* -> zipline.pipeline.*
zipline.data.ffc.loaders -> zipline.pipeline.loaders
tests/modelling -> tests/pipeline
2015-10-01 18:03:53 -04:00