- Split out extra_rows handling into an `ExecutionPlan` subclass.
`ExecutionPlan` now requires the dates and calendar against which a
set of terms will be computed, and now defers to a term's
`compute_extra_rows` method when deciding how many extra rows are
required to compute for that term. This will allow downsampled terms
to request enough extra rows to guarantee that we can maintain consistent
calculation dates.
As a consequence of the above, `TermGraph` now only deals with logical
dependencies, not with metadata surrounding extra row calculations.
This means that TermGraph can be used to generate dependency
visualizations in interactive contexts where we don't yet have a
calendar or start/end dates.
- Refactored test_{filter,factor,classifier} to use check_terms instead
of run_graph. This makes it easier to make changes to TermGraph,
since the testing interface is now to simply provide a dict of terms.
- Refactored BasePipelineTestCase to use fixtures to create an asset
finder. This fixes a potential leak of the test's asset db, which was
not being explicitly cleaned up.
- Refactored test_technical to use BasePipelineTestCase.
- Added a new special term, `InputDates()`, which can be used to request
date labels for inputs. Like `AssetExists`, `InputDates` is provided
in the initial workspace by default.
- Added a default (failing) `_compute` method to `AssetExists` which
provides a more useful error than AttributeError.
- Added test coverage for grouped and masked top/bottom.
- Added test coverage for grouped rank on datetime factors.
- Fixed an issue where grouped rank would fail on datetime inputs
because unary-negative isn't defined for datetimes. We now instead
directly invoke a function from rank.pyx that does the normalizations
as neeeded.
- Fixed an issue where GroupedRowTransform assumed that it produced the
same dtype as its input. This isn't true for rank() of a
datetime-dtype factor. GroupedRowTransform now takes a required dtype
parameter.
- Similarly, fixed an issue where GroupedRowTransform assumed that its
missing_value was the same as its parent's, which isn't true for
rank() of a datetime-dtype factor. GroupedRowTransform now takes a
required dtype parameter.
- Fixed an issue where Factor.demean() and Factor.zscore() weren't
properly cached because their static_identity included a closure that
was dynamically generated on each invocation. They both now always
use a function defined at module scope.
The previous algorithm assumed that the group labels were integers. It
produced nonsense with LabelArrays (though sadly didn't crash because
numpy promotes None and void to object).
Classifiers are computations that represent grouping keys. They can be
used in conjuction with normalization functions like ``zscore`` or
``demean`` to perform normalizations over subsets of a dataset.
Notable changes:
- Added ``demean()`` and ``zscore()`` methods to ``Factor``.
- Added a classifier versions of ``Latest`` and ``CustomTermMixin``.
The .latest attribute of int64 dataset columns no produces a
classifier by default.
- Added ``Everything``, a classifier that maps all data to the same
value.
- Added ``zipline.lib.normalize``, which implements a naive, pure-Python
grouped normalize function. This will likely be moved to Cython in a
subsequent PR.
Renames zipline.utils.test_utils to zipline.testing
Adds zipline.testing.fixtures.ZiplineTestCase to manage setup and
teardown and adds mixins to define fixtures like an asset finder or
trading calendar.
EarningsCalendar loader.
- Moves most of AdjustedArray back into Python. The window iterator is
the only part that's performance-intensive.
- Adds a bootleg templating system for creating specialized versions of
AdjustedArrayWindow for each concrete type we care about.
- Adds support for differently dtyped terms in pipeline. This allows us
to use datetime64s which are needed in the EarningsCalendar.
- Adds EarningsCalendar dataset for the next and previous earnings
announcements in pipeline.
- Adds in memory loader for EarningsCalendar.
- Adds blaze loader for EarningsCalendar.