Pandas 0.18 doesn't like having null-ish values in categoricals. Fixing
this properly requires re-thinking the semantics for missing_value on
pipeline terms, so we're punting on that until after we've upgraded to
0.18.
Pandas 0.18 deprecated passing "null-ish" values to pd.Categorical. The
expectation, instead, is that you use Categorical's native support for
missing data, which means the user will always get NaNs for missing
entries of the categorical.
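
As a rough illustration of the "native support for missing data" mentioned
above, the sketch below builds a pd.Categorical from values containing None.
The ticker strings are made-up sample data, not taken from this change.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing entry.
values = ["AAPL", None, "MSFT", "AAPL"]

cat = pd.Categorical(values)

# Missing entries are not stored as a category. They are encoded
# with the sentinel code -1 instead.
codes = cat.codes
categories = list(cat.categories)

# pd.isnull reports which entries are missing.
missing_mask = pd.isnull(cat)
```

Because the missing entry is never a member of ``categories``, there is no
way to surface a custom missing value through the categorical itself.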
A follow-up to this change should probably drop support for custom
missing values entirely and use LabelArray/Categorical for integer
data.
The previous algorithm assumed that the group labels were integers. It
produced nonsense with LabelArrays (though sadly didn't crash because
numpy promotes None and void to object).
- Fixes a bug where __setitem__ was not called when setting with a slice
on Python 2 (__setslice__ was called instead), which caused strange
behavior when setting an empty string. This is fixed by overriding
__setslice__ and forwarding to __setitem__.
- Fixes a bug where __getitem__ returned an instance of np.void when
returning a scalar. We now correctly return an entry from our
categoricals.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
LabelArray is conceptually similar to pandas.Categorical, in that it
stores data with many duplicate values as indices into an array of
unique values. For string data with many duplicates (e.g. time-series
of tickers or industry classifications), this provides multiple
orders of magnitude of improvement when doing string operations,
especially string comparison/matching operations.
- Adds a new generic object "specialization" for ``AdjustedArrayWindow``,
and a corresponding ``ObjectOverwrite`` adjustment.
- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
This method is called on the final result of any pipeline expression
after screen filtering has occurred. The default implementation of
``postprocess`` is identity, but Classifier overrides it to coerce
string columns into pandas.Categoricals before presenting them to the
user.
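
The idea behind ``LabelArray`` described above (storing strings as indices
into an array of unique values) can be sketched with plain numpy. This is
not the actual ``LabelArray`` implementation. The helper names ``label_eq``
and ``label_startswith`` and the sample tickers are hypothetical, chosen only
to show why string operations get cheaper.

```python
import numpy as np

# Hypothetical sample data with many duplicates.
tickers = np.array(["AAPL", "MSFT", "AAPL", "IBM", "MSFT", "AAPL"])

# Encode as integer codes into a sorted array of unique values.
categories, codes = np.unique(tickers, return_inverse=True)


def label_eq(categories, codes, value):
    """Compare every entry to ``value`` with one integer comparison."""
    matches = np.flatnonzero(categories == value)
    if matches.size == 0:
        # ``value`` is not a known category, so nothing can match.
        return np.zeros(codes.shape, dtype=bool)
    return codes == matches[0]


def label_startswith(categories, codes, prefix):
    """Run the string test once per unique value, then broadcast by code."""
    per_category = np.array([c.startswith(prefix) for c in categories])
    return per_category[codes]


is_aapl = label_eq(categories, codes, "AAPL")
starts_with_m = label_startswith(categories, codes, "M")
```

The string work scales with the number of unique values rather than the
number of entries, which is where the savings come from when the data has
many duplicates.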