Commit Graph

13 Commits

Author SHA1 Message Date
Joe Jevnik 5925107052 TST: fix doctests to actually run 2016-06-21 15:07:03 -04:00
Scott Sanderson f7e9281b14 BUG: Fix groupby with string columns.
The previous algorithm assumed that the group labels were integers. It
produced nonsense with LabelArrays (though sadly it didn't crash, because
numpy promotes None and void to object).
2016-05-10 16:57:59 -04:00
Scott Sanderson 9fd8ec180d BUG: View with specific int dtype.
Just viewing as int is broken on win32.
2016-05-05 02:13:14 -04:00
Scott Sanderson 2ceeac1237 BUG: Use compat unicode. 2016-05-04 19:58:55 -04:00
Scott Sanderson bd49647ce0 BUG: Fix failure on pandas >= 0.17. 2016-05-04 19:38:28 -04:00
Scott Sanderson 620d7648b0 BUG: Tests/bugfixes for LabelArray slicing.
- Fixes a bug where __setitem__ was not called when setting with a slice
  on Python 2 (__setslice__ was called instead), which caused strange
  behavior when setting an empty string.  This is fixed by overriding
  __setslice__ and forwarding to __setitem__.

- Fixes a bug where __getitem__ returned an instance of np.void when
  returning a scalar.  We now correctly return an entry from our
  categoricals.
2016-05-04 15:54:50 -04:00
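
The ``__setslice__`` forwarding described in 620d7648b0 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: ``ForwardingList`` and its ``setitem_calls`` attribute are hypothetical names, and a plain list stands in for the ndarray storage. On Python 2, ``obj[i:j] = seq`` dispatches to ``__setslice__`` when the class defines or inherits it, so any logic that lives only in ``__setitem__`` is skipped unless the call is forwarded.

```python
class ForwardingList(object):
    """Hypothetical container; NOT the LabelArray implementation."""

    def __init__(self, values):
        self._values = list(values)
        self.setitem_calls = []  # records keys, only to make the dispatch visible

    def __setitem__(self, key, value):
        # The single place where validation or encoding of new values would live.
        self.setitem_calls.append(key)
        self._values[key] = value

    def __setslice__(self, i, j, sequence):
        # Python 2 calls this for simple ``obj[i:j] = seq`` assignments.
        # Python 3 never calls it; slices go straight to __setitem__.
        self.__setitem__(slice(i, j), sequence)

    def tolist(self):
        return list(self._values)


arr = ForwardingList(["a", "b", "c", "d"])
arr[1:3] = ["x", "y"]          # reaches __setitem__ on both Python 2 and 3
arr.__setslice__(0, 1, ["z"])  # explicit call, mimicking Python 2 dispatch
```

Both assignments end up in ``__setitem__`` with a ``slice`` key, which is the property the fix relies on.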
Scott Sanderson 4dbc7eac56 MAINT: Remove byteswap and newbyteorder from LabelArray. 2016-05-04 15:54:50 -04:00
Scott Sanderson 8de45540f2 ENH: NaN semantics for LabelArray missing values. 2016-05-04 15:54:50 -04:00
Scott Sanderson 2395cbb671 ENH: Use np.void for labelarray storage.
This disables most broken ufuncs.
2016-05-04 15:54:50 -04:00
Scott Sanderson 47e9b107ec DOC: Clean up docstring cruft. 2016-05-04 15:54:50 -04:00
Scott Sanderson 23324b4218 DOC: Add docstring for LabelArray. 2016-05-04 15:54:50 -04:00
Scott Sanderson c40bbfae03 TEST: More tests for string predicates. 2016-05-04 15:54:50 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for ``AdjustedArrayWindow``,
  and a corresponding ``ObjectOverwrite`` adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
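
The storage idea behind ``LabelArray`` in 5f190395ad (duplicate-heavy strings kept as indices into an array of unique values) can be sketched as below. This is an illustration only, assuming plain Python lists in place of NumPy arrays and ignoring missing values; ``encode_labels``, ``equals`` and ``startswith`` are hypothetical helper names, not part of zipline.

```python
def encode_labels(values):
    """Return (categories, codes): sorted unique values and per-item indices."""
    categories = sorted(set(values))
    index_of = {label: i for i, label in enumerate(categories)}
    codes = [index_of[v] for v in values]
    return categories, codes


def equals(categories, codes, target):
    """Element-wise ``== target`` using integer comparisons only."""
    try:
        target_code = categories.index(target)
    except ValueError:
        # The target never occurs, so no element can match.
        return [False] * len(codes)
    return [c == target_code for c in codes]


def startswith(categories, codes, prefix):
    """Run the string operation once per unique value, then look up by code."""
    hits = [label.startswith(prefix) for label in categories]
    return [hits[c] for c in codes]


tickers = ["AAPL", "MSFT", "AAPL", "AMZN", "MSFT", "AAPL"]
categories, codes = encode_labels(tickers)
is_aapl = equals(categories, codes, "AAPL")
starts_with_a = startswith(categories, codes, "A")
```

The string work scales with the number of unique labels rather than the number of rows, which is where the speedup for comparison and matching operations comes from.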