Commit Graph

17 Commits

Author SHA1 Message Date
Conner Fromknecht 2770648acb Fixed catalyst tests except example tests 2017-06-19 14:43:10 -07:00
Scott Sanderson e49f4c6149 ENH: Improve error message on bad return. 2017-06-07 17:07:19 -04:00
Scott Sanderson ad10349992 TEST: Test map returning None. 2017-06-07 15:28:15 -04:00
Scott Sanderson cfe4df8f2b TEST: Test map ignores missing with None. 2017-06-07 14:16:17 -04:00
Scott Sanderson e995e6f2ed ENH: Add relabel method to string classifiers.
- Adds a `map` method to `LabelArray` that maps a unary function over
  the categories of a LabelArray, shrinking the underlying codes if
  possible.

- Adds a new `.relabel` method to string-dtype classifiers that maps a
  unary function over the unique elements of the underlying LabelArray.
  This is useful for things like cleaning noisy label data.
2017-06-07 13:14:12 -04:00
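
The `map`/`relabel` idea described in e995e6f2ed can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: `map_categories` is a hypothetical helper, and it assumes the array is stored as integer codes plus a list of unique category strings. The function is applied once per category, and categories that collapse to the same label share a code afterwards.

```python
import numpy as np


def map_categories(codes, categories, f):
    """Hypothetical helper: apply ``f`` once per category and remap codes.

    ``codes`` is an integer array indexing into ``categories``.
    Returns ``(new_codes, new_categories)``.
    """
    new_categories = []
    index_of = {}  # new label -> new code
    remap = np.empty(len(categories), dtype=np.int64)

    for old_code, label in enumerate(categories):
        new_label = f(label)
        if new_label not in index_of:
            index_of[new_label] = len(new_categories)
            new_categories.append(new_label)
        # Old codes whose labels now coincide share one new code.
        remap[old_code] = index_of[new_label]

    # Fancy indexing translates every element in a single step.
    return remap[codes], new_categories


codes = np.array([0, 1, 2, 1])
categories = ["AAPL ", "aapl", "MSFT"]
new_codes, new_categories = map_categories(
    codes, categories, lambda s: s.strip().upper()
)
```

Here three noisy categories shrink to two, which is the kind of label cleaning the commit message mentions.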
Joe Jevnik beb4bd8c6e BUG: fix label array code dtype condense 2017-03-08 20:54:57 -05:00
Joe Jevnik 44563d351b TST: add roundtrip check 2017-03-01 15:15:16 -05:00
Joe Jevnik 735d98fb6e TST: add tests for inferred width labelarray 2017-02-07 16:28:13 -05:00
Joe Jevnik d378c9fca3 ENH: store the 'codes' for a labelarray in the narrowest int type possible 2017-02-02 20:58:36 -05:00
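
One way to pick the narrowest code type, as d378c9fca3 describes, is sketched below. The helper name `narrowest_uint_dtype` is hypothetical, and the choice of unsigned types is an assumption; the commit message only says "narrowest int type".

```python
import numpy as np


def narrowest_uint_dtype(n_categories):
    """Hypothetical helper: smallest unsigned dtype holding codes 0..n-1."""
    largest_code = n_categories - 1
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if largest_code <= np.iinfo(dtype).max:
            return np.dtype(dtype)
    raise ValueError("too many categories: %d" % n_categories)


small = narrowest_uint_dtype(256)    # codes 0..255 fit in one byte
medium = narrowest_uint_dtype(257)   # code 256 needs two bytes
```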
Scott Sanderson a1273cd669 MAINT: Fix warnings from numpy labelarray methods. 2016-09-20 17:12:07 -04:00
Scott Sanderson e0aeda4c3e BUG: Fix bytes/unicode issues in py3. 2016-05-05 01:46:35 -04:00
Scott Sanderson a29da32252 TEST: Don't assert particular numpy error.
They change from version to version.
2016-05-04 19:40:50 -04:00
Scott Sanderson 620d7648b0 BUG: Tests/bugfixes for LabelArray slicing.
- Fixes a bug where __setitem__ was not called when setting with a slice
  on Python 2 (__setslice__ was called instead), which caused strange
  behavior when setting an empty string.  This is fixed by overriding
  __setslice__ and forwarding to __setitem__.

- Fixes a bug where __getitem__ returned an instance of np.void when
  returning a scalar.  We now correctly return an entry from our
  categoricals.
2016-05-04 15:54:50 -04:00
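
The `__setslice__` fix in 620d7648b0 follows a common Python 2 pattern, sketched here on a plain list wrapper rather than on the real ndarray subclass. The `Labels` class is hypothetical. On Python 3 `__setslice__` is never invoked by the interpreter, so the method is harmless there.

```python
class Labels(object):
    """Hypothetical container showing __setslice__ forwarding."""

    def __init__(self, values):
        self._values = list(values)

    def __setitem__(self, key, value):
        # Single place where assignment logic (validation, coercion) lives.
        if isinstance(key, slice):
            n = len(self._values[key])
            self._values[key] = [value] * n
        else:
            self._values[key] = value

    def __setslice__(self, i, j, value):
        # Python 2 calls this for ``obj[i:j] = value``; forward it so the
        # slice goes through the same code path as every other assignment.
        self.__setitem__(slice(i, j), value)


labels = Labels(["a", "b", "c"])
labels[0:2] = ""                  # empty string is set like any other value
after_slice = list(labels._values)
labels.__setslice__(1, 3, "z")    # what Python 2 would call for labels[1:3]
```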
Scott Sanderson 8de45540f2 ENH: NaN semantics for LabelArray missing values. 2016-05-04 15:54:50 -04:00
Scott Sanderson 2395cbb671 ENH: Use np.void for labelarray storage.
This disables most broken ufuncs
2016-05-04 15:54:50 -04:00
Scott Sanderson 7a65121e6e BUG: contains was renamed to has_substring 2016-05-04 15:54:50 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
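
The storage scheme that 5f190395ad describes for LabelArray can be illustrated with plain NumPy. This is a sketch of the general categorical-encoding idea and not the LabelArray implementation; `encode` and `startswith` are hypothetical helpers. A string operation runs once per unique value, and the integer codes then broadcast the result to every element.

```python
import numpy as np


def encode(values):
    """Split an array of strings into (categories, codes)."""
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    return categories, codes


def startswith(categories, codes, prefix):
    """Evaluate the string test per category, then index by code."""
    per_category = np.array([c.startswith(prefix) for c in categories])
    return per_category[codes]


tickers = ["AAPL", "MSFT", "AAPL", "AAPL", "MSFT"]
categories, codes = encode(tickers)
mask = startswith(categories, codes, "A")
```

With two unique tickers, the string comparison runs twice regardless of how many rows the array holds, which is where the speedup for heavily duplicated data comes from.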