15 Commits

Author SHA1 Message Date
Conner Fromknecht 99efa7a9f3 Fixed catalyst tests except example tests 2017-06-19 14:43:10 -07:00
Scott Sanderson 61feedbd16 TST: Add test for missing values in relabel. 2017-06-07 18:21:13 -04:00
Scott Sanderson 3b8a6b543e BUG: Fix NoneType comparisons in PY3. 2017-06-07 18:21:03 -04:00
Scott Sanderson 5b9d5fecfb ENH: Add relabel method to string classifiers.
- Adds a `map` method to `LabelArray` that maps a unary function over
  the categories of a LabelArray, shrinking the underlying codes if
  possible.

- Adds a new `.relabel` method to string-dtype classifiers that maps a
  unary function over the unique elements of the underlying LabelArray.
  This is useful for things like cleaning noisy label data.
2017-06-07 13:14:12 -04:00
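
The `map`/`relabel` behavior described in commit 5b9d5fecfb can be illustrated with a minimal NumPy sketch. This is not the zipline implementation: `map_categories`, the sample data, and the choice of `str.upper` are hypothetical, and missing values are not handled. It only shows the idea of applying a unary function once per unique category and then shrinking the codes when several categories collapse to the same label.

```python
import numpy as np

# Hypothetical noisy label data with inconsistent casing.
values = np.array(["AAPL", "aapl", "MSFT", "msft", "AAPL"])

# Encode as unique categories plus one integer code per element.
categories, codes = np.unique(values, return_inverse=True)


def map_categories(categories, codes, f):
    """Apply ``f`` once per category and re-encode the codes.

    Hypothetical helper for illustration only.
    """
    # Call f on each unique category rather than on every element.
    new_labels = np.array([f(c) for c in categories])
    # Categories that now share a label collapse to a single entry.
    new_categories, remap = np.unique(new_labels, return_inverse=True)
    # Translate the old codes into codes for the new, smaller category set.
    return new_categories, remap[codes]


new_categories, new_codes = map_categories(categories, codes, str.upper)
decoded = new_categories[new_codes]
```

Here four distinct categories shrink to two after relabeling, while the decoded column keeps its original length and order.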
Joe Jevnik d07f133579 STY: remove unused imports and method, clean up docs 2016-10-28 15:04:18 -04:00
Joe Jevnik af3e1016a0 TST: add tests for postprocess and to_workspace_value 2016-10-28 15:04:18 -04:00
Scott Sanderson 8b1136d9d5 ENH: Validate missing_values at term construction.
Finds bugs in several bad tests that were constructing invalid terms.
2016-05-10 19:43:56 -04:00
Scott Sanderson 2431aaefb5 BUG: Fix bad error message for element_of.
It referred to the wrong method name (`is_element`).
2016-05-10 16:57:59 -04:00
Scott Sanderson e0aeda4c3e BUG: Fix bytes/unicode issues in py3. 2016-05-05 01:46:35 -04:00
Scott Sanderson b78501e54a BUG: Fix broken isnull() on string classifiers.
Adds a special case in NullFilter to handle LabelArrays correctly.
2016-05-04 17:26:27 -04:00
Scott Sanderson 5a1ed7b1d3 ENH: Make element_of work for ints too. 2016-05-04 16:31:58 -04:00
Scott Sanderson c40bbfae03 TEST: More tests for string predicates. 2016-05-04 15:54:50 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
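
The storage scheme that commit 5f190395ad describes for ``LabelArray`` (integer codes indexing into an array of unique values) can be sketched in plain NumPy. This is an illustration under stated assumptions, not zipline's code: the `startswith` helper and the ticker data are hypothetical. It shows why string predicates get cheaper with this layout, since the predicate runs once per unique value and the result is then spread over all rows by integer indexing.

```python
import numpy as np

# Hypothetical column of tickers with many repeated values.
values = np.array(["AAPL", "MSFT", "AAPL", "GOOG", "MSFT", "AAPL"])

# Store the data as unique categories plus an integer code per element.
categories, codes = np.unique(values, return_inverse=True)


def startswith(categories, codes, prefix):
    """Evaluate a string predicate against the encoded column.

    Hypothetical helper for illustration only.
    """
    # Run the string comparison once per unique category.
    matches = np.array([c.startswith(prefix) for c in categories], dtype=bool)
    # Broadcast the per-category result to every row by indexing with codes.
    return matches[codes]


mask = startswith(categories, codes, "A")
```

With three categories and six rows the saving is small, but the number of string comparisons scales with the number of unique values rather than with the number of rows.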
Scott Sanderson 9a04621781 ENH: Add eq and __ne__ to Classifier. 2016-03-28 15:46:28 -04:00
Scott Sanderson 758d6c74fc ENH: Add isnull and notnull for classifiers. 2016-03-25 15:11:18 -04:00