catalyst

mirror of https://github.com/wassname/catalyst.git synced 2026-06-29 04:25:37 +08:00

Author	SHA1	Message	Date
Joe Jevnik	5925107052	TST: fix doctests to actually run	2016-06-21 15:07:03 -04:00
Scott Sanderson	f7e9281b14	BUG: Fix groupby with string columns. The previous algorithm assumed that the group labels were integers. It produced nonsense with LabelArrays (though sadly didn't crash because numpy promotes None and void to object).	2016-05-10 16:57:59 -04:00
Scott Sanderson	9fd8ec180d	BUG: View with specific int dtype. Just viewing as int is broken on win32.	2016-05-05 02:13:14 -04:00
Scott Sanderson	2ceeac1237	BUG: Use compat unicode.	2016-05-04 19:58:55 -04:00
Scott Sanderson	bd49647ce0	BUG: Fix failure on pandas >= 0.17.	2016-05-04 19:38:28 -04:00
Scott Sanderson	620d7648b0	BUG: Tests/bugfixes for LabelArray slicing. - Fixes a bug where __setitem__ was not called when setting with a slice on Python 2 (__setslice__ was called instead), which caused strange behavior when setting an empty string. This is fixed by overriding __setslice__ and forwarding to __setitem__. - Fixes a bug where __getitem__ returned an instance of np.void when returning a scalar. We now correctly return an entry from our categoricals.	2016-05-04 15:54:50 -04:00
Scott Sanderson	4dbc7eac56	MAINT: Remove byteswap and newbyteorder from LabelArray.	2016-05-04 15:54:50 -04:00
Scott Sanderson	8de45540f2	ENH: NaN semantics for LabelArray missing values.	2016-05-04 15:54:50 -04:00
Scott Sanderson	2395cbb671	ENH: Use np.void for labelarray storage. This disables most broken ufuncs	2016-05-04 15:54:50 -04:00
Scott Sanderson	47e9b107ec	DOC: Clean up docstring cruft.	2016-05-04 15:54:50 -04:00
Scott Sanderson	23324b4218	DOC: Add docstring for LabelArray.	2016-05-04 15:54:50 -04:00
Scott Sanderson	c40bbfae03	TEST: More tests for string predicates.	2016-05-04 15:54:50 -04:00
Scott Sanderson	5f190395ad	ENH: Add support for strings in Pipeline. - Adds a new class, ``LabelArray``, which is a subclass of np.ndarray. LabelArray is conceptually similar to pandas.Categorical, in that it stores data with many duplicate values as indices into an array of unique values. For string data with many duplicates (e.g. time-series of tickers or industry classifications), this provides multiple orders of magnitude of improvement when doing string operations, especially string comparison/matching operations. - Adds a new generic object "specialization" for `AdjustedArrayWindow`, and a corresponding ObjectOverwrite adjustment. - Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``. This method is called on the final result of any pipeline expression after screen filtering has occurred. The default implementation of ``postprocess`` is identity, but Classifier overrides it to coerce string columns into pandas.Categoricals before presenting them to the user.	2016-05-04 15:50:52 -04:00
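
The idea behind ``LabelArray`` in commit 5f190395ad (storing repeated strings as integer indices into an array of unique values) can be illustrated with plain NumPy. This is a minimal sketch of the general technique, not the actual ``LabelArray`` implementation; the helper names ``encode_labels`` and ``label_eq`` and the sample tickers are hypothetical and do not appear in the repository.

```python
import numpy as np


def encode_labels(values):
    """Hypothetical helper: split strings into (categories, codes).

    `categories` holds each unique string once (sorted), and `codes`
    holds one integer per input element, so categories[codes]
    reproduces the original values.
    """
    arr = np.asarray(values, dtype=str)
    categories, codes = np.unique(arr, return_inverse=True)
    return categories, codes


def label_eq(categories, codes, label):
    """Hypothetical helper: elementwise `== label` on the encoded data.

    The string comparison runs once per unique category; the
    per-element work is an integer comparison.
    """
    matches = np.flatnonzero(categories == label)
    if matches.size == 0:
        # The label never occurs, so no element can match.
        return np.zeros(codes.shape, dtype=bool)
    return codes == matches[0]


# Illustrative data only.
tickers = ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT", "AAPL"]
categories, codes = encode_labels(tickers)
mask = label_eq(categories, codes, "AAPL")
```

With many duplicates, the number of string comparisons scales with the number of unique labels rather than the number of elements, which is the kind of saving the commit message describes for comparison and matching operations.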

13 Commits