- Return a value from `verify_all_indices_unique` so that `panel` isn't
unconditionally `None` in `PanelDailyBarReader`.
- Fix a bug where we always set the volume of every asset to `1e9`.
- Add a minimal suite of tests for `get_spot_value`, which catches both
of the above.
NOTE: There are still several issues with `PanelDailyBarReader`. The
docstring for `get_spot_value` claims that it will return -1 on days
where an asset didn't trade, which isn't the case. It also claims that
it will raise `NoDataOnDate` when a request is made outside the panel
range, but it just raises a `KeyError`. We also still have no coverage
for `load_raw_arrays`, so it's likely that there are more bugs lurking.
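
The first fix in this change is an instance of a common Python pitfall: a
validation helper that falls off the end returns `None`, so assigning its
result silently discards the data. The sketch below illustrates the pattern
only; `check_unique_buggy` and `check_unique_fixed` are hypothetical
stand-ins, not the actual `verify_all_indices_unique` implementation.

```python
def check_unique_buggy(index):
    """Hypothetical validator that forgets to return its input."""
    if len(set(index)) != len(index):
        raise ValueError("index contains duplicates")
    # No return statement: the caller receives None.


def check_unique_fixed(index):
    """Same check, but hands the validated input back to the caller."""
    if len(set(index)) != len(index):
        raise ValueError("index contains duplicates")
    return index


dates = ["2016-01-04", "2016-01-05"]
panel_before = check_unique_buggy(dates)  # always None
panel_after = check_unique_fixed(dates)   # the validated input
```

A test that reads any value back through the reader would catch the buggy
variant immediately, because every lookup on `None` fails.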
Refactor AlgorithmSimulator so that DAY_END is emitted for both
minute and daily emission, and so that end-of-minute handling and
end-of-day handling are separated.
We now use isoformat timestamps with ':' replaced by ';'. We cannot use
a normal isoformat because Windows does not allow files or directories
with ':' in the name.
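
As a rough illustration of the naming scheme described above, the sketch
below converts a timestamp to a Windows-safe name and back. The helper names
are hypothetical and are not taken from the codebase; it assumes Python 3.7+
for `datetime.fromisoformat`.

```python
from datetime import datetime


def to_path_safe_isoformat(dt):
    """Hypothetical helper: isoformat with ':' swapped for ';'."""
    return dt.isoformat().replace(":", ";")


def from_path_safe_isoformat(name):
    """Hypothetical inverse: restore ':' and parse the timestamp."""
    return datetime.fromisoformat(name.replace(";", ":"))


stamp = datetime(2016, 5, 4, 13, 30, 15)
dirname = to_path_safe_isoformat(stamp)
print(dirname)  # 2016-05-04T13;30;15
```

The substitution is reversible because ';' never appears in a normal
isoformat string.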
This data bundle will use the Quantopian mirror of the Quandl WIKI data
instead of downloading from Quandl directly. This dramatically improves
the speed because we are not subject to Quandl's rate limiting and we
can send the data in the format zipline expects.
- Fixes a bug where __setitem__ was not called when setting with a slice
on Python 2 (__setslice__ was called instead), which caused strange
behavior when setting an empty string. This is fixed by overriding
__setslice__ and forwarding to __setitem__.
- Fixes a bug where __getitem__ returned an instance of np.void when
returning a scalar. We now correctly return an entry from our
categoricals.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
LabelArray is conceptually similar to pandas.Categorical, in that it
stores data with many duplicate values as indices into an array of
unique values. For string data with many duplicates (e.g. time-series
of tickers or industry classifications), this provides multiple
orders of magnitude of improvement when doing string operations,
especially string comparison/matching operations.
- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
and a corresponding ObjectOverwrite adjustment.
- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
This method is called on the final result of any pipeline expression
after screen filtering has occurred. The default implementation of
``postprocess`` is identity, but Classifier overrides it to coerce
string columns into pandas.Categoricals before presenting them to the
user.
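
The storage idea behind ``LabelArray`` (integer indices into an array of
unique values) can be sketched in plain Python. This is only an illustration
of the encoding, not the actual ndarray subclass; `encode_labels` and
`equals_label` are hypothetical names.

```python
def encode_labels(values):
    """Return (categories, codes): sorted unique values and per-item indices."""
    categories = sorted(set(values))
    index_of = {label: i for i, label in enumerate(categories)}
    codes = [index_of[value] for value in values]
    return categories, codes


def equals_label(categories, codes, label):
    """Elementwise `== label` using one string lookup plus integer compares."""
    try:
        target = categories.index(label)
    except ValueError:
        # The label is not a known category, so nothing can match.
        return [False] * len(codes)
    return [code == target for code in codes]


tickers = ["AAPL", "MSFT", "AAPL", "IBM", "MSFT", "AAPL"]
categories, codes = encode_labels(tickers)
mask = equals_label(categories, codes, "AAPL")
```

Once the data is encoded, a comparison against a string costs one lookup in
the (small) list of categories followed by integer comparisons, which is
where the speedup for duplicate-heavy string data comes from.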
Instead of letting the cache of carrays grow unbounded, use an LRUCache
to cap the number of equities for any given column.
Tested with a cache size of 1000 on an algo using a pipeline that was
touching over 3000 equities: runtimes were similar, and memory usage was
successfully capped at around 1.2GB.
Also tested with an algorithm which bought and held just one equity; no
major slowdown was seen when using the LRUCache vs. a dictionary.
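
For readers unfamiliar with the eviction policy, here is a minimal sketch of
a size-capped LRU mapping built on `collections.OrderedDict`. It is not the
LRUCache class used in this change; `BoundedLRU` is a hypothetical name, and
the string values stand in for per-equity carrays.

```python
from collections import OrderedDict


class BoundedLRU:
    """Hypothetical mapping that evicts the least recently used entry."""

    def __init__(self, maxsize):
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __getitem__(self, key):
        value = self._data[key]      # raises KeyError on a miss
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # drop the oldest entry

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


cache = BoundedLRU(maxsize=2)
cache["AAPL"] = "carray for AAPL"
cache["MSFT"] = "carray for MSFT"
cache["AAPL"]                     # touch AAPL so MSFT becomes the oldest
cache["IBM"] = "carray for IBM"   # exceeds the cap, evicts MSFT
```

With a single equity the cache never evicts, which is consistent with the
observation above that it behaves much like a plain dictionary in that case.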
We may want to follow this up with an extension to `carray` which is not
as memory hungry per column; e.g. by not loading repeated/similar
metadata or releasing the last read chunk after a certain amount of
time.