Minutely data can now be appended to bcolz files even when
minutes in the same day have already been written. For example,
previously attempting to write data for the minute 2016-05-11 16:30
would raise an exception if any OHLCV data for 2016-05-11 had been
written to the same file.
Trying to overwrite existing minutes still raises a
BcolzMinuteOverlappingData exception.
Note that previously all sids' bcolz files ended at the same time.
This is no longer necessarily the case. The last record in each
sid's bcolz file now corresponds to the latest minute for which
OHLCV data is provided to the writer.
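The append rule above can be sketched with a toy in-memory writer. This
is not the bcolz-backed implementation: `ToyMinuteWriter`,
`OverlappingData` and the per-sid dictionary are hypothetical stand-ins,
used only to illustrate the semantics (later minutes may be appended,
existing minutes may not be overwritten, and each sid may end at a
different minute).

```python
import datetime as dt


class OverlappingData(Exception):
    """Hypothetical stand-in for BcolzMinuteOverlappingData."""


class ToyMinuteWriter:
    """Tracks, per sid, the minutes written so far (illustration only)."""

    def __init__(self):
        self._minutes = {}  # sid -> list of written minutes, ascending

    def write(self, sid, minutes):
        written = self._minutes.setdefault(sid, [])
        minutes = sorted(minutes)
        # Appending is allowed only strictly after the last written minute.
        if written and minutes and minutes[0] <= written[-1]:
            raise OverlappingData(
                "sid %s already has data through %s" % (sid, written[-1])
            )
        written.extend(minutes)

    def last_minute(self, sid):
        return self._minutes[sid][-1]


writer = ToyMinuteWriter()
day = dt.datetime(2016, 5, 11)
writer.write(1, [day.replace(hour=16, minute=29)])
# A later minute on the same day can now be appended.
writer.write(1, [day.replace(hour=16, minute=30)])
# A different sid may end at a different minute.
writer.write(2, [day.replace(hour=16, minute=15)])
```

Writing 16:30 for sid 1 a second time would raise `OverlappingData` in
this sketch, mirroring the overwrite behavior described above.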
STY: remove unused imports
MAINT: change dtype to object for compatibility with python3
MAINT: rename pipeline columns and constants for clarity
MAINT: rename column
MAINT: add back cash amount constant
BUG: fix field names
BUG: pass remaining args
WIP: make buyback units parameterized so that user can choose
BUG: fix filtering based on units parameter
WIP: test for undesired units
Revert "WIP: make buyback units parameterized so that user can choose"
This reverts commit df3b838d525bff5026eba1d81865c6645d534c88.
The previous algorithm assumed that the group labels were integers. It
produced nonsense with LabelArrays (though sadly didn't crash because
numpy promotes None and void to object).
- Return a value from `verify_all_indices_unique` so that `panel` isn't
unconditionally `None` in `PanelDailyBarReader`.
- Fix a bug where we always set the volume of every asset to `1e9`.
- Add a minimal suite of tests for `get_spot_value`, which catches both
of the above.
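The first fix above belongs to a common class of bug, which can be
reduced to the following sketch. The two functions here are
hypothetical and operate on a plain list; the real
`verify_all_indices_unique` and the panel it checks are not shown in
this message.

```python
def verify_unique_broken(items):
    """Validates but forgets to return, so callers receive None."""
    if len(set(items)) != len(items):
        raise ValueError('duplicate entries')


def verify_unique_fixed(items):
    """Validates and returns its input so the result can be assigned."""
    if len(set(items)) != len(items):
        raise ValueError('duplicate entries')
    return items


panel = verify_unique_broken(['a', 'b'])  # panel is None
fixed = verify_unique_fixed(['a', 'b'])   # fixed is ['a', 'b']
```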
NOTE: There are still several issues with `PanelDailyBarReader`. The
docstring for `get_spot_value` claims that it will return -1 on days
where an asset didn't trade, which isn't the case. It also claims that
it will raise `NoDataOnDate` when a request is made outside the panel
range, but it just raises a KeyError. We also still have no coverage
for `load_raw_arrays`, so it's likely that there are more bugs lurking.
Refactor AlgorithmSimulator so that DAY_END is emitted for both
minute and daily emission, and so that the handling of end-of-minute
and end-of-day is separated.
We now use isoformat timestamps with ':' replaced by ';'. We cannot use
a normal isoformat because Windows does not allow ':' in file or
directory names.
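A minimal sketch of this naming scheme follows. The helper names are
hypothetical and not taken from the codebase, and the inverse function
assumes timestamps without microseconds or a UTC offset.

```python
import datetime as dt


def to_windows_safe_name(timestamp):
    """ISO 8601 string with ':' swapped for ';' (hypothetical helper)."""
    return timestamp.isoformat().replace(':', ';')


def from_windows_safe_name(name):
    """Invert to_windows_safe_name for timestamps without microseconds."""
    return dt.datetime.strptime(name.replace(';', ':'), '%Y-%m-%dT%H:%M:%S')


name = to_windows_safe_name(dt.datetime(2016, 5, 11, 16, 30, 0))
print(name)  # 2016-05-11T16;30;00
```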
This data bundle uses the Quantopian mirror of the Quandl WIKI data
instead of downloading from Quandl directly. This dramatically improves
the speed because we avoid Quandl's rate limiting and we can send the
data in the format zipline expects.
- Fixes a bug where __setitem__ was not called when setting with a slice
on Python 2 (__setslice__ was called instead), which caused strange
behavior when setting an empty string. This is fixed by overriding
__setslice__ and forwarding to __setitem__.
- Fixes a bug where __getitem__ returned an instance of np.void when
returning a scalar. We now correctly return an entry from our
categoricals.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
LabelArray is conceptually similar to pandas.Categorical, in that it
stores data with many duplicate values as indices into an array of
unique values. For string data with many duplicates (e.g. time-series
of tickers or industry classifications), this provides multiple
orders of magnitude of improvement when doing string operations,
especially string comparison/matching operations.
- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
and a corresponding ObjectOverwrite adjustment.
- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
This method is called on the final result of any pipeline expression
after screen filtering has occurred. The default implementation of
``postprocess`` is identity, but Classifier overrides it to coerce
string columns into pandas.Categoricals before presenting them to the
user.
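The storage idea behind ``LabelArray`` (values stored as indices into an
array of unique values) can be sketched in pure Python. This is not the
``LabelArray`` implementation, which subclasses np.ndarray; `encode` and
`equals` are hypothetical helpers that only show why comparisons become
cheap once strings are replaced by integer codes.

```python
def encode(values):
    """Return (codes, categories): each value becomes an index into categories."""
    categories = sorted(set(values))
    index_of = {value: i for i, value in enumerate(categories)}
    codes = [index_of[value] for value in values]
    return codes, categories


def equals(codes, categories, label):
    """Elementwise comparison against a label using one string lookup."""
    if label not in categories:
        return [False] * len(codes)
    target = categories.index(label)
    # Every remaining comparison is integer-to-integer.
    return [code == target for code in codes]


tickers = ['AAPL', 'MSFT', 'AAPL', 'IBM', 'AAPL']
codes, categories = encode(tickers)
mask = equals(codes, categories, 'AAPL')
```

Here the label is located once among the unique values, and the rest of
the work is done on integer codes rather than on strings.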
Adds the data bundle concept which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This change also provides a
`yahoo_equities` function for creating an ingest function that loads a
static set of assets from Yahoo.
The CLI is now structured as a set of subcommands and is invoked as
`python -m zipline`. The old behavior of `run_algo.py` has been moved
to the `run` subcommand. It behaves almost entirely the same, except
that it now takes the name of the data bundle to use, defaulting to
`quandl`.
The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.
There is also a `clean` subcommand which deletes the data that was
written with `ingest`.
Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of Python files to run at
the start of the process. These can be used to configure aspects of
zipline. Currently, the only thing supported in an extension file is
the registration of a new data bundle.
Updates the BcolzMinuteBarWriter.write API to allow users to pass their
data as a stream instead of requiring that they loop over their data
externally. This matches the API presented by BcolzDailyBarWriter.
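The difference between looping externally and passing a stream can be
sketched as follows. `ToyWriter` and `minute_frames` are hypothetical,
and the `(sid, rows)` pair shape is an assumption made for illustration;
the real writer's input format is not specified in this message.

```python
def minute_frames():
    """Yield (sid, rows) pairs lazily; rows are hypothetical OHLCV tuples."""
    yield 1, [(10.0, 10.5, 9.9, 10.2, 1000)]
    yield 2, [(20.0, 20.1, 19.8, 20.0, 500), (20.0, 20.4, 20.0, 20.3, 700)]


class ToyWriter:
    """Illustration only: consumes a stream instead of one sid per call."""

    def __init__(self):
        self.row_counts = {}

    def write(self, data):
        # The writer drives the loop, so callers can hand over a generator.
        for sid, rows in data:
            self.row_counts[sid] = self.row_counts.get(sid, 0) + len(rows)


writer = ToyWriter()
writer.write(minute_frames())
```

Because the writer consumes the iterable itself, callers can pass a
generator and avoid materializing every sid's data up front.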