Pandas 0.18 doesn't like having null-ish values in categoricals. Fixing
this properly requires re-thinking the semantics for missing_value on
pipeline terms, so we're punting on that until after we've upgraded to
0.18.
In pandas 0.18, the behavior of ``nth()`` changed so that Grouper can no
longer be used easily to recover group labels.
Instead of using the built-in grouper behavior, we use a groupby on two
arrays we build ourselves. This recovers the original behavior, and is
about 2x faster as a bonus.
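
A minimal sketch of grouping on two hand-built arrays, assuming pandas and numpy are available. The frame, the ``value`` column, and the integer day/asset keys are illustrative assumptions, not the code from this change.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per observation, already in time order.
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Group keys built by hand instead of relying on pd.Grouper.
days = np.array([0, 0, 0, 1, 1])          # e.g. day offsets
assets = np.array([10, 10, 20, 10, 10])   # e.g. asset ids

# Grouping on the two arrays puts both keys into the result's index,
# so the group labels can be read back directly.
last_per_group = df.groupby([days, assets]).last()
group_labels = list(last_per_group.index)
```

Because the keys live in the resulting MultiIndex, no extra step is needed to recover which (day, asset) pair each row belongs to.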
- Fixes a warning on indexing with a float that ultimately came from
pd.Timedelta.total_seconds(). Adds ``timedelta_to_integral_seconds()``
and ``timedelta_to_integral_minutes()`` functions and replaces various
usages of ``int(delta.total_seconds())`` with them.
- Fixes warnings triggered in ``_create_daily_stats`` from
passing tz-aware datetimes to np.datetime64.
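
One plausible shape for the two helpers, shown with the stdlib ``timedelta`` (which ``pd.Timedelta`` subclasses). Only the function names come from this change; the bodies are an assumption.

```python
from datetime import timedelta


def timedelta_to_integral_seconds(delta):
    """Return the whole number of seconds in ``delta`` as an int."""
    # total_seconds() returns a float; truncate it once, here.
    return int(delta.total_seconds())


def timedelta_to_integral_minutes(delta):
    """Return the whole number of minutes in ``delta`` as an int."""
    return timedelta_to_integral_seconds(delta) // 60


delta = timedelta(hours=6, minutes=30)
seconds = timedelta_to_integral_seconds(delta)  # an int, safe to index with
minutes = timedelta_to_integral_minutes(delta)
```

Keeping the float-to-int conversion in one place means callers never hand a float to an indexing operation.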
A one-year NYSE test that buys a lot triggers 492,963 calls to
minute_to_session_label. Only 98,924 (~390 * 252) make it past the
cache and trigger the heavier computation.
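
390 * 252 is 98,280, so the miss count is close to one per distinct market minute in the year. A sketch of a cache with that property, assuming a single-entry "last minute seen" cache; the class, its integer minutes, and the bisect lookup are hypothetical stand-ins, not the calendar's actual code.

```python
from bisect import bisect_left


class SessionLookup:
    """Maps a minute to its session, remembering the most recent answer."""

    def __init__(self, session_closes):
        # Sorted last-minute-of-session values, as plain integers here.
        self._closes = list(session_closes)
        self._last = (None, None)  # (minute, session index)
        self.misses = 0

    def minute_to_session(self, minute):
        cached_minute, cached_session = self._last
        if minute == cached_minute:
            return cached_session  # repeated query: skip the search
        self.misses += 1
        # First session whose close is at or after this minute.
        session = bisect_left(self._closes, minute)
        self._last = (minute, session)
        return session


lookup = SessionLookup(session_closes=[389, 779, 1169])
# Five calls per minute, as when several orders are handled in one bar.
queries = [5] * 5 + [400] * 5
results = [lookup.minute_to_session(m) for m in queries]
```

Ten calls produce two misses here, the same pattern as many calls per bar collapsing to one computation per minute.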
Remove module scope invocations of `get_calendar('NYSE')`, which cuts
zipline import time in half on my machine. This makes the zipline CLI
noticeably more responsive, and it reduces memory consumed at import
time from 130MB to 90MB.
Before:
$ time python -c 'import zipline'
real 0m1.262s
user 0m1.128s
sys 0m0.120s
After:
$ time python -c 'import zipline'
real 0m0.676s
user 0m0.536s
sys 0m0.132s
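
The general pattern is to defer construction until first use. A sketch under stated assumptions: ``ExpensiveCalendar``, ``load_calendar``, and ``default_calendar`` are hypothetical stand-ins, not zipline names.

```python
from functools import lru_cache


class ExpensiveCalendar:
    """Stand-in for an object that is costly to construct."""

    instances = 0

    def __init__(self, name):
        ExpensiveCalendar.instances += 1
        self.name = name


@lru_cache(maxsize=None)
def load_calendar(name):
    # Built on the first call for a given name and reused afterwards.
    return ExpensiveCalendar(name)


# Before: a module-level ``NYSE = load_calendar("NYSE")`` ran on every import.
# After: callers ask for the calendar only when they actually need it.
def default_calendar():
    return load_calendar("NYSE")
```

Importing a module written this way constructs nothing; the cost is paid once, by the first caller.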
They're not meaningful, and they cause warnings from numpy.
Implemented in terms of a new preprocessor, `expect_bounded`, which
takes a tuple of `upper_bound` and `lower_bound`.
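
A rough sketch of what a bounds-checking preprocessor could look like, written as a plain decorator. This is not zipline's implementation: the ``(lower, upper)`` tuple order, the use of ``None`` for an open side, and ``make_window`` are assumptions made for illustration.

```python
import functools
import inspect


def expect_bounded(**bounds):
    """Check named arguments against inclusive (lower, upper) bounds.

    Either bound may be None to leave that side unchecked.
    """
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound_args = signature.bind(*args, **kwargs)
            bound_args.apply_defaults()
            for name, (lower, upper) in bounds.items():
                value = bound_args.arguments[name]
                too_low = lower is not None and value < lower
                too_high = upper is not None and value > upper
                if too_low or too_high:
                    raise ValueError(
                        "%s() expected %s in [%s, %s], got %r"
                        % (func.__name__, name, lower, upper, value)
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical usage: reject non-positive window lengths up front.
@expect_bounded(window_length=(1, None))
def make_window(window_length):
    return list(range(window_length))
```

Here ``make_window(3)`` succeeds and ``make_window(0)`` raises ``ValueError`` before the function body runs.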
The new TradingCalendar method is called `minute_index_to_session_labels`.
It takes a DatetimeIndex of in-order market minutes and returns a
DatetimeIndex of the corresponding sessions.
The new method is approximately 100x faster than mapping
`minute_to_session_label` over a large DatetimeIndex.
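
The speedup from a bulk method typically comes from replacing a per-minute Python call with one vectorized search. A sketch assuming numpy and a toy schedule of integer minute offsets; the arrays and ``minutes_to_sessions`` are hypothetical, not the TradingCalendar code.

```python
import numpy as np

# Hypothetical schedule: three sessions, minutes as integer offsets.
session_labels = np.array(["2016-01-04", "2016-01-05", "2016-01-06"])
opens = np.array([0, 1440, 2880])
closes = np.array([389, 1829, 3269])


def minutes_to_sessions(minutes):
    """Map an array of market minutes to their session labels."""
    minutes = np.asarray(minutes)
    # Index of the last open at or before each minute.
    by_open = np.searchsorted(opens, minutes, side="right") - 1
    # Index of the first close at or after each minute.
    by_close = np.searchsorted(closes, minutes, side="left")
    # The two agree exactly when a minute lies inside a session.
    if not np.array_equal(by_open, by_close):
        raise ValueError("input contains non-market minutes")
    return session_labels[by_open]


labels = minutes_to_sessions([0, 389, 1440, 3000])
```

Two ``searchsorted`` calls handle the whole array, however many minutes it contains.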
Encapsulate the shared global calendar map in an object.
This allows consumers that don't want to participate in custom
registration to pass around a calendar dispatcher, and would make it
easier to support contextual management of the global calendar map if we
want to do that in the future.
As a bonus, we now only create one instance of each calendar, instead of
one per alias.
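
A sketch of the idea, assuming a dispatcher that owns factories, aliases, and built instances. The class layout, method names, and ``FakeCalendar`` are illustrative assumptions; only the one-instance-per-calendar behavior is taken from the description above.

```python
class CalendarDispatcher:
    """Owns calendar factories, aliases, and the instances built from them."""

    def __init__(self, factories, aliases):
        self._factories = dict(factories)  # canonical name -> zero-arg factory
        self._aliases = dict(aliases)      # alias -> canonical name
        self._instances = {}               # canonical name -> built calendar

    def resolve(self, name):
        # Single-level alias lookup; unknown names pass through unchanged.
        return self._aliases.get(name, name)

    def get_calendar(self, name):
        canonical = self.resolve(name)
        if canonical not in self._instances:
            if canonical not in self._factories:
                raise ValueError("no calendar registered as %r" % name)
            self._instances[canonical] = self._factories[canonical]()
        return self._instances[canonical]


class FakeCalendar:
    """Stand-in for a real calendar class."""


dispatcher = CalendarDispatcher(
    factories={"NYSE": FakeCalendar},
    aliases={"NASDAQ": "NYSE", "BATS": "NYSE"},
)
```

Because instances are cached under the canonical name, every alias returns the same object, and a consumer can build a private dispatcher without touching any global state.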
Previously, run_algorithm raised an error when run on raw (non-bundle)
data because of uninitialized variables. Those variables are now
initialized to None so that run_algorithm works with Panel data, etc.
Also, run_algorithm did not create sim_params for the TradingAlgorithm
instance it created; this kicked the can to TradingAlgorithm, which
gets default sim_params with data_frequency 'daily'. To support minute
bars, run_algorithm now creates its own sim_params with the
data_frequency specified in its arguments.