For a pipeline doing simple computations on USEquityPricing data, we
were spending ~60% of `run_pipeline` loading adjustments. Almost all of
that time was spent in calls to `DatetimeIndex.get_loc` to find the
indices of adjustment `eff_date`s.
This optimizes the eff_date lookups by pre-populating a cache of
seconds-since-epoch timestamps that we expect to see, and falling back
to `np.searchsorted` on cache misses.
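A minimal sketch of the cache-then-searchsorted lookup described above.
The class name `EffDateIndexer` is hypothetical, and it assumes that
eff_dates arrive as integer seconds since the epoch and that a cache
miss should resolve to the first date on or after the eff_date
(`side='left'`). Misses are memoized so repeated lookups stay cheap.

```python
import numpy as np
import pandas as pd


class EffDateIndexer:
    """Hypothetical helper: map eff_date seconds to positions in `dates`."""

    def __init__(self, dates):
        # Convert the sorted DatetimeIndex to int64 seconds since epoch once.
        self._seconds = dates.values.astype("datetime64[s]").view("int64")
        # Pre-populate the cache with every timestamp we expect to see.
        self._cache = {int(s): i for i, s in enumerate(self._seconds)}

    def get_loc(self, seconds):
        try:
            # Fast path: eff_date lands exactly on a known date.
            return self._cache[seconds]
        except KeyError:
            # Cache miss: binary search for the insertion point, i.e. the
            # first date >= eff_date, then memoize the answer.
            loc = int(np.searchsorted(self._seconds, seconds, side="left"))
            self._cache[seconds] = loc
            return loc


dates = pd.date_range("2016-01-04", periods=5, freq="B")
indexer = EffDateIndexer(dates)
```

The dict lookup avoids the per-call overhead of `DatetimeIndex.get_loc`,
while `np.searchsorted` keeps off-calendar eff_dates working.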
In testing, this reduces the time to compute a 1-year pipeline with 30
and 90 day moving averages from 3.1 seconds to 0.9 seconds.
Put the logic for reading and writing the equity price and adjustment
data into its own module under `data`, separating the storage formats
themselves from the pipeline loaders that consume them.
This prepares for upcoming changes to how adjustments are written
(including using the bcolz daily reader as an input), as well as for
eventually providing the readers to a DataPortal object.
Previously, we capitalized input strings at two different levels of
our code: in the user-facing API methods and in the asset finder.
This commit moves input string capitalization exclusively into the API
method to which the string is supplied. Specifically, the string is
capitalized by a preprocess decorator on that API method. The
decorator passes the input string to the newly defined
ensure_upper_case() function, which raises a TypeError if the supplied
argument is not a string.
ensure_upper_case() is defined in a new file, zipline/utils/input_validation.py.
The existing expect_types() function is also moved there.
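A minimal sketch of the decorator pattern described above. The
`preprocess` here is a simplified stand-in written for illustration,
not zipline's implementation; it assumes each processor is called as
`processor(func, argname, value)`. The `_ASSETS` dict and `symbol()`
function are hypothetical stand-ins for the asset finder and the API
method.

```python
import inspect
from functools import wraps


def ensure_upper_case(func, argname, arg):
    """Upper-case a string argument; reject anything that isn't a string."""
    if isinstance(arg, str):
        return arg.upper()
    raise TypeError(
        "{0}() expected argument '{1}' to be a string, got {2}".format(
            func.__name__, argname, type(arg).__name__
        )
    )


def preprocess(**processors):
    """Simplified stand-in: run processors on named args before the call."""
    def decorator(func):
        sig = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, processor in processors.items():
                if name in bound.arguments:
                    bound.arguments[name] = processor(
                        func, name, bound.arguments[name]
                    )
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorator


# Hypothetical finder stand-in: exact, case-sensitive lookup, mirroring a
# finder that no longer capitalizes its input.
_ASSETS = {"AAPL": 24, "MSFT": 5061}


@preprocess(symbol_str=ensure_upper_case)
def symbol(symbol_str):
    # By the time we get here, symbol_str is guaranteed to be upper case.
    return _ASSETS[symbol_str]
```

With normalization done once at the API boundary, lower layers can
assume canonical input and stay case-sensitive.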
Various tests in tests/test_assets.py are modified to account for the
fact that the asset finder method lookup_symbol() no longer capitalizes
its supplied argument.
Improves the query for futures contracts to determine contract validity
using whichever of notice_date and expiration_date comes first in time.
If one of these is missing, we use the other.
Also modifies the query to order the resulting contracts by their
expiration_date if available, and to use their notice_date if not.
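A sketch of the filtering and ordering rules above, using SQLite via the
standard library. The table name, columns, and YYYYMMDD integer dates
are hypothetical, and it assumes a contract counts as valid while the
earlier of its two dates is still after the as-of date. `COALESCE`
substitutes the other date when one is NULL, and SQLite's
multi-argument `MIN()` picks the earlier of the two.

```python
import sqlite3

# Hypothetical schema for illustration; dates are YYYYMMDD integers.
SCHEMA = """
CREATE TABLE futures_contracts (
    symbol TEXT PRIMARY KEY,
    root_symbol TEXT,
    notice_date INTEGER,
    expiration_date INTEGER
)
"""

QUERY = """
SELECT symbol
FROM futures_contracts
WHERE root_symbol = ?
  -- Valid until whichever date comes first; fall back to the other if NULL.
  AND MIN(COALESCE(notice_date, expiration_date),
          COALESCE(expiration_date, notice_date)) > ?
-- Order by expiration_date when present, otherwise by notice_date.
ORDER BY COALESCE(expiration_date, notice_date)
"""


def valid_contracts(conn, root_symbol, as_of):
    return [row[0] for row in conn.execute(QUERY, (root_symbol, as_of))]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO futures_contracts VALUES (?, ?, ?, ?)",
    [
        ("CLF16", "CL", 20151221, 20151218),  # expiration comes first
        ("CLG16", "CL", 20160120, None),      # only notice_date known
        ("CLH16", "CL", None, 20160219),      # only expiration_date known
        ("CLJ16", "CL", 20160322, 20160318),
    ],
)
```

Pushing both rules into SQL keeps the NULL handling in one place instead
of post-filtering contracts in Python.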
This is an optimization for cases where we build an environment but
never use its asset finder. Ideally, the consumer would use just the
calendar, but it isn't fully featured yet.