Also remove test that compares risk metrics batch to iterative,
since the 'iterative' calculations, replaced by the cumulative
calculations, will intentionally drift from the results in the risk
report due to annualization and other factors.
Work towards having separate calculations for the fixed periods versus
the cumulative/headline risk metrics.
Different submodules for each type should help make each type of
calculation distinct and easier to find.
In anticipation of splitting apart the different risk classes
into their own submodules, a distinct risk module should help
organize those new classes.
Datetimes returned by the trading calendar should always have an
HHMMSS of midnight UTC. Besides being consistent, this allows us to
check whether a particular date() is in an array of these datetimes,
because they will hash to the same thing. For example:
    early_closes = get_early_closes()
    ... later ...
    if current_bar_datetime.date() in early_closes:
        ... today closes early ...
If the datetimes returned by the trading calendar functions don't
have 00:00:00 for HHMMSS, then the "in" check above will fail because
the date and the datetimes in early_closes won't hash to the same
thing.
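
The effect of the midnight normalization can be sketched with plain
stdlib datetimes. This is only an illustration: the helper name
normalize_to_midnight_utc and the sample early-close times are
assumptions, and the real calendar uses pandas objects rather than a
Python set.

```python
from datetime import datetime, timezone


def normalize_to_midnight_utc(dt):
    """Return dt converted to UTC with HHMMSS set to 00:00:00."""
    utc_dt = dt.astimezone(timezone.utc)
    return utc_dt.replace(hour=0, minute=0, second=0, microsecond=0)


# Hypothetical calendar entries that still carry a close time.
raw_early_closes = [
    datetime(2013, 7, 3, 17, 0, tzinfo=timezone.utc),
    datetime(2013, 11, 29, 18, 0, tzinfo=timezone.utc),
]
early_closes = {normalize_to_midnight_utc(dt) for dt in raw_early_closes}

current_bar_datetime = datetime(2013, 7, 3, 14, 31, tzinfo=timezone.utc)
today = normalize_to_midnight_utc(current_bar_datetime)

# Normalized entries match the normalized bar date.
closes_early_today = today in early_closes
# Entries with a non-midnight time never match.
matches_raw_entries = today in set(raw_early_closes)
```

With the normalized set the membership check succeeds, while the same
check against the raw entries does not.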
If a stock stops getting updated values, e.g. if a stock rolls out
of a universe strategy, the underlying batch transform for TALib may
currently contain nans (which is another issue that could be
addressed). The nans cause crashes when passed to some TALib
functions, e.g. Bollinger Bands are incompatible with all nan values.
So, drop sids that only have nan values for the current data panel.
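
A minimal sketch of the filtering idea, assuming the window is a plain
dict of sid to a list of floats (the actual transform operates on a
pandas data panel, and drop_all_nan_sids is a hypothetical name):

```python
import math


def drop_all_nan_sids(values_by_sid):
    """Keep only sids with at least one non-nan value.

    A sid with an empty list of values is dropped as well.
    """
    return {
        sid: values
        for sid, values in values_by_sid.items()
        if not all(math.isnan(v) for v in values)
    }


nan = float("nan")
window = {
    1: [10.0, 10.5, 11.0],  # fully populated
    2: [nan, nan, nan],     # rolled out of the universe
    3: [nan, 20.0, 20.5],   # partially populated, kept
}
cleaned = drop_all_nan_sids(window)
```

Only sid 2 is removed here; partially populated sids are left for the
downstream function to handle.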
Since these modules are not requirements, make the name more clear
about the distinction, especially so that build scripts do not pick
up this file when including wildcards with a requirements prefix.
The defaultdict behavior was allowing both algo code and
TradingAlgorithm wrappers to add unintended keys.
Remove use of defaultdict in favor of a dictionary whose values are
explicitly added in tradesimulation; otherwise, allow a KeyError
if the bar is indexed with a sid that doesn't exist.
Also, when iterating over the keys in the data bar, only return
those keys that have pricing data.
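
The intended behavior could look roughly like the sketch below. The
BarData class here is a hypothetical stand-in, and treating a "price"
entry as the marker for pricing data is an assumption made for the
example.

```python
class BarData:
    """Hypothetical stand-in for the data bar passed to algo code."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, sid, values):
        # Values are only added explicitly, e.g. by the simulation loop.
        self._data[sid] = values

    def __getitem__(self, sid):
        # Plain dict lookup: an unknown sid raises KeyError and no key
        # is created as a side effect.
        return self._data[sid]

    def __iter__(self):
        # Only yield sids that have pricing data (assumed: a "price" key).
        return (sid for sid, values in self._data.items() if "price" in values)


bar = BarData()
bar[1] = {"price": 10.0}
bar[2] = {}  # known sid without pricing data

sids_with_prices = list(bar)

try:
    bar[3]
    missing_sid_raised = False
except KeyError:
    missing_sid_raised = True
```

Unlike with defaultdict, indexing with sid 3 does not silently create
an entry for it.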
The deepcopy of events into the EventWindow's ticks was causing
a significant increase in memory consumption, e.g. for an algorithm
with almost 200 sids and 14 vwaps, removing the deepcopy reduces the
amount of memory consumed by about 40%.
The downside is that if an event's properties are changed later on,
which is not advised, then the signal derived from vwap etc.
may be changed.
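
The trade-off can be illustrated with a small sketch. The Event class
and the two deques are hypothetical and only stand in for the
EventWindow's ticks:

```python
from collections import deque
from copy import deepcopy


class Event:
    """Hypothetical event with mutable properties."""

    def __init__(self, sid, price):
        self.sid = sid
        self.price = price


event = Event(sid=1, price=10.0)

# Storing a reference: no extra memory, but the window sees later changes.
ticks_by_reference = deque([event])
# Storing a deepcopy: isolated from later changes, at the cost of a copy.
ticks_by_copy = deque([deepcopy(event)])

# Mutating the event after it was added to the window (not advised).
event.price = 99.0

price_seen_by_reference = ticks_by_reference[0].price
price_seen_by_copy = ticks_by_copy[0].price
```

The window that holds references reports the mutated price, which is
how a later change could alter a signal derived from the window.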
For maintainer use; requires AWS credentials for the account where
the `zipline-test-data` bucket is hosted.
The script performs the following steps, which used to be manual:
- Create a key name based on the md5 of the answer key file.
- Upload the answer key to S3 bucket.
- Make the file publicly downloadable over HTTP.
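
The first step above might be sketched as follows. The key layout
(md5 hex digest, then the file name) and the function name are
assumptions for illustration, not the script's actual naming scheme;
the upload and permission steps need AWS credentials and are omitted.

```python
import hashlib


def md5_key_name(file_bytes, filename):
    """Build a key name from the md5 of the file contents.

    The "<md5>/<filename>" layout is hypothetical.
    """
    digest = hashlib.md5(file_bytes).hexdigest()
    return "{0}/{1}".format(digest, filename)


# Stand-in contents; the real script would read the answer key file.
key_name = md5_key_name(b"example answer key contents", "answer_key.xlsx")
```

Because the key depends on the contents, a changed answer key gets a
new key name instead of overwriting the previous upload.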
Fix the spreadsheet to apply a factor of COUNT / (COUNT - 1)
to the COVAR value.
Also, go back to using the C[1][1] index instead of calculating
var independently.
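
As a sketch of why the factor is needed: Excel's COVAR divides by the
number of observations (population covariance), so multiplying by
COUNT / (COUNT - 1) converts it to the sample covariance. The function
names and return series below are made up for illustration.

```python
def population_covariance(xs, ys):
    """Covariance with a divisor of n, as Excel's COVAR computes it."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def sample_covariance(xs, ys):
    """Apply the COUNT / (COUNT - 1) factor to the population value."""
    n = len(xs)
    return population_covariance(xs, ys) * n / (n - 1)


# Made-up return series.
algo_returns = [0.01, -0.02, 0.03, 0.00]
benchmark_returns = [0.02, -0.01, 0.02, 0.01]

corrected = sample_covariance(algo_returns, benchmark_returns)
```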
Use recent change to benchmark variance in the beta calculation,
instead of referring to the 4th quadrant of the covariance.
Also, read answers from answer key for corroboration of beta values.
The np.cov call needs a ddof of 0 to match the answer key, which uses
Excel's VAR.
When switching np.cov to use a ddof of 0, the benchmark variance is
no longer the 4th quadrant of the cov result, so use np.var directly.
Add reference to updated answer key with benchmark variance cells,
and use the new cells as the reference for the benchmark variance
test.
The values changed from the original hardcoded values, due to the
change to close over close benchmarks.
To help tentpole the risk results with an alternative implementation.
Also, the spreadsheet provided is what the original answers were based
on, so reading from said spreadsheet should help prevent drift.
So that the answer key is not onerous on the SCM repo size, add a
utility to download the answer key automatically.
Prevent re-download on every test suite run if the local answer key
matches the latest version.
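
The re-download check could be sketched as below, assuming the latest
version is identified by an md5 hex digest. The function names are
hypothetical and the download is injected as a callable so the sketch
does not touch the network.

```python
import hashlib


def is_up_to_date(local_bytes, latest_md5):
    """True when a local copy exists and its md5 matches the latest."""
    if local_bytes is None:
        return False
    return hashlib.md5(local_bytes).hexdigest() == latest_md5


def ensure_answer_key(local_bytes, latest_md5, download):
    """Return the answer key bytes, calling download() only when needed."""
    if is_up_to_date(local_bytes, latest_md5):
        return local_bytes
    return download()


latest = b"new answer key"
latest_md5 = hashlib.md5(latest).hexdigest()
download_calls = []


def fake_download():
    # Stand-in for the real HTTP download.
    download_calls.append(1)
    return latest


first = ensure_answer_key(None, latest_md5, fake_download)    # no local copy
second = ensure_answer_key(first, latest_md5, fake_download)  # already current
```

Only the first call triggers a download; the second finds a matching
local copy and returns it unchanged.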
The risk tests were originally based on a spreadsheet, with the
results of returns etc. copied and pasted into the `test_risk` module.
Include the spreadsheet and read the values directly using a Python
Excel spreadsheet library.