Continue on path of converting values stored inside of risk metrics
to use a DataFrame instead of storing multiple lists.
Also, the need for latest_dt in getting the current volatility for
the sharpe calculation, shows that we need to set the lastest_dt at
the beginning of the update loop.
Indexes to risk answers were pointing to a previous version.
Also, provide the risk cumulative answers as a pd.Series,
so that it is easier to compare to values produced by risk class.
to account for minimum price variation.
On an order to buy, between .05 below to .95 above a penny, use that penny.
On an order to sell, between .05 above to .95 below a penny, use that penny.
Eventually, all cumulative metrics, (alpha, beta, etc.) will be
stored in the same DataFrame
For easier tracking of dt to values during debugging, but should be
some performance gains as well.
Correct the annualization factor from being 1/sqrt(252), since
the annualization was applied to the volatility, by including
252 in the Sharpe's numerator.
Currently, just provide a way to render to some of the data extracted.
Intended to have more thorough documentation of the spreadsheet,
explaining derivation/calculations in each sheet and column.
Add indexes for Sharpe, returns and other values needed for
reading answers for cumulative risk metrics.
Prepare for unit test and matching change of implementation.
Refactor the reading of values from the Excel spreadsheet so that
parsers are configurable by index.
Needed so that we can parse columns that have dates, in addition
to floats as previously.
Instead of using the indexes defined in the answer key class
to index back into the answer key object, populate the answers
so that they are available as members of the answer key object.
Update period risk test to use new answer key structure.
Also, remove the rounding behavior from the answer sheet, leaving
the rounding to the consumer of the answer key values, so that
the values can be retrieved from the spreadsheet during answer
key __init__ without knowledge of the decimal point that the calling
code expects.
Correspondingly, change period risk tests to use
np.testing.assert_almost_equal when doing floating point comparison.
Move to a new notebook, the AnswerKeyLink will be for a permalink
to the current version of the answer key, of which the output
won't be too noisy in git.
The annotation notebook will also be kept in source control, but
without output, since the table html output is large.
For more improved viewing experience via nbviewer.ipyhton.org,
include the output of the notebook.
When saving/updating this file, a fresh kernel and evaluation of
the entire notebook should be used so that the cell numbers stay
in order.
Read tests.risk.answer_key module into an IPython notebook, to start
a base to which answer key values and explanations can be added and
display without access to Excel.
For now, the notebook just provides the latest download link for
the spreadsheet.
Update answer key for cumulative risk:
- Annualization of Sharpe
- Use 10 year period
- Use of daily returns vectors instead of compounded return scalar.
No tests or risk module code are currently reading off of this sheet,
but developing it ahead of work in risk module so that the sheet can
be examined and vetted.
Also remove test that compares risk metrics batch to iterative,
since the 'iterative' calculations, replaced by the cumulative
calculations, will intentionally drift from the results in the risk
report due to annualization and other factors.
Work towards having separate calculations for the fixed periods versus
the cumulative/headline risk metrics.
Different sumbodules for each type should help make the calculations
type distinct and easier to find.
For maintainer use, requires AWS credentials for the account where
the `zipline-test-data` bucket is hosted.
Script does the following steps which used to be manual:
- Create a key name based on the md5 of the answer key file.
- Upload the answer key to S3 bucket.
- Make the file publically downloadable over HTTP.
Fix the spreadsheet to apply a factor of COUNT / COUNT - 1
to the COVAR value.
Also, go back to using the C[1][1] index instead of calculating
var independently.
Use recent change to benchmark variance in the beta calculation,
instead of referring to the 4th quadrant of the covariance.
Also, read answers from answer key for corroboration of beta values.
The np.cov call needs a ddof of 0 to match the answer key, which uses
Excel's VAR.
When switching np.cov to use a ddof of 0, the benchmark variance is
no longer the 4th quadrant of the cov result, so use np.var directly.
Add reference to updated answer key with benchmark variance cells,
and use the new cells as the reference for the benchmark variance
test.
The values changed from the original hardcoded values, due to the
change to close over close benchmarks.
So that the answer key does not onerous on the SCM repo size, add a
utility to download the answer key automatically.
Prevent re-download on every test suite run if the local answer key
matches the latest version.
The risk tests originally were based on a spread sheet, with the
results of returns etc copy and pasted into the `test_risk` module.
Include the spreadsheet and read the values directly using a Python
Excel spreadsheet library.
The TALib transform only supported operating on the first value
of a given batch transform panel row.
Instead of returning the one value, even if an panel with multiple
sids was provided, return a dictionary that maps stock to TALib
result.
Add a `bars` keyword arg, as is used with BatchTransform.
Also, instead of overwriting the window_length kwarg with timeperiod,
always use the lookback value from the created TALib function,
as timeperiod will be an input into that value if it exists.
Calculate `window_length` in minute mode so that there are enough
days to cover the minutes in the timeperiod.