Make the order in which event types are processed both explicit and
independent of the sort order of the incoming sources.
The overhead of creating the list per snapshot and the iterators appears
to be marginal in the minute data case when tested locally.
This patch is intended as part of the path towards making the trade
simulation loop not depend on consuming and tracking every trade event.
The point at which last_sale_date needed to be changed was proving
difficult to adapt in the previous model.
Should also allow the removal of sorting of the various source streams.
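A minimal sketch of the idea, not the actual simulation loop: the events of one snapshot are bucketed by type and then consumed in a fixed order, so the result no longer depends on how the sources were sorted. The `Event` record, the type tags, and `PROCESSING_ORDER` are all hypothetical names chosen for illustration.

```python
from collections import namedtuple

# Hypothetical event record and type tags, for illustration only.
Event = namedtuple("Event", ["type", "sid", "payload"])

# Explicit processing order; the concrete order here is an assumption.
PROCESSING_ORDER = ("transaction", "trade", "benchmark")


def ordered_snapshot(snapshot):
    """Yield the events of one snapshot grouped by type in PROCESSING_ORDER."""
    buckets = {event_type: [] for event_type in PROCESSING_ORDER}
    for event in snapshot:
        # Events whose type is not in PROCESSING_ORDER are skipped here.
        if event.type in buckets:
            buckets[event.type].append(event)
    for event_type in PROCESSING_ORDER:
        for event in buckets[event_type]:
            yield event


snapshot = [
    Event("trade", 1, "a"),
    Event("benchmark", None, "b"),
    Event("transaction", 1, "c"),
]
ordered_types = [event.type for event in ordered_snapshot(snapshot)]
```

Reversing `snapshot` yields the same `ordered_types`, which is the property the patch is after.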
Because the trade simulation main loop routed events to "process"
methods based on event type and the process methods also checked the
event type, the same comparison was being repeated many times.
A particular case where this was noted in profiling was for the
`process_event` function which was checking if the type was not a trade
and returning early, when in a larger universe of stocks the value
returned False 99% of the time.
Instead provide separate process functions specific to each type,
e.g. `process_trade` and `process_transaction` and route traffic to
those functions in tradesimulation.
For a universe of 160 stocks, on both a no-op algo and an algo that rebuys
its universe every day, saw about a 10% performance increase locally.
Also:
- Add process_benchmark to blotter, since an internal subclass relies on
benchmark logic; this allows the internal process_trade to be a
`pass`.
- Add warning on unrecognized event types.
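The routing described above could look roughly like the following sketch. The type tags, the `ToyBlotter` class, and the `route` helper are hypothetical stand-ins, not the real tradesimulation or blotter code; only the handler names `process_trade`, `process_transaction`, and `process_benchmark` come from this change.

```python
import warnings

# Hypothetical event type tags; the real constants are not shown here.
TRADE, TRANSACTION, BENCHMARK = "trade", "transaction", "benchmark"


class ToyBlotter:
    """Toy stand-in that records which handler saw which event."""

    def __init__(self):
        self.seen = []

    def process_trade(self, event):
        self.seen.append(("trade", event))

    def process_transaction(self, event):
        self.seen.append(("transaction", event))

    def process_benchmark(self, event):
        self.seen.append(("benchmark", event))


def route(blotter, event_type, event):
    """Check the event type once, then call the type-specific handler."""
    if event_type == TRADE:
        blotter.process_trade(event)
    elif event_type == TRANSACTION:
        blotter.process_transaction(event)
    elif event_type == BENCHMARK:
        blotter.process_benchmark(event)
    else:
        warnings.warn("Unrecognized event type: %r" % (event_type,))


blotter = ToyBlotter()
route(blotter, TRADE, {"sid": 1})
route(blotter, BENCHMARK, {"returns": 0.01})
```

Because the handlers no longer re-check the type, each event costs one comparison chain in the loop instead of one in the loop plus one per handler.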
The risk containers that are actually used for reports use the
'cumulative' style container which has an index of days, not minutes.
The minute containers and copying of data etc. were causing an expanding
memory footprint.
The intraday_risk_metrics is being removed since the values are not
used; cumulative risk metrics with the last value updated to the latest
close have been used for some time.
Before intraday_risk_metrics can be removed, the position tracker's
passing of benchmark returns to the cumulative risk metrics must no
longer depend on the calculations done by the intraday stats. So instead
use the all_benchmark_returns stored in the tracker directly.
To find the root of the zipline tree, the correct attribute to use is
`zipline.__file__`, not `zipline.__path__`. The latter could contain
multiple directories, and is not intended to be `os.path.join()`ed as
the previous code was doing.
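As a small illustration of the difference, the sketch below uses the standard-library `json` package as a stand-in for zipline, so that it runs without zipline installed. The variable names are illustrative.

```python
import os
import json  # stand-in package; the change itself applies this to zipline

# __file__ is a single path pointing at the package's __init__ module,
# so its directory is the root of the package tree.
package_root = os.path.dirname(os.path.abspath(json.__file__))

# __path__ is a list-like of directories and may hold more than one entry,
# so it is not a safe argument to os.path.join().
search_locations = list(json.__path__)
```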
Rather than drop files temporarily into the master security lists
directory during unit tests, create temporary directories for the
tests. This avoids issues when the tests are being run at the same
time as other code that uses the real security lists data.
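A minimal sketch of the pattern using `tempfile` and `unittest`. The test case name and the file name are hypothetical, and the real tests would point the security list code at the temporary directory rather than just writing a file.

```python
import os
import shutil
import tempfile
import unittest


class SecurityListDirTestCase(unittest.TestCase):
    """Hypothetical test case; file and directory names are illustrative."""

    def setUp(self):
        # Each test gets its own scratch directory, removed afterwards.
        self.tempdir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, self.tempdir, ignore_errors=True)

    def test_writes_stay_in_tempdir(self):
        path = os.path.join(self.tempdir, "add")
        with open(path, "w") as f:
            f.write("AAA\n")
        self.assertEqual(os.listdir(self.tempdir), ["add"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SecurityListDirTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```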
For beta calculation:
Remove `.dropna`, since it was creating a new
Series and Index, which inflated memory usage as algorithm run time
progressed.
For downside risk calculations:
Instead of using pd.Series calculations, pass the underlying
numpy arrays, which have already been sliced to the exact dt, so that the
call to `round` does not create a new Series.
Because we use ordered_pip.sh to install requirements files, we want
dependencies in requirements_dev.txt to be listed _before_ the things
that depend on them, rather than after. Otherwise, with
ordered_pip.sh, stuff will get installed implicitly, and perhaps the
wrong version.
Remove use of defaultdict for orders_by_modified, which was causing an
empty list to be added every time to_dict was called with a specified
dt.
Noticed in the minute emission case when hunting another memory leak:
every simulation minute a new Timestamp and list were created and never
let go.
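The leak comes from standard `defaultdict` behavior: looking up a missing key stores the default value. The sketch below shows the effect with hypothetical stand-ins for orders_by_modified, keyed by strings rather than Timestamps.

```python
from collections import defaultdict

# Hypothetical stand-ins for orders_by_modified, keyed by a dt string.
leaky = defaultdict(list)
plain = {}

for minute in ("09:31", "09:32", "09:33"):
    # Indexing a defaultdict inserts a fresh empty list for every missing key.
    _ = leaky[minute]
    # dict.get returns the default without storing anything.
    _ = plain.get(minute, [])

leaky_size = len(leaky)
plain_size = len(plain)
```

After three lookups the `defaultdict` holds three empty lists, while the plain dict is still empty.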
Only fill limit order if impacted fill price is better than the limit price.
If a limit order is partially filled, only fill the remaining shares if the
impacted fill price is better than the limit price.
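A rough sketch of the rule, not the actual slippage model. The function and parameter names are hypothetical, amounts are taken as positive share counts, and "better" is read as strictly lower for a buy and strictly higher for a sell, which is an assumption about how ties are handled.

```python
def fillable_shares(direction, limit_price, impacted_price, amount, filled):
    """Return how many shares may fill at impacted_price.

    direction is +1 for a buy and -1 for a sell. "Better" is read as
    strictly lower for a buy and strictly higher for a sell; the handling
    of an exact tie is an assumption of this sketch.
    """
    remaining = amount - filled
    if direction > 0:
        price_ok = impacted_price < limit_price
    else:
        price_ok = impacted_price > limit_price
    return remaining if price_ok else 0


# Unfilled buy, impacted price below the limit: all shares may fill.
full_fill = fillable_shares(1, 10.00, 9.98, 100, 0)
# Partially filled buy, impacted price above the limit: nothing more fills.
blocked = fillable_shares(1, 10.00, 10.02, 100, 40)
# Partially filled sell, impacted price above the limit: the rest may fill.
rest_of_sell = fillable_shares(-1, 10.00, 10.05, 100, 40)
```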