The creation of a new portfolio ndict on each call of handle_data
was creating a very high performance overhead.
Instead, we use the same portfolio object for each event,
and replace the values contained within.
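A minimal sketch of the reuse pattern described above. The `Portfolio` and `Tracker` classes and their field names are hypothetical, chosen for illustration, and are not taken from the zipline source:

```python
class Portfolio(object):
    """Plain object holding portfolio values (hypothetical fields)."""

    def __init__(self):
        self.cash = 0.0
        self.positions_value = 0.0


class Tracker(object):
    """Hypothetical holder that reuses one Portfolio across events."""

    def __init__(self):
        # Allocated once, not once per handle_data call.
        self._portfolio = Portfolio()

    def as_portfolio(self, cash, positions_value):
        # Overwrite the values in place instead of building a new object.
        portfolio = self._portfolio
        portfolio.cash = cash
        portfolio.positions_value = positions_value
        return portfolio


tracker = Tracker()
first = tracker.as_portfolio(100.0, 0.0)
second = tracker.as_portfolio(90.0, 10.0)
```

In this sketch `first` and `second` are the same object, so a caller that needs a snapshot of an earlier event would have to copy the values itself.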
Gains some performance by using a 'regular' object instead of
an ndict.
Also, directly sets up the values that we return, instead of going
through an intermediate __core_dict and then removing values.
Taken as a whole, performance.as_portfolio is currently the
biggest bottleneck, so this works on reducing the time spent in that function.
Uses heapq.merge to sort input from multiple sources instead of
our own sort module.
Profiling shows heapq.merge is more efficient than our own efforts.
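A small sketch of merging several sources with heapq.merge. The `(timestamp, source_id)` tuples are made-up example data, and each input is assumed to be sorted already, which heapq.merge requires:

```python
import heapq

# Hypothetical (timestamp, source_id) events; each source is already sorted.
source_a = [(1, 'a'), (4, 'a'), (7, 'a')]
source_b = [(2, 'b'), (3, 'b'), (8, 'b')]
source_c = [(5, 'c'), (6, 'c')]

# heapq.merge lazily yields items from all inputs in sorted order,
# without loading or re-sorting the combined data.
merged = list(heapq.merge(source_a, source_b, source_c))
timestamps = [timestamp for timestamp, _ in merged]
```

Because the merge is lazy, it also works when the sources are generators rather than lists.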
update_universe is a bottleneck on large data sets.
A large portion of that bottleneck is the call to getitem while
looping over the keys, so this uses update instead, passing along the
internal __dict__.
Seeing about a 40% improvement.
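A sketch of the two approaches, assuming a hypothetical event class that stores its values in `__dict__`. The helper names `update_by_key` and `update_from_dict` are illustrative and not the zipline functions:

```python
class Event(object):
    """Hypothetical event that keeps its values in __dict__."""

    def __init__(self, **values):
        self.__dict__.update(values)

    def __getitem__(self, key):
        return self.__dict__[key]

    def keys(self):
        return self.__dict__.keys()


def update_by_key(target, event):
    # Slower path: one __getitem__ call per key.
    for key in event.keys():
        target[key] = event[key]


def update_from_dict(target, event):
    # Faster path: a single dict.update over the internal __dict__.
    target.update(event.__dict__)


event = Event(price=10.0, volume=100)
slow, fast = {}, {}
update_by_key(slow, event)
update_from_dict(fast, event)
```

Both paths produce the same target dict; the second avoids a Python-level method call for every key.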
Lets an Event take an initial dict to set all values,
instead of needing to set initial values one by one.
i.e. enables:
```
foo = Event({'bar': 1, 'baz': 2})
```
instead of:
```
foo = Event()
foo.bar = 1
foo.baz = 2
```
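One way such a constructor could look, as a sketch only. The parameter name `initial_values` is an assumption, and this is not the actual zipline Event class:

```python
class Event(object):
    """Hypothetical sketch of an event that accepts an initial dict."""

    def __init__(self, initial_values=None):
        if initial_values:
            # Set every attribute in one call instead of one by one.
            self.__dict__.update(initial_values)


foo = Event({'bar': 1, 'baz': 2})
empty = Event()
```

Defaulting the argument to None keeps the existing `Event()` call working unchanged.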
Previously, the list of trading days was generated, but only used to
calculate the number of days in the environment.
Exposing this list is a step toward having the simulation use the
trading days to determine when to handle market closes.
The delta check ensured that the backtester wouldn't exceed the
delta of a bar if it were being run against live data.
However, the extra overhead of getting the current time on each
side of the handle_data call adds a penalty in pure backtest mode.
It also makes the backtest results potentially non-repeatable,
since processing time is sensitive to current conditions on the box.
Instead, the timeout should be handled by whatever is running the
zipline algorithm.
There are only 6 trading days between the open and close specified
in the test_perf test.
Also, removes getting the period_end off of the last trade,
since the test can now use the end date specified for the trading
environment.
Previously, on days that were trading days but had no
event data to process, performance metrics were
not emitted, since the handling relied on an event
to trigger the daily performance metric.
This is now handled by grouping together performance messages, on market open,
for all days since the last market close.
Also, changes perf_tracker unit test to simulate missing data.
Taken from @richafrank's branch handling the same case.
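The grouping can be sketched roughly as follows. The function name `pending_days`, the message shape, and the choice of exclusive boundaries on both ends are all assumptions for illustration, and the dates are example values:

```python
from datetime import date


def pending_days(trading_days, last_close, market_open):
    # Trading days strictly after the last processed close and strictly
    # before the day that is now opening (boundary choice is an assumption).
    return [day for day in trading_days if last_close < day < market_open]


trading_days = [date(2012, 1, 3), date(2012, 1, 4),
                date(2012, 1, 5), date(2012, 1, 6)]

# Events exist for Jan 3 and Jan 6 only; Jan 4 and Jan 5 have no data.
missing = pending_days(trading_days, date(2012, 1, 3), date(2012, 1, 6))

# One performance message per skipped day, grouped at the next market open.
messages = [{'date': day} for day in missing]
```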
Two reasons for removal:
- On the path of removing most non-postconditional asserts,
since the asserts on every message incur a
non-trivial penalty on large datasets.
- Since the assert was invoked as a function, the 'right side'
of the assert statement, i.e. the error message, was being evaluated
on every call. This was discovered because the __repr__ of the message was
high on the bottleneck list.
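The second point can be illustrated with a sketch. The `Message` class and the `check` helper are hypothetical stand-ins; the original code may have invoked the assert differently:

```python
class Message(object):
    """Hypothetical message that counts how often its repr is built."""
    repr_calls = 0

    def __repr__(self):
        Message.repr_calls += 1
        return 'Message()'


def check(condition, error_message):
    # Function-style assert: the caller has already built error_message
    # by the time this runs, whether or not the condition holds.
    assert condition, error_message


message = Message()

# The error string is formatted before the call, so __repr__ runs once.
check(True, "bad message: %r" % message)
calls_after_function = Message.repr_calls

# Statement form: the right side is only evaluated if the condition fails,
# so __repr__ is not called again here.
assert True, "bad message: %r" % message
calls_after_statement = Message.repr_calls
```

With the statement form the formatting cost is paid only on failure, which matters when the check runs once per message.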
The main bottleneck here was using `len`.
A boolean check is a sufficient test for more items in the queue.
Also, uses `all` instead of several functions.
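A short sketch of both ideas, assuming the queue is a collections.deque (the actual queue type is not stated here) and using made-up flags for the `all` call:

```python
from collections import deque

queue = deque([1, 2, 3])
drained = []

# An empty deque is falsy, so the loop condition needs no len() call.
while queue:
    drained.append(queue.popleft())

# The built-in all() collapses several separate checks into one expression.
sources_done = [True, True, False]
all_done = all(sources_done)
```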
When run over large amounts of data, ndict's gets and sets
become a large bottleneck: around 1/5th of the CPU time is spent
in ndict's __setattr__, __getattr__, etc.
By switching to an object for an event,
we reduce the penalty significantly.
Removes asserts that check for event being an ndict, as well as those
that assume a certain behavior of the __contains__ method for events.