update_universe is a bottleneck on large data sets.
A large portion of that bottleneck is the call to __getitem__ while
looping over the keys, so use update instead, passing along the internal
__dict__.
Seeing about a 40% improvement.
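A minimal sketch of the two approaches, for illustration only. The `Record` class and both helper functions are hypothetical stand-ins, not zipline's actual code:

```python
class Record(object):
    """Hypothetical attribute container that also supports item access."""

    def __init__(self, **values):
        self.__dict__.update(values)

    def __getitem__(self, name):
        return getattr(self, name)

    def keys(self):
        return self.__dict__.keys()


def update_per_key(target, record):
    # Slow path: one __getitem__ call per key.
    for key in record.keys():
        target[key] = record[key]


def update_bulk(target, record):
    # Fast path: a single dict.update on the internal __dict__.
    target.update(record.__dict__)


record = Record(price=10.0, volume=100)
slow, fast = {}, {}
update_per_key(slow, record)
update_bulk(fast, record)
```

Both paths leave the target with the same contents; the bulk version avoids the per-key method dispatch.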
Allows an Event to take an initial dict that sets all values,
instead of needing to set initial values one by one.
i.e. enables:
```
foo = Event({'bar': 1, 'baz': 2})
```
instead of:
```
foo = Event()
foo.bar = 1
foo.baz = 2
```
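One way such a constructor could look, as a sketch. The `initial_values` parameter name is an assumption, and the real class may differ:

```python
class Event(object):
    """Illustrative only: an event whose attributes can be seeded from a dict."""

    def __init__(self, initial_values=None):
        # Copy every key/value pair onto the instance in one call.
        if initial_values:
            self.__dict__.update(initial_values)


foo = Event({'bar': 1, 'baz': 2})
empty = Event()
```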
Previously, the list of trading days was generated, but only used to
calculate the number of days in the environment.
Exposing this list works towards a path where the simulation
uses the trading days to determine when to handle market closes.
The delta check ensured that the backtester wouldn't exceed the
delta of a bar if it were being run against live data.
However, the extra overhead of getting the current time on each
side of handle_data adds a penalty in pure backtest mode.
Also, it makes the backtest results potentially non-repeatable,
since they are sensitive to the processing time on the box at the
time of the run.
Favoring having the timeout handled by whatever is running the
zipline algorithm.
There are only 6 trading days between the open and close specified
in the test_perf test.
Also, removes getting the period_end off of the last trade,
since the test can now use the end date specified for the trading
environment.
Previously, on days that were trading days but had no
event data to process, performance metrics were
not emitted, since the handling relied on an event to
trigger the daily performance metric.
Now handled by grouping together performance messages, on market open,
for all days since the last market close.
Also, changes the perf_tracker unit test to simulate missing data.
Taken from @richafrank's branch handling the same case.
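A rough sketch of the catch-up idea, assuming the tracker knows the last day it emitted metrics for. The `pending_days` helper and its arguments are hypothetical, not the perf_tracker API:

```python
import datetime


def pending_days(trading_days, last_emitted, current_day):
    """Trading days strictly between last_emitted and current_day.

    These are the days that had no event to trigger a daily
    performance message (an assumption made for this sketch).
    """
    return [day for day in trading_days
            if last_emitted < day < current_day]


trading_days = [datetime.date(2012, 1, d) for d in (3, 4, 5, 6)]
# Events arrive on Jan 3 and Jan 6 only; Jan 4 and Jan 5 have no data.
missed = pending_days(trading_days,
                      last_emitted=datetime.date(2012, 1, 3),
                      current_day=datetime.date(2012, 1, 6))
```

On the Jan 6 market open, a performance message would be emitted for each day in `missed` before the new day's events are handled.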
Two reasons for removal:
- On the path of removing most non-postconditional asserts,
since the asserts on every message incur a
not insignificant penalty on large datasets.
- Since the assert was invoked as a function, the 'right side'
of the assert statement, i.e. the error message, was being evaluated
on every call; discovered since the __repr__ of the message was
high on the bottleneck list.
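The second point can be illustrated with a small sketch. The `check` helper and `Message` class are hypothetical; they only show that a function-style assert builds its message eagerly, while the assert statement builds it only on failure:

```python
calls = []


class Message(object):
    def __repr__(self):
        # Record every time the (potentially expensive) repr is built.
        calls.append('repr')
        return 'Message()'


def check(condition, error_message):
    # Function form: the caller has already built error_message.
    assert condition, error_message


msg = Message()

# Statement form: the right side is evaluated only if the condition fails.
assert True, "unexpected message: %r" % msg
statement_calls = len(calls)

# Function form: the argument is formatted before the call, invoking __repr__.
check(True, "unexpected message: %r" % msg)
function_calls = len(calls)
```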
The main bottleneck here was using `len`.
A boolean check is a sufficient test for more items in the queue.
Also, uses `all` instead of several functions.
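As a sketch of both changes, using `collections.deque` as a stand-in for the queue type (an assumption; the function names are hypothetical):

```python
from collections import deque


def drain(queue):
    """Pop items until the queue is empty, relying on truthiness."""
    drained = []
    while queue:  # same outcome as `while len(queue) > 0`
        drained.append(queue.popleft())
    return drained


def every_queue_has_items(queues):
    # A single `all` call replaces a chain of separate per-queue checks.
    return all(bool(queue) for queue in queues)


items = drain(deque([1, 2, 3]))
ready = every_queue_has_items([deque([1]), deque([2])])
not_ready = every_queue_has_items([deque([1]), deque()])
```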
When run over large amounts of data, the use of ndict's gets and sets
becomes a large bottleneck; around 1/5th of the CPU time is spent
in ndict's __setattr__, __getattr__, etc.
By switching to an object for an event,
we reduce the penalty significantly.
Removes asserts that check for event being an ndict, as well as those
that assume a certain behavior of the __contains__ method for events.
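To illustrate where the overhead comes from, here is a sketch comparing a dict-backed wrapper with a plain object. `DictBacked` is a hypothetical simplification, not the actual ndict implementation:

```python
class DictBacked(object):
    """Hypothetical wrapper that routes attribute access through Python-level hooks."""

    def __init__(self):
        # Bypass our own __setattr__ to create the backing dict.
        object.__setattr__(self, '_data', {})

    def __getattr__(self, name):
        # Called for every attribute that is not found normally.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Called for every attribute assignment.
        self._data[name] = value


class PlainEvent(object):
    """Plain object: attribute access uses the interpreter's built-in path."""
    pass


wrapped = DictBacked()
wrapped.price = 10.0

plain = PlainEvent()
plain.price = 10.0
```

Both objects expose the same attribute, but each get and set on `DictBacked` goes through an extra Python method call, which adds up over many events.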
In a previous patch, the transform argument had been changed
from days to window_length.
The README example was thus not able to be run; updating the algo
so that it runs against current versions of transforms.
Moving requests to main requirements, since zipline will not run without
the treasury and benchmark data that we use requests to fetch.
Upgrade is to keep current with latest release.
Salient changes since last version:
- Adds non-holiday closings to trading calendar.
- Forward filling of missing treasury data.
- Improves handling of treasury data when the backtest's
end date day is not a market day.
- Adds option to forward fill data in batch transform.