Refactor PerformanceTracker, Blotter, and AlgorithmSimulator so that
the end of a bar is handled at the AlgorithmSimulator level
instead of within PerformanceTracker.
- PerformanceTracker and Blotter are no longer generators;
both provide functions to process events instead.
- AlgorithmSimulator calls each from within the loop running
over the data generator.
- Change the test_perf_tracker utility to be compatible with
PerformanceTracker no longer being a generator.
Has the effect of:
- Fixing the timing of order emission.
- Allowing minutely emission of benchmarks, which was prevented
by the extra grouping previously caused by Blotter.
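A minimal sketch of the control flow described above. The `Blotter` and `PerformanceTracker` classes here are simplified, hypothetical stand-ins, and their `process_trade`, `process_event`, and `handle_bar_close` methods are illustrative names rather than zipline's actual API. The point is that neither object is a generator: the simulator owns the loop over (dt, events) bars and decides when a bar ends.

```python
from collections import namedtuple

# Hypothetical trade event used only for this sketch.
Event = namedtuple("Event", ["dt", "sid", "price"])


class Blotter:
    """Hypothetical blotter: fills an open order when a trade for its sid arrives."""

    def __init__(self):
        self.open_orders = {}  # sid -> share amount

    def process_trade(self, event):
        amount = self.open_orders.pop(event.sid, 0)
        if amount:
            return [("TRANSACTION", event.dt, event.sid, amount, event.price)]
        return []


class PerformanceTracker:
    """Hypothetical tracker: consumes events, builds a packet only when a bar closes."""

    def __init__(self):
        self.processed = []

    def process_event(self, event):
        self.processed.append(event)

    def handle_bar_close(self, dt):
        return {"dt": dt, "events_seen": len(self.processed)}


def simulate(bars, blotter, tracker):
    """Loop over (dt, events) bars; the simulator, not the tracker, ends each bar."""
    packets = []
    for dt, events in bars:
        for event in events:
            # Transactions produced by this trade reach performance first.
            for txn in blotter.process_trade(event):
                tracker.process_event(txn)
            tracker.process_event(event)
        packets.append(tracker.handle_bar_close(dt))
    return packets
```

Because the end-of-bar call sits in the simulator loop, a packet is produced exactly once per bar, without the tracker having to wait for the next event to notice that the bar has changed.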
Minutely emission also depends on work for streaming benchmarks
through performance and risk at a minute granularity.
- Create different benchmark containers in performance
depending on emission rate.
- Add a minute close method that updates algorithm and
benchmark returns, and calculates the risk metrics
that depend on those returns.
- Provide fake 0.0 values for annualized metrics like
sharpe, sortino, and information, until we figure out
how they should be treated in the context of minutely
calculation.
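One way the minute close step could look, written as a hypothetical `MinuteRiskMetrics` class (not zipline's risk module). Compounding per-minute returns into cumulative period returns is an assumption of this sketch; the annualized ratios are reported as the fake 0.0 values described above.

```python
class MinuteRiskMetrics:
    """Hypothetical minute-level risk container, updated at each minute close."""

    def __init__(self):
        self.algorithm_returns = []
        self.benchmark_returns = []
        self.algorithm_period_return = 0.0
        self.benchmark_period_return = 0.0

    def handle_minute_close(self, algorithm_return, benchmark_return):
        """Record one minute's returns and recompute the dependent metrics."""
        self.algorithm_returns.append(algorithm_return)
        self.benchmark_returns.append(benchmark_return)
        # Compound per-minute returns into cumulative period returns.
        self.algorithm_period_return = (
            (1.0 + self.algorithm_period_return) * (1.0 + algorithm_return) - 1.0)
        self.benchmark_period_return = (
            (1.0 + self.benchmark_period_return) * (1.0 + benchmark_return) - 1.0)
        return self.to_dict()

    def to_dict(self):
        return {
            "algorithm_period_return": self.algorithm_period_return,
            "benchmark_period_return": self.benchmark_period_return,
            # Placeholders until annualization for minute data is settled.
            "sharpe": 0.0,
            "sortino": 0.0,
            "information": 0.0,
        }
```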
*NOTE* This does not fully work without the changes to the
simulation loop by @fawce.
Eventually, progress should either return None or be removed
completely, but in the meantime, return a
constant of 1.0 for progress of minute emissions.
Also, factor out the daily calculation into a property
instead of calculating during process.
Instead of creating a set of perf messages for each event during minute
emission mode, only include the messages on the last event in the bar.
This should also cut down on calculation and serialization, and works towards
doing more 'end of bar' logic for minute benchmarks.
To fix the grouping of events so that (dt, events) ordering
is preserved, the tracking of order states needs to change
in the following way.
Change how order keeps track of dates:
- Change order's dt field to reflect modified date.
- Add a created field.
Change how performance keeps track of orders by:
- Map dt to transactions
- Map dt to orders
- Map order ids to keep track of updated orders.
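A sketch of the date bookkeeping listed above, using hypothetical `Order` and `OrderTracking` classes (the status strings and method names are illustrative). `created` is set once, `dt` moves with every modification, and each status change is recorded as a snapshot under the dt at which it happened, while the latest snapshot is also kept by order id.

```python
import itertools
from collections import defaultdict


class Order:
    """Hypothetical order: `created` never changes, `dt` is the last-modified date."""

    _next_id = itertools.count(1)

    def __init__(self, dt, sid, amount):
        self.id = next(Order._next_id)
        self.created = dt
        self.dt = dt
        self.sid = sid
        self.amount = amount
        self.filled = 0
        self.status = "open"

    def fill(self, dt, shares):
        self.filled += shares
        self.dt = dt  # record when the order was last modified
        self.status = "filled" if self.filled >= self.amount else "partially_filled"


class OrderTracking:
    """Hypothetical performance-side maps: dt -> transactions, dt -> orders, id -> order."""

    def __init__(self):
        self.transactions_by_dt = defaultdict(list)
        self.orders_by_dt = defaultdict(list)
        self.orders_by_id = {}

    def handle_order(self, order):
        # Snapshot the order so every status update at every dt is preserved.
        snapshot = {"id": order.id, "created": order.created, "dt": order.dt,
                    "status": order.status, "filled": order.filled}
        self.orders_by_dt[order.dt].append(snapshot)
        self.orders_by_id[order.id] = snapshot  # latest state wins

    def handle_transaction(self, dt, txn):
        self.transactions_by_dt[dt].append(txn)
```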
The order updates emitted from the blotter, and subsequently from
performance, were incorrect.
Previously, only the first action on an order was emitted;
fix so that all status updates are emitted.
Instead of creating a list of benchmarks in the risk module,
stream benchmarks through the system as events, starting from the
algorithm generator.
Works towards more easily setting arbitrary pricing data as
a benchmark, as well as working towards live minutely benchmarks.
- Add transaction and order types
- Move TransactionSimulator from trading.py to tradesimulation.py
(only used by other members of the tradesimulation module)
- Make Transaction an independent event, like dividend
- Add Blotter class.
- Flatten the transaction events to be independent of trade bar events
- Make orders into events that reach performance (need to add
handling)
- Issue IDs to orders and track each transaction's order id.
- Make volume share slippage fill orders independently, rather than
aggregating them into a single transaction.
- Perf tracker holds orders, serializes them with transactions.
- Order state defined and maintained by order class.
- Minutely emission of orders based on last_modified date.
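The independent fills can be sketched with a hypothetical `fill_orders_independently` helper. The 25% `volume_limit` default and the absence of any price impact are simplifying assumptions of this sketch, not a description of the real volume share slippage model. Each open order gets its own transaction, tagged with its order id, until the bar's volume allowance runs out.

```python
from collections import namedtuple

# Hypothetical transaction record that remembers which order it filled.
Transaction = namedtuple("Transaction", ["order_id", "sid", "amount", "price"])


def fill_orders_independently(orders, trade_volume, trade_price, volume_limit=0.25):
    """Fill each open order separately against one trade bar.

    `orders` is a list of (order_id, sid, open_amount) tuples, where a negative
    amount is a sell. All orders share a cap of `volume_limit` * `trade_volume`
    shares; no price impact is modeled.
    """
    capacity = int(trade_volume * volume_limit)
    transactions = []
    for order_id, sid, open_amount in orders:
        if capacity <= 0:
            break
        shares = min(abs(open_amount), capacity)
        if shares == 0:
            continue
        capacity -= shares
        signed = shares if open_amount > 0 else -shares
        transactions.append(Transaction(order_id, sid, signed, trade_price))
    return transactions
```

Compared to aggregating all orders into one transaction, this keeps a one-to-one link between each fill and its order, which is what lets performance track each transaction's order id.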
Also, fix double emission of performance results with the last minute.
Change the perf tracker unit tests so that it doesn't rely on an
'extra' event triggering emission.
Unlike daily, minute emission now emits at the end of the bar in
the PerformanceTracker.transform instead of waiting for the next event.
During minute emissions, it is still helpful to have a final daily
performance result, analogous to what would be the final packet in
a daily emitted backtest, so that all transactions, etc. are contained
in one place.
Prevent an extra performance result timestamped at midnight of
the day from being emitted.
Fix by setting the `saved_dt` value to the dt of the first event,
before entering the main performance loop; otherwise, a performance
result with a midnight timestamp and data from just the first event is
emitted.
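A minimal sketch of the fix, using a hypothetical `emit_on_dt_change` function that stands in for the performance loop over (dt, value) pairs. Results are emitted when the dt changes, and seeding `saved_dt` with the first event's dt before the loop is what keeps a spurious midnight result out of the output. The final result is appended exactly once after the loop.

```python
def emit_on_dt_change(events):
    """Emit one (dt, total) result per dt from a dt-ordered iterable of (dt, value)."""
    events = iter(events)
    try:
        first_dt, first_value = next(events)
    except StopIteration:
        return []
    # Seed saved_dt with the first event's dt *before* the loop. Seeding it with
    # midnight instead would emit an extra result as soon as the first event arrives.
    saved_dt = first_dt
    total = first_value
    results = []
    for dt, value in events:
        if dt != saved_dt:
            results.append((saved_dt, total))
            saved_dt = dt
            total = 0
        total += value
    results.append((saved_dt, total))  # final result, emitted once
    return results
```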
As implemented, progress doesn't make sense with
minutely results. Dropping the field from the results should help
reduce some noise.
The intraday performance results were emitting all transactions
for the entire day up to that point, instead of the desired transaction
list for the current timestamp.
Add a `dt` parameter to the `to_dict` method of PerformancePeriod so
that the transactions are limited to a specific datetime.
When the parameter is `None`, a todays_performance object
behaves as before, returning all transactions for the day.
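A sketch of the `dt` filtering, using a hypothetical `TransactionPeriod` class in place of the real PerformancePeriod (only the transaction part is modeled). Transactions are stored per dt, so an intraday packet can pull just its own timestamp, while `dt=None` keeps the whole-day behavior.

```python
from collections import defaultdict


class TransactionPeriod:
    """Hypothetical stand-in for the transaction part of PerformancePeriod."""

    def __init__(self):
        self.processed_transactions = defaultdict(list)  # dt -> [txn, ...]

    def handle_transaction(self, dt, txn):
        self.processed_transactions[dt].append(txn)

    def to_dict(self, dt=None):
        if dt is None:
            # Previous behavior: every transaction for the period, in dt order.
            transactions = [txn
                            for key in sorted(self.processed_transactions)
                            for txn in self.processed_transactions[key]]
        else:
            # Intraday behavior: only the transactions stamped with `dt`.
            # dict.get avoids inserting an empty list for an unseen dt.
            transactions = list(self.processed_transactions.get(dt, []))
        return {"transactions": transactions}
```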
Wires up performance tracker so that when `emission_rate` is set
to `minute`, the performance packets are sent out every minute,
instead of once per day.
Please note, the performance packets that are generated are not
ready for prime time consumption, this patch is merely a step towards
hooking up the ability to inspect minute data.
Known issues:
- The packets do not currently include risk information,
since we need to consider how minutely emission affects the denominators
of the risk calculations.
Slight refactoring to group the tracking variables in the
PerformanceTracker together,
so that it's easier to see which are config members and which are
members used to track internal state.
- perf modified to let non-performance related events flow through.
- changes to support streaming non-trading data through batch transforms
and for mixing in sids with just custom data.
- allowing CUSTOM events to flow through to transforms.
- Added logic to maintain pre-specified sid filter.
So that both computational and memory overhead are reduced,
this turns off serializing positions for cumulative performance.
Positions were essentially being doubled up by being stored
in both cumulative and daily.
So that transactions are kept by default.
This prepares for the addition of the serialize flag added by
@fawce.
Setting the default to True, so that the flags will be aligned.
- added LSE reference rrules calendar (thanks to Edward Johns)
- added tests to verify LSE environment matches rrule calendar
- added a test to verify global environment behavior can be set.
- moved DailyReturn class to trading to eliminate circularity from
risk <-> trading.
- updated TradingEnvironment to be a context manager. This allows users
to run algorithms in individually isolated environments in one python
process. This is useful for managing multiple algorithms in a single
ipython notebook.
- added comments to explain behavior and usage of the global environment
Global state for the financial simulation environment is accessed through the
zipline.finance.trading module, which now contains a module variable:
environment.
Parameters are passed into an algorithm as a keyword argument, sim_params.
SimulationParameters creates a trading day index for the test period that
can be used to find trading days, calculate distance between trading days,
and other common operations. The sim params index is just selected from the
global state.
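As an illustration of the trading day index, here is a sketch with a hypothetical `SimulationParametersSketch` class and a toy weekday calendar (`weekday_calendar`) standing in for the real global environment. It selects the test period's days from a sorted global list, then answers membership, next-day, and distance queries by binary search.

```python
import bisect
import datetime


class SimulationParametersSketch:
    """Hypothetical sketch: a sorted trading-day index selected from a global calendar."""

    def __init__(self, period_start, period_end, all_trading_days):
        # Select the test period's days from the (assumed sorted) global calendar.
        lo = bisect.bisect_left(all_trading_days, period_start)
        hi = bisect.bisect_right(all_trading_days, period_end)
        self.trading_days = all_trading_days[lo:hi]

    def is_trading_day(self, day):
        i = bisect.bisect_left(self.trading_days, day)
        return i < len(self.trading_days) and self.trading_days[i] == day

    def next_trading_day(self, day):
        i = bisect.bisect_right(self.trading_days, day)
        return self.trading_days[i] if i < len(self.trading_days) else None

    def trading_days_between(self, start, end):
        """Number of trading days in the closed interval [start, end]."""
        return (bisect.bisect_right(self.trading_days, end)
                - bisect.bisect_left(self.trading_days, start))


def weekday_calendar(start, end, holidays=()):
    """Toy global calendar: weekdays from start to end, minus explicit holidays."""
    days, day = [], start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:
            days.append(day)
        day += datetime.timedelta(days=1)
    return days
```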
================
Details:
- adding delorean to the requirements.
- made index symbol a parameter for loading the benchmark data; changed
messagepack storage to be symbol specific.
- ported risk, performance, algorithm, transforms, batch transforms
and associated tests to use simulation parameters and global environment
- factory and sim factory use global state and sim params
- factory method parameter names now reflect the class expected
With this patch, on the close of markets we "fast forward" to midnight of the
next trading day and calculate the dividend payments. This patch assumes that
the dividend dates are all at midnight UTC.
Algorithm returns and the risk calculations that depend on them now include
cash dividends. This commit does _not_ provide an API for user algorithms to
access dividends.
PerformanceTracker expects the dividend data to arrive as events, similar to
the way that Trades arrive. Dividends are expected to have adjusted payment
amounts that are in line with adjusted trades.
PerformanceTracker maintains state of all the unpaid dividends in the position
objects held in PerformancePeriod. Dividend objects contain all the relevant
dates (declared, ex, payment) as well as net and gross amounts. Dividends are
removed from the list as they are paid. Cash flow is not incremented until the
payment day. This creates the possibility of a dividend being owed but not
paid or realized before the end of a test. For example, a dividend with an
ex_date of today may have a pay date 2 weeks in the future. Right now the
algorithm does not receive any credit for unpaid dividends.
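A sketch of the unpaid-dividend bookkeeping, using hypothetical `Dividend` and `PositionSketch` types. The text above does not say whether the net or gross amount is credited; this sketch assumes the net amount. Cash owed is recorded when the ex_date arrives and is only paid out, and removed from the list, on the pay date, so a dividend whose pay date falls after the end of a test is never credited.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class Dividend:
    """Hypothetical dividend event; amounts are per share and already adjusted."""
    sid: str
    ex_date: datetime.date
    pay_date: datetime.date
    gross_amount: float
    net_amount: float


@dataclass
class PositionSketch:
    """Hypothetical position holding the dividends it is owed but not yet paid."""
    sid: str
    amount: int = 0
    unpaid: list = field(default_factory=list)  # [(Dividend, cash_owed), ...]

    def handle_dividend(self, dividend, today):
        # Shares still held when the ex_date arrives earn the dividend.
        if dividend.sid == self.sid and dividend.ex_date == today and self.amount:
            # Assumption: the net (not gross) amount is what gets credited.
            self.unpaid.append((dividend, self.amount * dividend.net_amount))

    def pay_dividends(self, today):
        """Return the cash paid out today and drop paid dividends from the list."""
        paid = sum(owed for div, owed in self.unpaid if div.pay_date <= today)
        self.unpaid = [(div, owed) for div, owed in self.unpaid if div.pay_date > today]
        return paid
```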
Tests cover buying/selling around the ex_date and payment_date, and checking
that the performance calculated is as expected.
Since the position amount and price ndarrays are one dimensional
and use real numbers, we do not need the overhead of the extra
case handling provided by numpy.vdot, which comes at a cost of
performance.
With thanks to @jlowin, for pointing out the better fit of numpy.dot.
Gets almost a 100x speedup over iterating over the values and
summing them in Python.
Farms out the work to numpy and ATLAS by using the vector dot
product of the amounts and last sale prices.
Adds some wiring to keep track of an index into the numpy arrays
for each position, so that values can be overwritten as events update
those amounts and sale prices.
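A sketch of the index wiring, assuming a hypothetical `PositionValues` container. Each sid is assigned a fixed slot in two parallel numpy arrays, updates overwrite that slot in place, and the total position value is a single `numpy.dot` of amounts and last sale prices. The doubling growth policy is an assumption of this sketch.

```python
import numpy as np


class PositionValues:
    """Hypothetical per-sid slots in numpy arrays, valued with one dot product."""

    def __init__(self, capacity=16):
        self.amounts = np.zeros(capacity)
        self.last_sale_prices = np.zeros(capacity)
        self.index = {}  # sid -> slot in both arrays

    def _slot(self, sid):
        if sid not in self.index:
            slot = len(self.index)
            if slot == len(self.amounts):
                # Grow both arrays (roughly doubling) when every slot is taken.
                extra = np.zeros(max(slot, 1))
                self.amounts = np.concatenate([self.amounts, extra])
                self.last_sale_prices = np.concatenate([self.last_sale_prices, extra])
            self.index[sid] = slot
        return self.index[sid]

    def update(self, sid, amount=None, price=None):
        slot = self._slot(sid)
        if amount is not None:
            self.amounts[slot] = amount  # overwrite in place
        if price is not None:
            self.last_sale_prices[slot] = price

    def total_value(self):
        # Both arrays are 1-D and real, so np.dot is the plain inner product.
        return float(np.dot(self.amounts, self.last_sale_prices))
```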
Instead of doing the rollover by creating a new PerformancePeriod,
introduces a `rollover` method that resets the values that need
to be fresh in a new period, and moves the ending values to starting
values, and leaves positions intact.
This isn't a major runtime improvement in and of itself, but it does
allow us to more easily keep track of position values from period
to period, which other improvements will use.
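A minimal sketch of the `rollover` idea, using a hypothetical `RolloverPeriod` class with a reduced set of fields. Ending values become starting values, per-period accumulators are reset, and the positions dict is left untouched, so the same position objects carry over from period to period.

```python
class RolloverPeriod:
    """Hypothetical period object that is reset in place instead of recreated."""

    def __init__(self, starting_cash):
        self.starting_cash = starting_cash
        self.ending_cash = starting_cash
        self.starting_value = 0.0
        self.ending_value = 0.0
        self.pnl = 0.0
        self.processed_transactions = []
        self.positions = {}  # sid -> position; survives rollover

    def rollover(self):
        # The old period's ending values become the new period's starting values.
        self.starting_cash = self.ending_cash
        self.starting_value = self.ending_value
        # Values that must be fresh in a new period are reset.
        self.pnl = 0.0
        self.processed_transactions = []
        # self.positions is deliberately left untouched.
```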
Instead of creating a new ndict for each position on every event,
we change the values in the object that held the previous position.
The creation of new objects on each event was incurring too much
overhead.
Changes the position type returned by performance module.
For improved speed, changes from ndict to a simple Python object,
since the cost of setting ndict values is too expensive for the
number of times that positions are returned.
Also, changes the containing type of the positions to be a dictionary
with __missing__ overridden, instead of the ndict that had that
behavior, to reduce the penalty of using ndicts.
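A sketch of the new position types, with hypothetical `Position`, `Positions`, and `update_position` names. A plain object is updated in place rather than rebuilt on each event (this sketch adds `__slots__`, which the text above does not mention), and a dict subclass overrides `__missing__` so that looking up an unknown sid yields an empty position, the behavior the ndict container used to provide.

```python
class Position:
    """Plain object instead of an ndict; __slots__ keeps attribute access cheap."""
    __slots__ = ("sid", "amount", "cost_basis", "last_sale_price")

    def __init__(self, sid):
        self.sid = sid
        self.amount = 0
        self.cost_basis = 0.0
        self.last_sale_price = 0.0


class Positions(dict):
    """dict that creates and stores an empty Position for an unknown sid."""

    def __missing__(self, sid):
        position = Position(sid)
        self[sid] = position
        return position


def update_position(positions, sid, amount, price):
    # Mutate the existing Position rather than building a new object per event.
    position = positions[sid]
    position.amount = amount
    position.last_sale_price = price
    return position
```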
The creation of a new portfolio ndict on each call of handle_data
was creating a very high performance overhead.
Instead, we use the same portfolio object for each event,
and replace the values contained within.
Gains some performance by using a 'regular' object instead of
an ndict.
Also, directly sets up the values that we return, instead of going
through __core_dict as an intermediate and then removing values.
In its entirety, performance.as_portfolio is currently the largest
bottleneck; work continues on reducing the time spent in that function.
Previously, on trading days with no
event data to process, performance metrics were
not emitted, since the handling relied on an event
to trigger the daily performance metric.
Handled by grouping together performance messages, on market open,
for all days since the last market close.
Also, changes perf_tracker unit test to simulate missing data.
Taken from @richafrank's branch handling the same case.
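A minimal sketch of the catch-up logic, with a hypothetical `days_to_emit` helper over a sorted list of trading days. When market open arrives for a new event's day, every trading day after the last emitted one is owed a daily message, including days that had no event data at all.

```python
import bisect
import datetime


def days_to_emit(trading_days, last_emitted_day, event_day):
    """Trading days strictly after `last_emitted_day` and before `event_day`.

    `trading_days` must be sorted; days without any events are included.
    """
    start = bisect.bisect_right(trading_days, last_emitted_day)
    end = bisect.bisect_left(trading_days, event_day)
    return trading_days[start:end]


if __name__ == "__main__":
    days = [datetime.date(2013, 1, d) for d in (2, 3, 4, 7, 8)]
    # Messages for Jan 3 and Jan 4 are grouped together on the Jan 7 open.
    print(days_to_emit(days, datetime.date(2013, 1, 2), datetime.date(2013, 1, 7)))
```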
When run over large amounts of data, the use of ndict gets and sets
becomes a large bottleneck: around 1/5th of the CPU time is spent
in ndict's __setattr__, __getattr__, etc.
By switching to an object for an event,
we reduce the penalty significantly.
Removes asserts that check for event being an ndict, as well as those
that assume a certain behavior of the __contains__ method for events.
Moved grouping by date earlier in the pipeline of generators,
prior to any date-dependent state getting involved. Grouping
pulls from the pipeline until the start of the next group,
which falls on the next day. The effect of grouping after
slippage but before handle_data was that slippage and the algo
were out of sync by a transaction.
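A small demonstration of why the grouping point matters, using hypothetical `upstream` and `group_by_date` generators built on `itertools.groupby`. A date group can only be closed after the first event of the next day has been pulled, so any stateful stage upstream of the grouping (slippage, in the old pipeline) has already run one event ahead of what handle_data sees.

```python
import datetime
import itertools
from collections import namedtuple

Event = namedtuple("Event", ["dt", "sid", "price"])


def upstream(events, pulled):
    """Stands in for a stateful stage such as slippage; records what it has processed."""
    for event in events:
        pulled.append(event)
        yield event


def group_by_date(events):
    """Yield (date, [events]) groups from a dt-sorted event stream."""
    for day, group in itertools.groupby(events, key=lambda e: e.dt.date()):
        yield day, list(group)


if __name__ == "__main__":
    open_dt = datetime.datetime(2013, 1, 2, 14, 31)
    stream = [
        Event(open_dt, "A", 1.0),
        Event(open_dt + datetime.timedelta(minutes=1), "A", 1.1),
        Event(datetime.datetime(2013, 1, 3, 14, 31), "A", 1.2),
    ]
    pulled = []
    day, day_events = next(group_by_date(upstream(stream, pulled)))
    # Prints "2 3": the upstream stage has already seen the next day's first event.
    print(len(day_events), len(pulled))
```

Placing the grouping before any date-dependent stage means that read-ahead only touches stateless generators.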