Change calculate_results to take explicit parameters for sim_params, env
and benchmark_events instead of reading those values off of the TestCase
instance.
This prepares for tests setting specific sim_params in each test case,
which is needed for an incoming refactoring of how the test data is set up.
Instead of calculating the position values for each stat result, e.g.
gross_exposure, net_liquidity etc.; get the positions upfront and then
calculate the period and position stats in order, passing each value
explicitly to the ones that follow it in the dependency chain.
e.g. the gross_value depends on the long_value and the short_value,
which called the position_values property for calculating both the
long_value and the short_value.
Removing the repeated calls to position_values (and
position_exposures) removes the need for the caching the last sale
prices and position amounts in separate vectors, since it is inexpensive
enough to read those values off of the positions dictionary held in the
position tracker.
This patch gives a small gain to ~500 sized portfolios, but the main
intent is to clear the path to not storing last_sale_prices on the
position objects at all. Removing all of the caching layer in this class
makes that change easier to apply. Removing the extra calls to
position_values also made this class easier to step through/reason about
when splicing in the new last sale price access, as well.
This commit removes the ability to reference a shared TradingEnvironment through the zipline.finance.trading module. In place, the classes that require a TradingEnvironment, or its child AssetFinder, contain their own references to those objects.
This commit also adds serialization utilities that allow for the pickling/unpickling of objects without unintentionally their TradingEnvironments or AssetFinders.
The write_data methods invokes the relevant AssetDBWriter subclass
to write data to the database. update_asset_finder is no longer
a relevant method since the AssetFinder is strictly a reader class.
Attack the startup bottleneck of creating the asset finders caches for a
large universe, which was between 1-2 seconds on development and
production machines.
Instead, allow the AssetFinder to be passed a sqlite3 file that has
already been populated and then hydrate asset objects only when an
equity is referenced for the first time.
To create aforementioned sqlite3, create an AssetFinder with an db_path
and `create_table` set to True. If `create_table` is set to False, the
prepopulated data in the sqlite file found at db_path will be used.
Default behavior is to use an in memory database.
Behavior that changes:
- Fuzzy lookup now only works on one character, that character needs to be
specified at write/metadata consumption time, since the fuzzy lookup key
is created by dropping the character from each symbol.
- Overwriting partially written metadata is no longer
supported. i.e. some unit tests allowed for inserting just the identifier,
and then later updating the symbol, end_date, etc.
Instead of building an upsert behavior at this time, this patch
changes the unit tests so that the data for each asset is only
inserted once.
Other notes:
- populate_cache is now removed, since there is no longer a two step
process of inserting metadata and then realizing that metadata into
assets. _spawn_asset is rolled into insert_metadata, so that a call to
insert_metadata both converts the metadata and makes it available in
the data store.
By having both the trade simulation main loop route events to "process"
methods based on event type and the process methods also checking event
type, there was some duplicated effort in doing that comparison many
times.
A particular case where this was noted in profiling was for the
`process_event` function which was checking if the type was not a trade
and returning early, when in a larger universe of stocks the value
returned False 99% of the time.
Instead provide separate process functions specific to each type,
e.g. e.g. `process_trade` and `process_transaction` and route traffic to
those functions in tradesimulation.
For a universe of 160 stocks on both no-op algo and an algo that rebuys
its universe every day, saw about a 10% increase locally.
Also:
- Add process_benchmark to blotter since internal subclass relies on
logic on benchmark, this allows the internal process_trade to be a
`pass`.
- Add warning on unrecoginzed event types.
on the number of per-tick update that occur since they were duplicated
per each PerformancePeriod. Also opens up the path to cythonizing the
entire object
Previously the last sale price was not correctly being set on
positions when the transaction arrived before the trade event.
The last sale price was defaulted to zero and never updated. This resulted
in one holding stocks that were bough >>0 and now had value 0 from
the perspective of returns. The returns would display correctly again
when the next trade of that security happened. For most securities trading is
frequent enough that there's no issue, but for some illiquid ones it took
hours to fix itself.
Updated test_perf_tracking:TestPerformanceTracker.test_minute_tracker
This test was based on assuming that last_sale_price was zero,
allowing the sharpe ratio to be calculated. The sharpe ratio can no longer
be calculated for this specific tested scenario and the test has been changed
accordingly.
Removes support for handling dividends as part of the algorithm
simulation stream, replacing it with an API in `TradingAlgorithm` for
supplying dividends as a DataFrame.
The function that handles a market close for daily frequency changed from
`handle_market_close` to `handle_market_close_daily`.
The function that is called at on the closing minute each day when running
minutely changed from `handle_intraday_close` to
`handle_intraday_market_close`.
Upgrade pep8 1.4.6 -> 1.5.7
Upgrade pyflakes 0.7.3 -> 0.8.1
Also, tweak some line indentations which now show up as errors,
because of the fixes/changes to visual indent detection between
pep8 versions.
recent 2 for 1 stock split, where 1 class C share was distributed
for each share of class A held.
Now a dividend can specify a sid and ratio of stock that will be paid
to owners of the original security. If the ratio is 2.0, then for every
existing share, two shares will be paid.
Use date sorted sources instead, instead of sorting with second
argument of Event, etc. since the `heapq.merge` behavior is using
the second part of the tuple, thus requiring a richer set of comparison
methods, which would only be used in the test context.
Use `date_sorted_sources` instead, so that sorting is done on algo time
and source id.
Use the six module to import functions and types that are
consistent between Python 2 and 3, so that one code base can
support both versions.
- Use integer types instead of int and long.
- Use string_types instead of basestring.
- Account for iteritems, itervalues, iterkeys.
- Use six.moves for filter and zip, reduce
- Use compatible bytes for md5 hasher.
- xrange and range
- Use `print()` function for all print calls
- Fix strip and format calls that were on the outside of the
print function for some reason.
(Which were breaking in Python 3 because of print returning None.)
- Remove commented out print calls.
A bug in the create_random_simulation_parameters allows the period
start to be a non-trading day.
That bug was causing the commission tests to randomly fail, e.g.
when the period start was on Good Friday, because the commission was
created on hour three of Good Friday, instead of the next Monday.
When it hit that case, the test commission is never processed.
Defend against that bug by using the first open of the simulation
parameters which is more guaranteed to be during market hours,
when creating the test commission.
This is in place of fixing the bug in the random parameters function
or making the parameters non-random, which are other potential fixes.
Remove the lists of DailyReturn objects in favor of using pd.Series
to store the return values.
Should make it easier to inspect the values when stepping through,
make the windowing of data to a certain range more facile by using,
and have some performance increases due to removing object creation
and member access.
These tests use the random simulation parameters, which is leading
to an intermittent failure.
We may want to consider removing the randomness, but in the meantime
the randomness is exposing a case where the cost basis is not the value
expected, so logging the sim parameter values to help track down what
parameters cause the failure.
So that the units match the other risk calculations, also
use annualized returns for beat and alpha.
Update answer key to match values calculated on the first day.
Also, update performance tracker test so that the returns used
are fractional instead of > 1, so that the annualized numbers are
more in line with real world values.