Fixes a bug where we'd fail to raise an error if the start/end of a
history window call aren't in the loader's calendar.
We started dropping this error after a previous change swapped out
calls to `index.get_loc` with calls to `index.searchsorted` to avoid
creating hash tables in pandas.
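A minimal sketch of why the error went missing, using the standard library's `bisect` as a stand-in for `index.searchsorted` (both return an insertion point instead of failing on a missing key). The `calendar` list and the `find_in_calendar` helper are hypothetical, not the loader's code.

```python
from bisect import bisect_left


def find_in_calendar(calendar, dt):
    """Return the position of dt in the sorted calendar, or raise KeyError."""
    # Like searchsorted, bisect_left returns an insertion point even when
    # dt is absent, so membership has to be checked explicitly.
    ix = bisect_left(calendar, dt)
    if ix == len(calendar) or calendar[ix] != dt:
        raise KeyError(dt)
    return ix


# ISO date strings sort chronologically, so plain strings are enough here.
calendar = ["2016-11-01", "2016-11-02", "2016-11-04"]
print(find_in_calendar(calendar, "2016-11-02"))
# A missing date still yields a plausible-looking position from bisect alone.
print(bisect_left(calendar, "2016-11-03"))
```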
When opening with a new `end_session`, i.e. opening for append, write the new
end session to the metadata.
Fixes an issue where the calendar on minute bar readers did not include the
recently appended day, causing reads on the last values to fail.
Accordingly, update the append test to read a value instead of checking table length.
Remove need for a consumer that is editing an existing minute bars directory to
reread the values which should not change from the metadata.
Add a test for append on a new day followed by truncate, which would be the
common usage of this method.
From pep-0008:
```
Always use a def statement instead of an assignment statement that binds a
lambda expression directly to an identifier.
Yes:
def f(x): return 2*x
No:
f = lambda x: 2*x
The first form means that the name of the resulting function object is
specifically 'f' instead of the generic '<lambda>'. This is more useful for
tracebacks and string representations in general. The use of the assignment
statement eliminates the sole benefit a lambda expression can offer over an
explicit def statement (i.e. that it can be embedded inside a larger expression)
```
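The naming difference described in the quote can be checked directly. A small illustration (the function names here are arbitrary):

```python
def double(x):
    return 2 * x


# The form PEP 8 advises against: the function object keeps a generic name.
double_lambda = lambda x: 2 * x  # noqa: E731

print(double.__name__)
print(double_lambda.__name__)
```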
This format is intended for storing data for all sids of an asset type
(e.g. equities or futures) for a session. bcolz is not used, which avoids the
overhead of creating the directories and files for each asset (around 8000
for active equities). That overhead is unnecessary here since the update is
meant to be read at once, instead of supporting the random access pattern
needed by the simulation.
This patch only adds the reader/writer pair, with the management of finding the
paths to delta files and the application of the updates to the bcolz write left
to internal loader code.
Also, the update reader interface is intentionally constrained to the data for
an entire session, to leave room for an implementation that supports
mid-session updates.
Previously, if input to the BcolzMinuteBarWriter had the first bar on a
non-trading minute, the next trading session would be considered the
"first day" in the input. Now, we consider the previous trading session
the "first day".
The intention is to correctly associate minutes after official trading
hours on half days with the session that closed early, not the following
session (a future improvement here would be to not accept minutes
outside trading hours).
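One way to picture the new association is a lookup of the latest session that opened at or before a given minute. This is only a sketch: the `session_opens` values and the `session_index_for_minute` helper are hypothetical and do not reflect the writer's internals.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical session opens (UTC) around the 2016 Thanksgiving half day.
session_opens = [
    datetime(2016, 11, 25, 14, 31),  # Friday, early close
    datetime(2016, 11, 28, 14, 31),  # Monday
]


def session_index_for_minute(opens, minute):
    """Index of the latest session that opened at or before `minute`."""
    ix = bisect_right(opens, minute) - 1
    if ix < 0:
        raise ValueError("minute precedes the first session")
    return ix


# A minute after Friday's early close maps to Friday, not to Monday.
after_close = datetime(2016, 11, 25, 18, 47)
print(session_index_for_minute(session_opens, after_close))
```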
Otherwise, we either raise an exception or filter out all unsafe values.
This addresses an issue where the BcolzMinuteBarWriter would scale up
values to convert to uint32, but the resulting values were too large,
and would be mangled.
Based on the approach we take in the BcolzDailyBarWriter.
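A rough sketch of the kind of guard described, in plain Python. The scale factor of 1000, the `on_overflow` parameter, and the choice of 0 as the replacement value are assumptions for illustration, not the writer's actual API.

```python
UINT32_MAX = 2**32 - 1


def scale_for_uint32(values, scale=1000, on_overflow="raise"):
    """Scale prices to integers, guarding against uint32 overflow.

    `scale` and `on_overflow` are illustrative parameters only.
    """
    scaled = [int(round(v * scale)) for v in values]
    unsafe = [v for v in scaled if not 0 <= v <= UINT32_MAX]
    if unsafe and on_overflow == "raise":
        raise ValueError("%d value(s) do not fit in uint32" % len(unsafe))
    # Otherwise filter: replace unsafe values with 0 (treated as 'no data').
    return [v if 0 <= v <= UINT32_MAX else 0 for v in scaled]


print(scale_for_uint32([1.5, 2.0]))
print(scale_for_uint32([1.5, 5e6], on_overflow="filter"))
```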
When the following conditions occur:
- a `nan` occurred after a half day (e.g. on the Monday after
Thanksgiving, where the Friday would be a half day),
- data was written to the span between the early close and where the market close
would have been if it were not an early close session,
- a `nan` also occurred on the last minute of the early market session,
the existing implementation would incorrectly return a `nan` when requesting a
forward filled price.
The steps that caused this error were:
1. Request for `'price'` on the market open of the day after the early close.
2. `nan` is found for that minute.
3. `get_last_traded_dt` is called, and finds a volume that occurs after the
early close, e.g. `18:47` when the market close was `18:00`.
4. The minute position for `18:47` is used when calling
`find_position_of_minute`. Since that value is after the `market_close`, the
minute is set to the position of `18:00` due to the delta logic in
5. Since there is also no data at `18:00`, a `nan` is returned, even though
there were valid minutes earlier in the session, e.g. a non-zero volume at
`16:47` should have been used, but was not.
Fix by checking the current minute against the market close when searching for
the last traded minute. If the minute is greater than the market close for the
corresponding day, continue the search until the minute position is within the
trading session.
This could also be fixed by enforcing that only zeros can be written between an
early close and the minute where the close would have been, but this fix allows
the reader to work with existing data.
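The search described above can be sketched with a toy, dictionary-backed session. The `last_traded_minute` helper and the `'HH:MM'` string minutes are hypothetical simplifications; the real reader works on minute positions in bcolz tables.

```python
def last_traded_minute(volume_by_minute, from_minute, market_close):
    """Walk backwards from `from_minute` to the latest in-session trade.

    Minutes are zero-padded 'HH:MM' strings within one session, so string
    comparison orders them correctly. Returns None when nothing traded.
    """
    for minute in sorted(volume_by_minute, reverse=True):
        if minute > from_minute:
            continue
        if minute > market_close:
            # Data written after the early close: keep searching.
            continue
        if volume_by_minute[minute] > 0:
            return minute
    return None


# Volume at 18:47 sits after the 18:00 early close and must be skipped.
volumes = {"16:47": 100, "18:00": 0, "18:47": 5}
print(last_traded_minute(volumes, "21:00", market_close="18:00"))
```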
The rolls are already calculated and assigned to `rolls_by_asset` earlier in the
`load_raw_arrays` method, so remove the duplication.
The change should not affect results.
The use of `slice_indexer` on all market minutes was taking about 110ms on my
development machine.
This change to getting the start and end indices changes the entire `_calendar`
method to take 10ms on the same machine.
Noticed while creating a `HistoryLoader` in a notebook context.
Use `roll_style` not `roll`.
Also, add test case to cover using the session bar reader `get_value`,
by adding a test which uses `close`, since only `contract` was being
exercised, which does not exercise the session daily bar reader.
In preparation for using `DataPortal` in notebooks, remove restriction on
the `HistoryLoader` to dates that are monotonically increasing. Notebook
usage of the `DataPortal` is more useful when the end of the history
window can be arbitrary dates without having to restart the notebook kernel.
Due to the implementation of the prefetch and caching logic, the end
date of history calls could previously only increase. e.g. `2016-11-01`,
`2016-11-02`, `2016-11-03`. This pattern was sufficient for backtesting
and live simulations, since the current time of the algorithm only ever increases.
With this change, which resets the underlying sliding window when the
last fetched idx is greater than the
Now calls to history in the same process with end dates such as
`2016-11-01`, `2016-10-31`, `2015-11-02` should work.
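As a loose illustration of the reset idea, here is a toy cache that reloads its block whenever the requested end index moves backwards or falls outside what was prefetched. The `SlidingWindowCache` class, its `prefetch` parameter, and the exact reset condition are all assumptions for this sketch, not the `HistoryLoader` implementation.

```python
class SlidingWindowCache:
    """Toy forward-moving cache that resets when the end index goes back."""

    def __init__(self, values, prefetch=2):
        self._values = values
        self._prefetch = prefetch
        self._block_start = None
        self._block = []
        self._last_ix = None
        self.loads = 0  # number of (re)loads, for demonstration

    def _load(self, end_ix, length):
        start = end_ix - length + 1
        stop = min(end_ix + 1 + self._prefetch, len(self._values))
        self._block_start = start
        self._block = self._values[start:stop]
        self.loads += 1

    def window(self, end_ix, length):
        start = end_ix - length + 1
        moved_back = self._last_ix is not None and end_ix < self._last_ix
        in_block = (
            self._block_start is not None
            and start >= self._block_start
            and end_ix < self._block_start + len(self._block)
        )
        if moved_back or not in_block:
            # Reset instead of assuming the end index only ever increases.
            self._load(end_ix, length)
        self._last_ix = end_ix
        offset = start - self._block_start
        return self._block[offset:offset + length]


cache = SlidingWindowCache(list(range(10)))
print(cache.window(5, 3), cache.window(6, 3), cache.window(4, 3))
```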
To support using a `DataPortal` and `HistoryLoader` in a notebook, allow
the prefetch length to be configurable, so that it can be set to 0.
Unlike backtesting where the prefetch is useful for repeated history
windows viewed from datetimes which are monotonically increasing by a
small amount, the notebook usage of history windows needs only to
retrieve the exact data needed for the window specified.
This patch also fixes some boundary conditions related to rolls and
adjustments which were uncovered by querying for the adjustments with an
end date near the end of the window.
Rename _get_daily_window_for_sids to _get_daily_window_data.
Rename _get_minute_window_for_assets to _get_minute_window_data.
Rename _get_daily_data to get_daily_spot_value.
Instead of using the difference between the session close of the front
contract before the roll and the open of the back contract at the
beginning of the roll, use the close of both at the end of the session
before the roll.
The close of the session prior to the roll is used in lieu of settlement data.
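A sketch of the arithmetic, assuming the adjustment is expressed as the back contract relative to the front contract (a ratio for a multiplicative style, a difference for an additive one). The `roll_adjustment` helper, the style names, and the prices are made up for illustration.

```python
def roll_adjustment(front_close, back_close, style):
    """Adjustment implied by a roll, from the closes of the session before it.

    'mul' returns a ratio to multiply earlier prices by; 'add' returns a
    difference to add to them. The direction is an assumption of this sketch.
    """
    if style == "mul":
        return back_close / front_close
    if style == "add":
        return back_close - front_close
    raise ValueError("unknown adjustment style: %r" % style)


# Hypothetical closes on the session before the roll.
front_close, back_close = 50.0, 52.0
print(roll_adjustment(front_close, back_close, "mul"))
print(roll_adjustment(front_close, back_close, "add"))
```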
There have been cases where the requested start or end date is not in
the history calendar.
Add the beginning and end of the calendar to the KeyError to give more
detail to figure out the root cause.
Add roll style which takes the volume of the contracts into account.
If the volume moves from the front to the back before the auto close
date, the roll is put at that session.
Also, factors out some of the common logic shared with calendar based rolls.
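The rule can be sketched as a scan over sessions up to the auto close date. The `volume_roll_session` helper and the sample volumes are hypothetical, and "volume moves from the front to the back" is interpreted here as the back contract's volume exceeding the front's.

```python
def volume_roll_session(sessions, front_volume, back_volume, auto_close):
    """Return the session at which to roll from the front to the back contract.

    Rolls at the first session before `auto_close` where the back contract's
    volume exceeds the front's; otherwise at `auto_close` itself.
    """
    for session, front, back in zip(sessions, front_volume, back_volume):
        if session >= auto_close:
            break
        if back > front:
            return session
    return auto_close


# ISO date strings compare chronologically.
sessions = ["2016-01-19", "2016-01-20", "2016-01-21", "2016-01-22"]
front = [900, 700, 400, 100]
back = [100, 300, 600, 900]
print(volume_roll_session(sessions, front, back, auto_close="2016-01-22"))
```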
Match the behavior of the minute bar reader, now that the session and
minute bar readers share a common interface.
isnull is slightly slower than checking against -1; however, in cases
where we check against illiquid trades in a tight loop, volume is
checked, which does not use nan. The change here should be marginal with
regard to performance.
The last traded dt provided from the session bar reader which resamples
from minutes should provide a dt that is a session label, not one that
is at the minute frequency.
If a KeyError occurred in the adjustment logic, the exception would be
swallowed by the try block, which was intended to just check whether or
not there was an adjustment reader.
Discovered when some logic in a futures adjustment reader was failing
because of a mismatch of minute and session labels, which resulted in no
adjustments during windows when there should have been.
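The general pattern behind this kind of fix is to keep the `try` block limited to the lookup it is meant to guard. A generic sketch, with a hypothetical `adjustments_for` helper and a plain dict of readers:

```python
def adjustments_for(readers, asset_type, compute):
    """Return adjustments for asset_type, or [] when no reader is registered."""
    # Keep the try block narrow: only the reader lookup may legitimately
    # raise KeyError here.
    try:
        reader = readers[asset_type]
    except KeyError:
        return []
    # A KeyError raised inside compute() now propagates instead of being
    # mistaken for "no adjustment reader".
    return compute(reader)


print(adjustments_for({}, "future", lambda reader: [1.04]))
print(adjustments_for({"future": object()}, "future", lambda reader: [1.04]))
```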
The minute to session sampling reader was creating two DataFrame
objects, the first to hold the minute data, and then a second returned
by the `DataFrame.groupby` to sample down to sessions.
Instead, use the arrays returned by the minute reader's `load_raw_arrays`
and implement sampling logic which takes advantage of the fact that the
minutes being passed start with the first minute of the first session and
end with the last minute of the last session.
On my machine this takes the tests in `test/test_continuous_futures`
from ~4.0 to ~0.1 seconds.
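A simplified picture of array-based sampling, using plain lists instead of the arrays returned by `load_raw_arrays`. The `resample_to_sessions` helper and the `minutes_per_session` argument are hypothetical, and missing-value handling is left out.

```python
def resample_to_sessions(opens, highs, lows, closes, volumes,
                         minutes_per_session):
    """Collapse contiguous minute bars into one OHLCV tuple per session.

    Assumes the inputs start on the first minute of the first session and
    end on the last minute of the last session, so each session is a
    contiguous slice.
    """
    bars = []
    start = 0
    for count in minutes_per_session:
        stop = start + count
        bars.append((
            opens[start],              # first open
            max(highs[start:stop]),    # highest high
            min(lows[start:stop]),     # lowest low
            closes[stop - 1],          # last close
            sum(volumes[start:stop]),  # total volume
        ))
        start = stop
    return bars


bars = resample_to_sessions(
    opens=[1, 2, 3, 4, 5],
    highs=[2, 3, 4, 5, 6],
    lows=[0, 1, 2, 3, 4],
    closes=[1.5, 2.5, 3.5, 4.5, 5.5],
    volumes=[10, 20, 30, 40, 50],
    minutes_per_session=[2, 3],
)
print(bars)
```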
Add `.adj('mul')` and `.adj('add')` methods on ContinuousFuture, which
when used with `history`, will calculate and apply adjustments so that
the values are adjusted to account for discounts and premiums during
rolls.
Example usage in an algo:
```
from zipline.api import continuous_future, schedule_function


def initialize(context):
    context.cl_add = continuous_future('CL', offset=0, roll='calendar').adj('add')
    context.cl_mul = continuous_future('CL', offset=0, roll='calendar').adj('mul')
    context.cl = continuous_future('CL', offset=0, roll='calendar')
    schedule_function(print_history)


def print_history(context, data):
    # Price and active contract sid for the unadjusted and both adjusted
    # continuous futures over the last 20 sessions.
    frame = data.history([context.cl, context.cl_add, context.cl_mul],
                         ['price', 'sid'],
                         20,
                         '1d')
    print('unadjusted')
    print(frame.loc[:, :, context.cl])
    print('adjusted add')
    print(frame.loc[:, :, context.cl_add])
    print('adjusted mul')
    print(frame.loc[:, :, context.cl_mul])
```
Start making the equity adjustments calculations for the history loader
conform to the same method signature as `load_adjustments` provided by
`SQLiteAdjustmentReader`, so that an `AdjustmentReader` interface can
begin to take form.
This prepares for creating a `DispatchAdjustmentReader` which will route
adjustment calculations for equities to the
`HistoryCompatibleUSEquityAdjustmentReader` and continuous futures to a
not yet implemented adjustment reader. All of these readers will share
the `load_adjustments` method.
Add a perspective offset to `AdjustedArrayWindow` and `AdjustedArray`,
so that `HistoryLoader` does not need to twiddle with offsets to support
viewing the data from the bar after the end of the window (which is the
case when a '1d' history window is retrieved in minute mode, as
explained in the docstring for `HistoryLoader.history`).
Presently, this simplifies the logic in
`HistoryLoader._get_adjustments_in_range` and in other incoming
`AdjustmentReader`s (e.g. the roll based adjustment reader for continuous
futures). This patch should also make it easier for history and pipeline
to converge on a single `load_adjustments` method.
Enable unadjusted history for continuous futures.
The history array is filled by the values for the underlying contracts,
where the contract used changes based on rolls.
e.g., take a `1d` history window over the range
`2016-01-20` -> `2016-02-29`, with a contract with a suffix of `F16` that
rolls at the beginning of the session on `2016-01-26`, `G16` on
`2016-02-26`, and `H16` on `2016-03-26`. The `2016-01-20` ->
`2016-01-25` portion would use the values for `F16`, the `2016-01-26` ->
`2016-02-25` portion would use `G16`, and the `2016-02-26` ->
`2016-02-29` portion would use `H16`.
Using the same contracts as above, a `1m` history window over the range
(using a timezone of US/Eastern) `2016-01-25 4:00PM` -> `2016-01-25
7:00PM` would fill the `4:00PM` -> `6:00PM` portion with data for `F16`
and the `6:01PM` -> `7:00PM` portion with data for `G16`, since the
beginning of the `2016-01-26` session is `2016-01-25 6:01PM`.
Supports `1d` and `1m`.
Also adds the `sid` field to `history` to assist in showing the active
contract at each dt in the window.
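The `1d` example above amounts to a lookup of the active contract by roll date. A small sketch of that lookup, using the dates from the example; the `active_contract` helper and the list-based representation are hypothetical.

```python
from bisect import bisect_right

# Sessions on which the next contract becomes active, taken from the
# example. ISO date strings sort chronologically.
roll_sessions = ["2016-01-26", "2016-02-26"]
contracts = ["F16", "G16", "H16"]


def active_contract(session):
    """Contract whose values fill the history window at `session`."""
    return contracts[bisect_right(roll_sessions, session)]


for session in ["2016-01-25", "2016-01-26", "2016-02-25", "2016-02-29"]:
    print(session, active_contract(session))
```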
Add `chain` field to current, as well as supporting methods in DataPortal
and OrderedContracts.
Enables the following example:
```
from zipline.api import continuous_future, get_datetime, schedule_function


def initialize(context):
    context.primary_cl = continuous_future('CL', offset=0, roll='calendar')
    schedule_function(print_current_chain)


def print_current_chain(context, data):
    # The chain is ordered from the primary (front) contract outwards.
    chain = data.current_chain(context.primary_cl)
    print('datetime={0}'.format(get_datetime()))
    print('primary={0}'.format(chain[0]))
    print('secondary={0}'.format(chain[1]))
    print('tertiary={0}'.format(chain[2]))
```
```
datetime=2015-12-23 14:31:00+00:00
primary=Future(1058201602 [CLG16])
secondary=Future(1058201603 [CLH16])
tertiary=Future(1058201604 [CLJ16])
```
Also:
- Make return types of OrderedContracts methods compatible across
architectures. (Noticed while adding `active_chain` method.)
- Add year suffix to future contract names in test data.