This patch lays the groundwork for a compute engine designed to
facilitate construction of factor-based universe screening and portfolio
allocation. It contains:
- A new module, `zipline.modelling`, containing entities that can be used
to express computations as dependency graphs. Each node in such a graph
is an instance of the base `Term` class, defined in
`zipline.modelling.term`. Dependency graphs are executed by instances
of `FFCEngine`, defined in `zipline.modelling.engine`.
- A new module, `zipline.data.ffc`, containing loaders and dataset
definitions for inputs to the modelling API.
- New `TradingAlgorithm` API methods: `add_factor` and `add_filter`.
These methods can only be called from `initialize`, and are used to
inform the algorithm that each day it should compute the given terms.
Computed factor results are made available through a new attribute of
the `data` object in `before_trading_start` and `handle_data`. Computed
filter results control which assets are available in the factor matrix
on each day.
On Ubuntu (and presumably on all POSIX systems), tickers containing a
slash character ("CRD/A", "BRK/A", both valid tickers with time series
accessible through the Yahoo API) lead to a path error in loader.py
line 286.
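
One possible workaround, sketched under assumptions: replace the slash
before the symbol is used in a file name. The helper name
`cache_filename`, the `--` replacement, and the `_benchmark.csv` suffix
are hypothetical and are not taken from loader.py.

```python
import os


def cache_filename(symbol, cache_dir="cache"):
    """Hypothetical helper: build a flat cache path for a ticker symbol."""
    # Without this replacement "BRK/A" would be read as directory "BRK"
    # containing a file that starts with "A".
    safe_symbol = symbol.replace("/", "--")
    return os.path.join(cache_dir, safe_symbol + "_benchmark.csv")


print(cache_filename("BRK/A"))
```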
Python 2 and 3 throw different exception types when a file does
not exist.
Catch both exception types to trigger the download, so that the
loader works under both Python versions.
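
A minimal sketch of the pattern, assuming a hypothetical
`load_or_download` helper and a caller-supplied `download` callable
(neither name comes from the loader module):

```python
def load_or_download(path, download):
    """Return the text stored at `path`, calling `download()` if it is missing."""
    try:
        with open(path) as f:
            return f.read()
    except (OSError, IOError):
        # Python 2 raises IOError for a missing file; Python 3 raises
        # FileNotFoundError, a subclass of OSError. Catching both
        # covers either version.
        data = download()
        with open(path, "w") as f:
            f.write(data)
        return data
```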
Compatibility between the two versions was made easier by
letting pandas do the heavy lifting: pass filenames to the
pandas serialization methods instead of doing the file
handling and reading/writing within the data module.
Use the six module to import functions and types that are
consistent between Python 2 and 3, so that one code base can
support both versions.
- Use integer_types instead of int and long.
- Use string_types instead of basestring.
- Account for iteritems, itervalues, iterkeys.
- Use six.moves for filter, zip, and reduce.
- Use compatible bytes for md5 hasher.
- Account for xrange versus range.
- Use `print()` function for all print calls
- Fix strip and format calls that were on the outside of the
print function for some reason.
(These were breaking in Python 3 because print returns None.)
- Remove commented out print calls.
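
The strip/format item in the list above can be illustrated with a short
sketch. The message text is made up for the example. Under Python 2 the
same code would need `from __future__ import print_function` at the top
of the module.

```python
message = "  no benchmark data found  "

# print() is a function that returns None, so a call chained onto it,
# e.g. print(message).strip(), fails with AttributeError on None.
result = print(message)

# Moving strip/format inside the parentheses applies them to the string.
print(message.strip())
print("{0} rows loaded".format(3))
```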
Check whether the index's timezone is already UTC before
attempting to localize, since calling tz_localize on an already
localized index raises an error.
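
A sketch of such a guard, assuming a hypothetical helper named
`ensure_utc`. It tests for a timezone-naive index rather than for UTC
specifically, which is a slight variation on the check described above.

```python
import pandas as pd


def ensure_utc(index):
    """Return `index` in UTC, localizing only when it is timezone-naive."""
    if index.tz is None:
        return index.tz_localize("UTC")
    # tz_localize raises TypeError on an already localized index,
    # so an aware index is converted instead.
    return index.tz_convert("UTC")


naive = pd.date_range("2013-01-01", periods=3)
aware = ensure_utc(naive)
```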
Remove the lists of DailyReturn objects in favor of using pd.Series
to store the return values.
This should make it easier to inspect the values when stepping through,
make windowing the data to a certain range simpler,
and improve performance somewhat by removing object creation
and member access.
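
As an illustration of the windowing point, a date-indexed pd.Series can
be sliced by label. The dates and return values below are made up for
the example.

```python
import pandas as pd

dates = pd.date_range("2013-01-02", periods=4, freq="D")
returns = pd.Series([0.01, -0.02, 0.005, 0.0], index=dates)

# Label-based slicing on a DatetimeIndex includes both endpoints.
window = returns.loc["2013-01-03":"2013-01-04"]
print(window)
```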
The dump and update of curves were both using the entire history.
So instead of having the update use a different code path, always
use dump and overwrite.
Both unit tests and repeated runs while developing an algorithm
can benefit from having a local copy of the Yahoo data, instead
of doing a network call each time.
Store the web request results as a csv file in a cache directory,
named by symbol and date range.
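
A rough sketch of this kind of cache. The `cached_fetch` name, the file
name layout, and the `fetch` callable are assumptions for illustration
and do not reflect the actual loader code.

```python
import os


def cached_fetch(symbol, start, end, fetch, cache_dir):
    """Return CSV text for a symbol and date range, using a local cache."""
    # The cache file is named by symbol and date range.
    name = "{0}_{1}_{2}.csv".format(symbol, start, end)
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    # Cache miss: do the network call once and store the result.
    text = fetch(symbol, start, end)
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    with open(path, "w") as f:
        f.write(text)
    return text
```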
This utility was referring to functions that had long since been
removed from the loader module.
If the utility is still needed by some, it can be added back in,
but using the pandas read/write instead of msgpack.
Instead of writing our own serialization using msgpack, leverage
the csv serialization provided by pandas.
This also lessens the need for msgpack and for functions in date_utils.
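
A minimal round trip through the pandas csv methods, passing a file name
rather than an open file handle. The column name, values, and file name
are invented for the example.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame(
    {"close": [1.5, 2.5]},
    index=pd.to_datetime(["2013-01-02", "2013-01-03"]),
)

path = os.path.join(tempfile.mkdtemp(), "benchmark.csv")
frame.to_csv(path)  # pandas opens and closes the file itself

# Read the first column back as a parsed date index.
loaded = pd.read_csv(path, index_col=0, parse_dates=True)
```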
Previously, benchmark returns on the first day were set
to 0. This commit instead calculates the first day's benchmark
return from open to close.
According to @eherbert this is also what the answer key does.
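
The open-to-close calculation amounts to the following, shown with a
hypothetical helper name and made-up prices:

```python
def first_day_return(open_price, close_price):
    """Open-to-close return for a day with no prior close to compare to."""
    return (close_price - open_price) / open_price


# A day that opens at 100.0 and closes at 101.0 returns 1%.
r = first_day_return(100.0, 101.0)
```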
The loader module prints some warning messages. These could
be changed to use a logger, but for now convert them to use the print
function for compatibility with Python 3.
Python 3 requires using dot syntax for relative imports,
otherwise the import is treated as an absolute import, i.e.
an import of a module from outside of the project.
By using dot syntax now, imports should be compatible with both
Python 2.7 and Python 3.
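
To show the dot syntax in a runnable form, the sketch below writes a
throwaway package to a temporary directory and imports it. The package
and module names (`demo_pkg`, `helper`, `main`) are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Build a small package on disk.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)

sources = {
    "__init__.py": "",
    "helper.py": "VALUE = 42\n",
    # Dot syntax: resolved relative to demo_pkg under Python 2.7 and 3.
    # A bare "from helper import VALUE" would be treated as absolute
    # by Python 3 and fail.
    "main.py": "from .helper import VALUE\n",
}
for name, text in sources.items():
    with open(os.path.join(pkg_dir, name), "w") as f:
        f.write(text)

sys.path.insert(0, root)
main = importlib.import_module("demo_pkg.main")
```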
Make adjustments for using Python's built-in ElementTree instead of
the lxml-based one.
lxml was removed while pulling in the memory-friendly loading of
treasury curves; however, some of the ETree usage was lxml-specific.
Mea culpa.
Some ranges have missing data from Yahoo. For example,
on 2013-04-02 the date range starting 2013-03-29 failed because
the first day in the range was Good Friday and the API had not
yet updated for the following Monday.
Handle the resulting 404 by warning that no
benchmark data was found, then continuing on.
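
A sketch of the warn-and-continue handling, assuming a hypothetical
`load_benchmark` wrapper around a caller-supplied `fetch` callable. The
real loader's function names and warning text may differ.

```python
import warnings

try:
    from urllib.error import HTTPError  # Python 3
except ImportError:
    from urllib2 import HTTPError  # Python 2


def load_benchmark(fetch, symbol):
    """Return benchmark data, or None with a warning when the API has none."""
    try:
        return fetch(symbol)
    except HTTPError as e:
        if e.code != 404:
            raise  # other HTTP errors still propagate
        warnings.warn("No benchmark data found for {0}".format(symbol))
        return None
```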