65 Commits

Author SHA1 Message Date
jkleint 82273e296f Propagate exceptions in loader to prevent variable reference before use
`data.loader.ensure_benchmark_data()` was trying to use data after an exception was raised loading it.  The code was logging and swallowing exceptions; this re-raises.
2016-09-23 15:55:55 -07:00
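
A minimal sketch of the log-and-re-raise pattern described in 82273e296f. The helper name `load_or_raise` and the `loader` callable are hypothetical stand-ins, not the actual code in `data.loader`:

```python
import logging

logger = logging.getLogger(__name__)


def load_or_raise(loader):
    """Call `loader` and return its result; log and re-raise on failure.

    `loader` is a hypothetical zero-argument callable standing in for the
    real download step.
    """
    try:
        data = loader()
    except Exception:
        # Log with the traceback, then propagate, so no later code can
        # reference `data` before it was assigned.
        logger.exception("Failed to load benchmark data")
        raise
    return data
```

Swallowing the exception instead would let execution continue to code that uses `data`, which is the reference-before-use failure the commit message describes.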
Scott Sanderson a8a2cc1582 PERF: Remove module-scope calendar creations.
Remove module scope invocations of `get_calendar('NYSE')`, which cuts
zipline import time in half on my machine. This makes the zipline CLI
noticeably more responsive, and it reduces memory consumed at import
time from 130MB to 90MB.

Before:

$ time python -c 'import zipline'

real    0m1.262s
user    0m1.128s
sys     0m0.120s

After:

$ time python -c 'import zipline'

real    0m0.676s
user    0m0.536s
sys     0m0.132s
2016-09-06 09:57:23 -04:00
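
One common way to avoid import-time work like the module-scope calls removed in a8a2cc1582 is to build the object on first use and cache it. The sketch below assumes a hypothetical `build_calendar` constructor and is not zipline's implementation:

```python
import functools

CREATED = []  # records each expensive construction, for illustration only


def build_calendar(name):
    """Hypothetical stand-in for an expensive calendar constructor."""
    CREATED.append(name)
    return {"name": name}


@functools.lru_cache(maxsize=None)
def get_calendar_lazily(name):
    """Build the calendar on first request and reuse it afterwards."""
    return build_calendar(name)
```

Importing a module that defines these functions constructs nothing; the cost is paid only when `get_calendar_lazily("NYSE")` is first called.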
Jean Bredeche 6fb4923cc7 Re-implemented the Calendar API.
Instead of having separate ExchangeCalendar and TradingSchedule objects, we
now just have TradingCalendar.  The TradingCalendar keeps track of each
session (defined as a contiguous set of minutes between an open and a close).
It's also responsible for handling the grouping logic of any given minute
to its containing session, or the next/previous session if it's not a market
minute for the given calendar.
2016-07-12 13:13:50 -04:00
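
The minute-to-session grouping described in 6fb4923cc7 can be illustrated with a binary search over session closes. The session times, names, and the `direction` parameter below are assumptions made for this sketch, not the `TradingCalendar` API:

```python
import bisect
from datetime import datetime

# Hypothetical sessions as (open, close) pairs, sorted by open time.
SESSIONS = [
    (datetime(2016, 7, 11, 13, 31), datetime(2016, 7, 11, 20, 0)),
    (datetime(2016, 7, 12, 13, 31), datetime(2016, 7, 12, 20, 0)),
]
CLOSES = [close for _, close in SESSIONS]


def minute_to_session(minute, direction="next"):
    """Return the index of the session containing `minute`.

    For a non-market minute, return the next or previous session index
    depending on `direction`. Raises ValueError if no such session exists.
    """
    if direction not in ("next", "previous"):
        raise ValueError("direction must be 'next' or 'previous'")
    # Index of the first session whose close is at or after the minute.
    i = bisect.bisect_left(CLOSES, minute)
    if i < len(SESSIONS) and SESSIONS[i][0] <= minute:
        return i  # the minute falls inside session i
    candidate = i if direction == "next" else i - 1
    if 0 <= candidate < len(SESSIONS):
        return candidate
    raise ValueError("no session in that direction")
```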
jfkirk 26742dda67 MAINT: Removes obsolete tradingcalendar module 2016-06-08 13:34:19 -04:00
Joe Jevnik 0bd790d122 MAINT: replace usages of pandas.io.data.DataReader with pandas_datareader.data.DataReader 2016-05-20 11:23:28 -04:00
Joe Jevnik 59c8e371a2 ENH: Updates the cli, data bundles and extensions.
Adds the data bundle concept which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This also provides a
`yahoo_equities` function for creating an ingestion function that will
load a static set of assets from yahoo.

The cli is now structured as a couple of subcommands and has been
changed to `python -m zipline`. The old behavior of `run_algo.py` has
been moved to the `run` subcommand. This is almost entirely the same
except that it now takes the name of the data bundle to use, defaulting
to `quandl`.

The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.

There is also a `clean` subcommand which deletes the data that was
written with `ingest`.

Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of python files to run at
the start of the process. These can be used to configure aspects of
zipline. Right now the only thing that is supported in an extension file
is the registration of a new data bundle.
2016-05-03 18:38:24 -04:00
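
The register-by-decorator idea from 59c8e371a2 can be sketched with a toy registry. Everything here is hypothetical: the `_bundles` dict, the single-argument ingest signature, and the `ingest` helper are illustration only and do not reproduce `zipline.data.bundles`:

```python
_bundles = {}  # name -> ingest function; a toy registry, not zipline's


def register(name):
    """Return a decorator that records an ingest function under `name`."""
    def decorator(ingest_func):
        if name in _bundles:
            raise ValueError("bundle %r is already registered" % name)
        _bundles[name] = ingest_func
        return ingest_func
    return decorator


@register("my-bundle")
def my_ingest(output_dir):
    """Hypothetical ingest function; its signature is an assumption."""
    return "wrote data to %s" % output_dir


def ingest(name, output_dir):
    """Look up a registered bundle by name and run its ingest function."""
    return _bundles[name](output_dir)
```

With a registry like this, a command such as `ingest` only needs the bundle name to find and run the matching loading function.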
Eddie Hebert 37d341bb95 BUG: Truncate treasury curves to env min date. 2016-03-08 17:14:28 -05:00
Joe Jevnik 5eb453675d BUG: don't fail if you cannot make a webrequest 2016-02-11 18:46:43 -05:00
Jeremiah Lowin e44c0d42e1 Replace print with logger.info 2016-01-25 19:22:28 -05:00
dmichalowicz 4f24a32c45 BUG: Benchmark and treasury curves data missing on first download 2015-12-21 13:38:24 -05:00
Scott Sanderson 43ac9eab5c ENH: Check getmtime on download locations.
Rather than repeatedly try and fail to download data that's not yet
available, only try to download again if we haven't successfully
downloaded in the last hour.
2015-11-13 18:06:04 -05:00
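
A minimal sketch of the modification-time check described in 43ac9eab5c, assuming the download location is a single file path. The function name and the `now` parameter are hypothetical:

```python
import os
import time

ONE_HOUR = 60 * 60  # seconds


def downloaded_recently(path, max_age=ONE_HOUR, now=None):
    """Return True if `path` exists and was modified within `max_age` seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # missing file: nothing has been downloaded yet
    return (now - mtime) < max_age
```

A caller would skip the download attempt whenever this returns True.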
Scott Sanderson 1b7d0c9477 MAINT: Add __future__ print function import.
We do print(stock) in this file, which happens to work in py2, but is
confusing.
2015-11-13 18:06:04 -05:00
Scott Sanderson 75f7c44223 BUG: Better check for last date.
Use get_loc to find the trading day that ended 2 days before now.
2015-10-25 16:37:59 -04:00
Scott Sanderson cabe22ae8e ENH: Always use Adjusted Close for benchmarks.
Previously we were using Close, and we calculated returns on the first
day of a window against the Open for that day.  We now always look back
an extra day to get the previous day's close.
2015-10-25 16:37:59 -04:00
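
The extra look-back day in cabe22ae8e matters because a close-to-close return needs the previous day's close. A plain-Python sketch, with a hypothetical function name:

```python
def daily_returns(adj_closes):
    """Compute simple returns from consecutive adjusted closes.

    `adj_closes` must include one extra leading value (the close of the
    day before the window), so the result has len(adj_closes) - 1 entries.
    """
    return [
        (today - prev) / prev
        for prev, today in zip(adj_closes, adj_closes[1:])
    ]
```

For a window of two days, three closes are passed in: the close before the window plus one per day in it.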
Scott Sanderson d82cfb1e64 MAINT: Final polish on loader rewrites.
- Fixes an issue with the Canadian treasury loader where it would always
  redownload: it can only download data from the last 10 years, so it
  never appeared to have enough data.
- Uses module objects directly instead of lazy imports.
- Adds lots of docstrings.
2015-10-25 16:37:59 -04:00
Scott Sanderson 24d26f9e63 MAINT: Rewrite the benchmark loader. 2015-10-25 16:37:59 -04:00
Scott Sanderson 8c38278783 ENH: Rewrite treasury loader using pandas.
Replaces our custom XML parsing with a single call to `pd.read_csv`
against the federal reserve's API.  This produces nearly identical
results as compared to the old loader, but it's dramatically simpler and
roughly 10x faster on my machine.

The average difference in magnitude between new and old is on the order
of 1e-7 or smaller, and only one entry is different to a degree greater
than the number of significant figures provided by treasury.gov.

Additionally, the new loader correctly ignores Columbus Day of 2010, for
which the old loader erroneously produced an all-NaN row.

This also changes the interface that treasury modules are
required to implement. Modules must now supply a `get_treasury_data`
function that returns a `DataFrame` with a daily `DatetimeIndex` and a
column for each supported treasury duration.

Detailed comparison between results from new and old loader::

    import pandas as pd
    from zipline.data.treasuries import get_treasury_data

    new = get_treasury_data()  # New implementation
    old = pd.read_csv(  # Previously cached data
        '/home/ssanderson/.zipline/data/treasury_curves.csv',
        parse_dates=[0],
        index_col=0,
    )
    # These columns were unused.
    del old['tid']; del old['date']
    old = old.tz_localize('UTC')
    # old data erroneously contained an all-NaN entry for Columbus Day
    # in 2010.  Remove before comparing.
    old = old.dropna(how='all')

    In [25]: len(new) == len(old)
    Out[25]: True

    In [26]: abs(old - new).max()
    Out[26]:
    10year    2.000000e-04
    1month    6.938894e-18
    1year     1.000000e-04
    20year    1.000000e-04
    2year     2.000000e-04
    30year    1.000000e-04
    3month    1.000000e-03
    3year     1.000000e-04
    5year     1.387779e-17
    6month    1.000000e-04
    7year     1.000000e-04
    dtype: float64

    In [27]: abs(old - new).mean()
    Out[27]:
    10year    3.097414e-08
    1month    4.396534e-19
    1year     1.548707e-08
    20year    3.624502e-08
    2year     4.646120e-08
    30year    1.830496e-08
    3month    1.549427e-07
    3year     1.548707e-08
    5year     1.702619e-18
    6month    1.548707e-08
    7year     1.548707e-08
    dtype: float64

Since www.treasury.gov only reports values up to three significant
digits, we should only care about differences of greater than 1e-3.

There is exactly one such difference: the entry for the three month bond
on 1999-10-01::

    In [60]: new[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[60]:
    Time Period  1999-10-01 00:00:00+00:00
    1month                             NaN
    3month                          0.0498
    6month                          0.0501
    1year                           0.0530
    2year                           0.0573
    3year                           0.0583
    5year                           0.0590
    7year                           0.0622
    10year                          0.0600
    20year                          0.0657
    30year                          0.0615

    In [61]: old[(abs(new - old) >= 1e-3).any(axis=1)].T
    Out[61]:
            1999-10-01 00:00:00+00:00
    10year                     0.0600
    1month                        NaN
    1year                      0.0530
    20year                     0.0657
    2year                      0.0573
    30year                     0.0615
    3month                     0.0488
    3year                      0.0583
    5year                      0.0590
    6month                     0.0501
    7year                      0.0622

The US Treasury website (our old source) provides a value of 0.0488 here,
whereas the Federal Reserve site (our new source) provides a value of
0.0498.
2015-10-25 16:37:59 -04:00
Scott Sanderson 3c954af08c MAINT: Just do searchsorted with the date.
Previously we were converting our date to a string, then calling
`searchsorted` on the DatetimeIndex with the string, which would cause
pandas to convert the string back into a date to actually do the lookup.
2015-10-25 16:37:59 -04:00
Scott Sanderson 854b6638b2 MAINT: Remove default values from dump_treasury_curves.
We never call the function without passing them explicitly.
2015-10-25 16:37:59 -04:00
Scott Sanderson ef4f642e62 ENH: Compute engine architecture for FFC API.
This patch lays the groundwork for a compute engine designed to
facilitate construction of factor-based universe screening and portfolio
allocation.  It contains:

A new module, `zipline.modelling`, containing entities that can be used
to express computations as dependency graphs.  Each node in such a graph
is an instance of the base `Term` class, defined in
`zipline.modelling.term`.  Dependency graphs are executed by instances
of `FFCEngine`, defined in `zipline.modelling.engine`.

A new module, `zipline.data.ffc`, containing loaders and dataset
definitions for inputs to the modelling API.

New `TradingAlgorithm` api methods: `add_factor`, and `add_filter`.
These methods can only be called from `initialize`, and are used to
inform the algorithm that each day it should compute the given terms.
Computed factor results are made available through a new attribute of
the `data` object in `before_trading_start` and `handle_data`.  Computed
filter results control which assets are available in the factor matrix
on each day.
2015-07-29 12:30:46 -04:00
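
The idea in ef4f642e62 of expressing computations as a dependency graph and executing them in order can be sketched with the standard library's `graphlib` (Python 3.9+). `ToyTerm` and `run_graph` are hypothetical and far simpler than the `Term` and `FFCEngine` classes the commit introduces:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class ToyTerm:
    """Toy node: a name, the terms it depends on, and a compute function."""

    def __init__(self, name, inputs, compute):
        self.name = name
        self.inputs = inputs
        self.compute = compute


def run_graph(terms):
    """Evaluate every term after its inputs; return {name: value}."""
    by_name = {t.name: t for t in terms}
    # Map each node to its predecessors; static_order yields inputs first.
    order = TopologicalSorter(
        {t.name: [i.name for i in t.inputs] for t in terms}
    ).static_order()
    results = {}
    for name in order:
        term = by_name[name]
        results[name] = term.compute(*[results[i.name] for i in term.inputs])
    return results


close = ToyTerm("close", [], lambda: [10.0, 12.0, 11.0])
mean_close = ToyTerm("mean_close", [close], lambda xs: sum(xs) / len(xs))
above_mean = ToyTerm("above_mean", [close, mean_close],
                     lambda xs, m: [x > m for x in xs])
RESULTS = run_graph([above_mean, mean_close, close])
```

Here `mean_close` plays the role of a factor and `above_mean` the role of a filter; the order in which terms are passed to `run_graph` does not matter because the sort resolves dependencies.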
jfkirk b84ac01cbf ENH: Adds futures trading and asset management logic to TradingAlgorithm and performance classes 2015-06-11 11:35:49 -04:00
Eddie Hebert 0fa44471be MAINT: Change expected type of treasury curves from load to DataFrame.
Instead of converting the curves back and forth between dictionaries and
DataFrames, use the DataFrame format when passing to
environment.
2015-04-20 10:26:09 -04:00
Benjamin Berman ef598c7130 BUG: Handle a ValueError on from_csv calls
The cached market data could be corrupted. Pandas raises a ValueError in
that case, and this change handles it.
2015-04-14 12:40:37 -04:00
Scott Sanderson 885db87dea MAINT: Use logger instead of printing in loader.py
Makes it easier to filter logs when they're not desired.
2015-04-14 12:40:37 -04:00
warren-oneill b62fadc76f adding NYSE trading_day and trading_days as default in load_market_data() 2015-04-08 16:57:23 -04:00
warren-oneill aa872afdf4 adding updates from master 2015-04-08 16:57:12 -04:00
warren-oneill 49c168b3d0 adding trading_day and trading_days as variables to load_market_data 2015-04-08 16:56:13 -04:00
Jonathan Kamens e942275108 STY: Flake8
Upgrade the version of the flake8, pep8, and mccabe PyPI packages, and
make the code changes necessary for compatibility with the updated
packages.
2015-03-19 17:21:25 -04:00
Jonathan Kamens c46a3afa3c BUG: Don't download benchmarks / treasury curves unnecessarily
Fix an off-by-one error which was causing us to download the benchmark
and treasury curves over and over again even when they weren't needed.
2015-03-08 09:31:50 -04:00
Luke Schiefelbein 1542b41fbd BUG: Fix price caching for tickers with '/' char
On Ubuntu (and presumably on all POSIX systems), tickers containing a slash char ("CRD/A", "BRK/A", both valid tickers with yahoo api accessible timeseries) lead to a path error in loader.py line 286.
2014-11-19 11:26:27 +01:00
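
A minimal sketch of the kind of fix 1542b41fbd describes: keep path separators out of the cache file name. The replacement string `--` and the function name are arbitrary choices for this example, not necessarily what the commit uses:

```python
import os


def cache_filename(symbol, suffix=".csv"):
    """Build a file name for `symbol` that contains no path separators.

    Replacing '/' with '--' is an arbitrary choice made for this sketch.
    """
    safe = symbol.replace(os.path.sep, "--").replace("/", "--")
    return safe + suffix
```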
Thomas Wiecki 820115f7be MAINT: Replace iterkv with iteritems.
iterkv is being deprecated as of pandas 0.14.
2014-10-22 17:25:37 +02:00
twiecki 4bdecd6402 STY: PEP8 fixes. 2014-03-26 20:46:20 +09:00
Eddie Hebert 71cda461c5 BUG: Fix check for cached public data for Python 2.7
Python 2 and 3 throw different exception types when a file does
not exist.

Catch both exception types to trigger the download, so that the
loader works under both Python versions.
2014-01-07 17:19:16 -05:00
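
The commit message for 71cda461c5 does not name the two exception types. As an illustration using the built-in `open`, Python 2 raises `IOError` for a missing file while Python 3 raises `FileNotFoundError`, a subclass of `OSError`. The helper and its `download` callable are hypothetical:

```python
def read_cached(path, download):
    """Return cached text at `path`, calling `download()` if it is missing.

    `download` is a hypothetical zero-argument callable.
    """
    try:
        with open(path) as f:
            return f.read()
    except (IOError, OSError):
        # Python 2 raises IOError here; Python 3 raises FileNotFoundError,
        # a subclass of OSError (IOError is an alias of OSError in Python 3).
        return download()
```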
Eddie Hebert 46ab748dd2 MAINT: Use pandas for data cache file I/O
The compatibility between the two versions was made easier by
letting pandas handle the heavy lifting, so pass filenames to the
pandas serialization methods, instead of doing the file
handling and reading/writing within the data module.
2014-01-07 12:01:08 -05:00
Eddie Hebert b4959e46cf MAINT: Use six for Python 3 compatible names and behavior.
Use the six module to import functions and types that are
consistent between Python 2 and 3, so that one code base can
support both versions.

- Use integer types instead of int and long.
- Use string_types instead of basestring.
- Account for iteritems, itervalues, iterkeys.
- Use six.moves for filter, zip, and reduce.
- Use compatible bytes for md5 hasher.
- xrange and range
2014-01-07 11:33:50 -05:00
Eddie Hebert 54ddd1c109 MAINT: print function clean up in preparation for Python 3
- Use `print()` function for all print calls
- Fix strip and format calls that were on the outside of the
  print function for some reason.
  (Which were breaking in Python 3 because of print returning None.)
- Remove commented out print calls.
2014-01-04 20:55:43 -05:00
David Stephens e45528458f ENH: Added functionality to download Canadian treasury curves.
Added automatic switching of treasury curves based on index sent to environment.
2013-12-27 13:27:43 -05:00
Eddie Hebert 50800a9863 BUG: Fix data cache filepath on Windows.
Prevent the ':' char, generated by converting a datetime to a string,
from creating an incompatible filepath for Windows.
2013-11-18 20:37:45 -05:00
Eddie Hebert 43b85cffb0 MAINT: Calculate tradingcalendar with days beyond the current day.
To make 'next open' calculations more straightforward, calculate more
than enough days in the trading calendar.
2013-11-11 15:48:44 -05:00
Eddie Hebert 797cb8ece3 BUG: Fix bad reference to benchmark timezone in loader. 2013-11-11 14:39:11 -05:00
Eddie Hebert 89793e371c MAINT: Protect loader against Series saved with no tz.
Checking for tz.UTC is not sufficient, since it is possible for
the index.tz value to be None.
2013-11-11 14:17:14 -05:00
Eddie Hebert c45c1a22e1 BUG: Only localize benchmark index if it is naive.
Check whether the index's timezone is already UTC before attempting
to localize, since an already localized index throws an error when
tz_localize is called.
2013-10-29 13:17:58 -04:00
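
The naive-versus-aware check behind c45c1a22e1 and 89793e371c can be illustrated with standard-library datetimes. This is an analogue of the pandas index check, not the loader's code, and `ensure_utc` is a hypothetical name:

```python
from datetime import datetime, timezone


def ensure_utc(dt):
    """Return `dt` as a UTC-aware datetime.

    Naive values are assumed to already be in UTC and are only labelled;
    aware values are converted.
    """
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

Testing for a missing timezone first covers the case where the timezone is `None`, which a comparison against UTC alone would mishandle.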
Eddie Hebert 2d64ab8bfe BUG: Fix naive timestamps in benchmarks.
Always convert the benchmarks to UTC, not just on reload.
2013-10-29 08:36:53 -04:00
Eddie Hebert 37c56b9aa4 MAINT: Use Series throughout for daily returns.
Remove the lists of DailyReturn objects in favor of using pd.Series
to store the return values.

This should make it easier to inspect the values when stepping through,
make windowing the data to a certain range easier, and improve
performance somewhat by removing object creation and member
access.
2013-10-19 23:06:18 -04:00
Eddie Hebert 71f03e9537 BUG: Ensure loading benchmarks include latest dates.
The Series `.append` does not update in place, so assign its return value
to `saved_benchmarks` so that we update the newest benchmarks.
2013-10-07 12:17:26 -04:00
Eddie Hebert 6ac5d49573 MAINT: Remove duplicated treasury loading code.
The dump and update of curves were both using the entire history.
So instead of having the update use a different code path, always
use dump and overwrite.
2013-10-02 11:10:15 -04:00
Eddie Hebert 5ddc134379 ENH: Cache daily data to eliminate repeat network calls.
Both unit tests and repeated runs while developing an algorithm
can benefit from having a local copy of the Yahoo data, instead
of doing a network call each time.

Store the web request results as a csv file in a cache directory,
named by symbol and date range.
2013-10-01 15:04:02 -04:00
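
A minimal sketch of the caching scheme described in 5ddc134379: a CSV file per symbol and date range, consulted before any network call. The function name, file-name layout, and `fetch` callable are assumptions for this example:

```python
import csv
import os


def load_cached(symbol, start, end, cache_dir, fetch):
    """Return rows for `symbol`, reading a cached CSV when one exists.

    `fetch` is a hypothetical callable returning a list of row lists;
    `start` and `end` are datetime.date objects.
    """
    name = "%s-%s-%s.csv" % (symbol, start.isoformat(), end.isoformat())
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # Cache hit: no network call. Values come back as strings.
        with open(path, newline="") as f:
            return list(csv.reader(f))
    rows = fetch(symbol, start, end)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

Using `date.isoformat()` keeps colons out of the file name, and a symbol containing `/` would still need sanitizing before being used this way.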
Eddie Hebert b44fc20e4e MAINT: Remove msgpack as a dependency.
Now that the data serialization uses pandas, msgpack is no longer
needed.
2013-10-01 14:28:11 -04:00
Thomas Wiecki a66f45b598 MAINT: Moving yahoo loader from factory to utils. 2013-10-01 14:09:26 -04:00
Eddie Hebert b65f7f42c0 BUG: Fix updating treasury curves.
A transpose back to the serialization shape was left out.

Also, fixes empty return from update.
2013-10-01 11:57:04 -04:00