Commit Graph

67 Commits

Author SHA1 Message Date
Eddie Hebert f4891b0a08 TST: Key trading calendar fixture with Asset types
Instead of using strings of 'equities' and 'futures', use the Asset
subclasses to key the trading calendar fixtures.
2016-08-08 03:49:48 -04:00
Eddie Hebert dd2c7db22d TST: Use sum for volume on daily data resample.
Change the mock minute data to no longer use an increasing arange, so
that a day's worth of minute data can be summed and fit inside of a
uint32.

This change was required because of working on new test data that looked
like [0, 100, 200, 0, ] which was resulting in a daily rollup of 0 data,
when the coverage needed a non-0 value.

Also, factor out the resampling function, with an eye on making it
easier to convert from minute bars to daily bars during ingest/load
processes.
2016-08-05 14:24:14 -04:00
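The volume rollup described in dd2c7db22d can be sketched as follows. This is a minimal illustration, not the project's resampling function: the name `minute_to_daily`, the column names, and the constant per-minute volume of 100 are assumptions, chosen so that the daily sum is non-zero and fits in a uint32.

```python
import numpy as np
import pandas as pd


def minute_to_daily(minute_frame):
    """Roll minute OHLCV bars up to one bar per day (hypothetical helper)."""
    how = {
        "open": "first",   # first minute's open
        "high": "max",     # highest high of the day
        "low": "min",      # lowest low of the day
        "close": "last",   # last minute's close
        "volume": "sum",   # total volume traded over the day
    }
    days = minute_frame.index.normalize()
    return minute_frame.groupby(days).agg(how)


# One 390-minute session of mock data with a constant volume per minute.
minutes = pd.date_range("2016-08-05 13:31", periods=390, freq="min", tz="UTC")
prices = np.arange(390, dtype="float64") + 10.0
frame = pd.DataFrame(
    {
        "open": prices,
        "high": prices + 1,
        "low": prices - 1,
        "close": prices,
        "volume": np.full(390, 100, dtype="uint32"),
    },
    index=minutes,
)

daily = minute_to_daily(frame)
```

With a constant volume the daily total is simply minutes per day times the per-minute value, which stays far below the uint32 limit.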
Eddie Hebert e934c6aeaf TST: Make room for multiple calendars in tests.
When adding fixtures for futures data, there will be a need for multiple
calendars in the fixture ecosystem. e.g. a test that includes both
equities and futures would need an overall calendar which encompasses
both equities and futures; however, the test data for equities should
still be limited to the bounds set by the NYSE calendar.

Make the fixtures that set up trading calendars and values derived from
the trading calendar (e.g. trading sessions) accept an iterable of
calendars which need to be created, then populate those values into a
dict keyed by the calendar name.

Change `WithNYSETradingDays` to include sessions in the name,
since we are moving to session as the name for the 'day' unit.

Provide `trading_days`, which is really "NYSE trading sessions", on
`WithTradingSessions` for backwards compatibility.
2016-08-05 12:17:27 -04:00
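The "dict keyed by the calendar name" idea from e934c6aeaf can be sketched as follows. The calendar names and factories here are hypothetical stand-ins, not real exchange calendars or the actual fixture code; they only show each calendar keeping its own sessions while an overall calendar encompasses both.

```python
import pandas as pd

# Hypothetical stand-ins for real trading calendars: each name maps to a
# function that yields the sessions between two dates.
CALENDAR_FACTORIES = {
    "weekdays_only": lambda start, end: pd.bdate_range(start, end),
    "every_day": lambda start, end: pd.date_range(start, end, freq="D"),
}


def build_sessions(calendar_names, start, end):
    """Create the requested calendars and key their sessions by name."""
    return {
        name: CALENDAR_FACTORIES[name](start, end) for name in calendar_names
    }


sessions = build_sessions(
    ["weekdays_only", "every_day"], "2016-08-01", "2016-08-07"
)

# An overall calendar spans both, while each entry keeps its own bounds.
overall = sessions["weekdays_only"].union(sessions["every_day"])
```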
Jean Bredeche 9ae725b940 ENH: update register_calendar API to take a specific name 2016-08-02 23:12:07 -04:00
Joe Jevnik 4265a13edf Revert "Merge pull request #1354 from quantopian/revert-1302-point-in-time-asset-db"
This reverts commit 3b633011c6, reversing
changes made to 70ac5323de.
2016-08-02 14:25:10 -04:00
Joe Jevnik 814a2be7b7 Revert "Point in time asset db" 2016-07-27 23:29:08 -04:00
Joe Jevnik bc10447b9e TST: add assert_equal dispatch for other ndframe objects 2016-07-26 13:34:58 -04:00
Jean Bredeche 5a0f840917 Clean up daily bar reader/writer to take advantage of new trading calendar. The reader
is backwards-compatible with the previous format.

In USEquityLoader, use dailyreader's trading_calendar.

This is backwards compatible and will fall back to the NYSE calendar if
the reader doesn’t have a calendar specified.
2016-07-15 15:13:57 -04:00
Jean Bredeche 295cfa3846 Fix some mistakes from a previous merge.
No tests failed, which was worrisome.  Will file issues to take a look
later.
2016-07-14 15:40:36 -04:00
Jean Bredeche e22108b7ef Merge pull request #1312 from quantopian/24-5-backtesting
Re-implemented the calendar API.
2016-07-14 10:05:18 -04:00
Joe Jevnik 958d455a7a ENH: Support default params for terms 2016-07-12 18:49:24 -04:00
Jean Bredeche 6fb4923cc7 Re-implemented the Calendar API.
Instead of having separate ExchangeCalendar and TradingSchedule objects, we
now just have TradingCalendar.  The TradingCalendar keeps track of each
session (defined as a contiguous set of minutes between an open and a close).
It's also responsible for handling the grouping logic of any given minute
to its containing session, or the next/previous session if it's not a market
minute for the given calendar.
2016-07-12 13:13:50 -04:00
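The minute-to-session grouping described in 6fb4923cc7 can be sketched as follows. This is not the TradingCalendar implementation: `session_index`, the two hard-coded sessions, and the use of naive local times are assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import datetime

# Two hypothetical sessions, each a contiguous run of minutes between an
# open and a close (naive exchange-local times, for illustration only).
SESSIONS = [
    (datetime(2016, 7, 11, 9, 31), datetime(2016, 7, 11, 16, 0)),
    (datetime(2016, 7, 12, 9, 31), datetime(2016, 7, 12, 16, 0)),
]
OPENS = [session_open for session_open, _ in SESSIONS]
CLOSES = [session_close for _, session_close in SESSIONS]


def session_index(minute, direction="next"):
    """Return the index of the session containing `minute`.

    If `minute` is not a market minute, return the next or previous
    session instead, depending on `direction`.
    """
    # First session whose close is at or after the minute.
    idx = bisect_left(CLOSES, minute)
    if idx < len(SESSIONS) and OPENS[idx] <= minute:
        return idx  # the minute falls inside this session
    if direction == "previous":
        idx -= 1
    if not 0 <= idx < len(SESSIONS):
        raise ValueError("no session in that direction")
    return idx
```

A minute between the first close and the second open resolves to session 1 with `direction="next"` and to session 0 with `direction="previous"`.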
Eddie Hebert 51eda06323 MAINT: Add equity to naming of bar data classes.
In preparation for adding futures, add equity to the names of both the
classes and methods for writing bcolz data. Futures data will use a
different number of minutes per day with a separate reader. This change
will allow both equity and futures fixtures to be side by side.

Also, break out the method which generates the dataframes and trading
days member into fixtures (`EquityMinuteBarData` and
`EquityDailyBarData`) on which the `*BarReader` fixture depends.  This
fixture is separated out to enable reader/writers in different formats
to use the same data setup. (There is internal code which needs to write
minute and daily bar data in a database format.)
2016-06-30 08:21:42 -04:00
dmichalowicz 393f82e81e ENH: Add single-column input/output capabilities to pipeline terms 2016-06-23 10:24:09 -04:00
Richard Frank 69b6cff964 Merge pull request #1289 from quantopian/wildcard
wildcard object and doctests
2016-06-22 18:09:57 -04:00
Joe Jevnik d608e0af4f Merge pull request #1276 from quantopian/blaze-loader-checkpoints
ENH: add ffill checkpointing to blaze core loader
2016-06-21 16:48:08 -04:00
Joe Jevnik 5925107052 TST: fix doctests to actually run 2016-06-21 15:07:03 -04:00
Joe Jevnik c0d08f9c0d TST: Adds wildcard object for assert_equal 2016-06-20 14:20:18 -04:00
Eddie Hebert 9f02f147b0 Merge pull request #1283 from quantopian/custom-paths-for-fixtures
TST: Allow customization of various fixture paths.
2016-06-20 10:08:27 -04:00
Joe Jevnik cb67ee425e TST: coverage 2016-06-17 17:59:56 -04:00
Eddie Hebert d6793e7a71 TST: Allow customization of various fixture paths.
Support testing configurations which need control over the full path
to the asset, adjustment, and equity bcolz directories. This is
required by some of our internal testing, which exercises servers that
coordinate these files via a date slug in the full path.

Also, by allowing customization of the full path, it is now possible to
have the AssetFinder and AdjustmentReader sqlite databases be written to
disk, which is also required for our server testing setup.
2016-06-17 16:13:31 -04:00
Richard Frank 3d7f63f8c1 MAINT: Removed unused ExceptionSource
No longer used since our lazy data access changes.
2016-06-15 10:43:20 -04:00
Scott Sanderson bc302beec9 MAINT: Rework event datasets.
- Refactored EventsLoader and BlazeEventsLoader to not require a
  subclass per dataset.  Instead, you now pass a map from columns to
  event fields directly to the EventsLoader constructor.

- Removed a large number of Quantopian-specific datasets and associated
  tests.

- Rewrote the core logic of EventsLoader and BlazeEventsLoader to share
  index calculations across multiple requested columns.

- Fixed a bug where event fields were incorrectly forward-filled when
  null values were present in an event.
2016-06-10 19:22:27 -04:00
Andrew Daniels 02a91ec4ab MAINT: Removes the set_first_trading_day method of DataPortal
Since the first trading day is now passed directly to the DataPortal on
init, there's no need for a method that does this. Moves all the
additional logic/assignments into the init. Also corrects an issue where
we would never create certain attributes if self._first_trading_day was
None.

Adds the ability to specify the first trading day for a data portal in a
test case when using the WithDataPortal fixture.
2016-06-08 13:34:23 -04:00
jfkirk d437a5d675 MAINT: Rebase fixes 2016-06-08 13:34:23 -04:00
jfkirk 2a8f69fc01 MAINT: DataPortal env -> asset_finder 2016-06-08 13:34:22 -04:00
jfkirk d9fc514fa8 TST: Adds TradingSchedule test fixture 2016-06-08 13:34:20 -04:00
jfkirk 26742dda67 MAINT: Removes obsolete tradingcalendar module 2016-06-08 13:34:19 -04:00
jfkirk 241abda2a5 STY: Flake8 2016-06-08 13:34:19 -04:00
jfkirk 4b7390ac81 WIP: Refactors tests to use TradingSchedule 2016-06-08 13:34:19 -04:00
jfkirk c8304e8601 ENH: Adds ExchangeCalendar, TradingSchedule, and implementations
Conflicts:
	tests/data/test_minute_bars.py
	tests/data/test_us_equity_pricing.py
	tests/finance/test_slippage.py
	tests/pipeline/test_engine.py
	tests/pipeline/test_us_equity_pricing_loader.py
	tests/serialization_cases.py
	tests/test_algorithm.py
	tests/test_assets.py
	tests/test_bar_data.py
	tests/test_benchmark.py
	tests/test_exception_handling.py
	tests/test_fetcher.py
	tests/test_finance.py
	tests/test_history.py
	tests/test_perf_tracking.py
	tests/test_security_list.py
	tests/utils/test_events.py
	zipline/algorithm.py
	zipline/data/data_portal.py
	zipline/data/us_equity_loader.py
	zipline/errors.py
	zipline/finance/trading.py
	zipline/testing/core.py
	zipline/utils/events.py
2016-06-08 13:34:18 -04:00
Andrew Daniels 71f12ec272 MAINT: Adds first_trading_day arg to DataPortal
Instead of inferring it from the minute/daily writer, we now require the
first trading day to be passed explicitly, so the creator of the
DataPortal controls what is used as the first trading day.
2016-06-02 13:16:43 -04:00
Eddie Hebert 2f80e94203 TST: Enable sourcing daily data from minute data.
Allow `WithBcolzDailyBarData` to opt-in to reading data defined by
`WithBcolzMinuteBarData`, so that the daily and minute test data for the same
asset and dts correlate between the two readers.
The correlation is relevant for history tests which blend daily and
minute data.

Also, make the test data for the split and mergers assets in the minute
suite align at the thousands place if the adjustments are applied
correctly, by starting the prices with a base of 4000 and then halving
the start value each day.
2016-06-02 12:28:53 -04:00
Scott Sanderson c03bbbc928 BUG: Delete attrs before firing callbacks.
Prevents failures to remove sqlite files when cleaning up temporary
directories.
2016-05-25 14:17:57 -04:00
Joe Jevnik 784d5f4a16 Merge pull request #1199 from quantopian/boybands-factor
BollingerBands factor
2016-05-13 15:35:10 -04:00
Joe Jevnik a345e6f3f5 TST: Clean up metaclass usage in fixtures 2016-05-12 17:00:51 -04:00
Joe Jevnik 9b76731143 ENH: adds with_metaclasses and tests for metautils 2016-05-12 15:58:19 -04:00
Maya Tydykov 6b60e447a0 MAINT: incorporate string support
STY: remove unused imports

MAINT: change dtype to object for compatibility with python3

MAINT: rename pipeline columns and constants for clarity

MAINT: rename column
2016-05-12 10:50:31 -04:00
Scott Sanderson 8de45540f2 ENH: NaN semantics for LabelArray missing values. 2016-05-04 15:54:50 -04:00
Scott Sanderson bb6f908036 TEST: Add test for categorical postprocessing. 2016-05-04 15:54:50 -04:00
Scott Sanderson 5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
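The storage idea behind ``LabelArray`` in 5f190395ad (duplicate-heavy strings stored as indices into an array of unique values) can be sketched with plain NumPy. This is not the LabelArray implementation; the variable names are illustrative only.

```python
import numpy as np

tickers = np.array(["AAPL", "MSFT", "AAPL", "IBM", "MSFT", "AAPL"])

# Store each distinct string once, plus an integer code per element.
categories, codes = np.unique(tickers, return_inverse=True)

# Compare against each unique value once, then broadcast the result to
# every element through cheap integer indexing.
category_matches = categories == "AAPL"
mask = category_matches[codes]
```

The string comparison runs once per unique value rather than once per element, which is where the speedup on data with many duplicates comes from.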
Joe Jevnik 59c8e371a2 ENH: Updates the cli, data bundles and extensions.
Adds the data bundle concept which makes it easy for users to register
loading functions to build out minute and daily data along with an
assets db and adjustments db. By default we have provided a `quandl`
bundle which pulls from the public domain WIKI dataset. Users may
register new bundles by decorating an ingest function with
`zipline.data.bundles.register(<name>)`. This also provides a
`yahoo_equities` function for creating an ingestion function that will
load a static set of assets from yahoo.

The cli is now structured as a couple of subcommands and has been
changed to `python -m zipline`. The old behavior of `run_algo.py` has
been moved to the `run` subcommand. This is almost entirely the same
except that it now takes the name of the data bundle to use, defaulting
to `quandl`.

The next subcommand is `ingest` which takes the name of
a data bundle to ingest. This will run the loading machinery and write
the data to a specified location that `run` can find.

There is also a `clean` subcommand which deletes the data that was
written with `ingest`.

Extensions have also been added to zipline. This is an experimental
feature where users can provide an extra set of python files to run at
the start of the process. These can be used to configure aspects of
zipline. Right now the only thing that is supported in an extension file
is the registration of a new data bundle.
2016-05-03 18:38:24 -04:00
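The decorator-based registration described in 59c8e371a2 can be sketched with a toy registry. This does not import zipline and is not its implementation: the `bundles` dict, the single `output_dir` parameter, and the `ingest` helper are hypothetical simplifications.

```python
# Toy registry mapping a bundle name to its ingest function.
bundles = {}


def register(name):
    """Return a decorator that registers an ingest function under `name`."""
    def decorator(ingest_func):
        if name in bundles:
            raise ValueError("bundle %r is already registered" % name)
        bundles[name] = ingest_func
        return ingest_func
    return decorator


@register("my_bundle")
def ingest_my_bundle(output_dir):
    # A real ingest function would write bar data, an assets db, and an
    # adjustments db; this stub only reports where it would write.
    return "wrote data to %s" % output_dir


def ingest(name, output_dir):
    """Look up a registered bundle by name and run its ingest function."""
    return bundles[name](output_dir)
```

An `ingest`-style command then only needs the bundle name to find and run the loading function.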
Joe Jevnik efac476976 ENH: make BcolzMinuteBarWriter.write take iterable
Updates the BcolzMinuteBarWriter.write api to allow users to pass their
data as a stream instead of requiring that they loop over their data
externally. This matches the API presented by BcolzDailyBarWriter.
2016-04-29 16:14:48 -04:00
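The API shape described in efac476976 (pass a stream instead of looping externally) can be sketched with a toy writer. `ToyMinuteBarWriter`, `write_sid`, and the `(sid, rows)` pair layout are assumptions for illustration, not the BcolzMinuteBarWriter interface.

```python
class ToyMinuteBarWriter(object):
    """Hypothetical writer that records how many rows it saw per sid."""

    def __init__(self):
        self.rows_written = {}

    def write_sid(self, sid, rows):
        self.rows_written[sid] = len(rows)

    def write(self, data):
        # Accept any iterable of (sid, rows) pairs, so callers can pass a
        # generator and never hold every asset's data in memory at once.
        for sid, rows in data:
            self.write_sid(sid, rows)


def minute_stream():
    """Lazily yield (sid, rows) pairs, one asset at a time."""
    for sid in (1, 2, 3):
        yield sid, [(minute, 10.0 * sid) for minute in range(5)]


writer = ToyMinuteBarWriter()
writer.write(minute_stream())
```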
Maya Tydykov 0191d9d903 MAINT: move filtering for null date rows back to dataframe
TST: test both next and prev event frame loading and use EventsLoader.

BUG: remove extra arg

MAINT: call list on zip for compatibility with python 3
2016-04-25 16:11:12 -04:00
Maya Tydykov e41c99d077 MAINT: add an event date col field to each loader
MAINT: add event date col field and filter rows where this field is null

TST: modify tests to filter nulls in event date col

MAINT: calculate value repeats by vectorized computation on separate start and end dates.

MAINT: pass DatetimeIndex instead of list of strings
2016-04-25 11:42:08 -04:00
Eddie Hebert a13e336ef5 Merge pull request #1157 from quantopian/use-carray-instead-of-read-all-on-small-size
PERF: Improve read time for smaller num of assets.
2016-04-21 22:25:01 -04:00
Eddie Hebert 66d05aa563 PERF: Improve read time for smaller num of assets.
The BcolzDailyBarReader was optimized for the pipeline case of reading
all assets at once.

Now that the reader is also used to support daily history, the case of
reading data for a small number of assets is more common, particularly
in algorithms that use the history API and have a high rotation of
assets (e.g. an algorithm which uses pipeline to set the active
universe).

Remove the bottleneck in reading a small number of assets by
conditionally reading the slice for each asset from the carray, instead
of reading the data for all equities and then indexing into that full
array. Above a certain number of assets, it is still better to read all
the data at once. On the Quantopian dataset, which holds data for about
20000 equities for the last 10 years (where not all equities trade
over the full range), stored in 118 blosc blp files per column, the
tipping point where the 'read all' mode wins out is between 3000 and
4000 assets.

That number was tested by trying to exercise a worst case scenario where
the equities were spread out evenly across the blp files, by stepping
along a sorted list of assets that were alive over a query range which
spanned 70 trading days.
```
size = 3000
sids = [assets[i] for i in range(0, len(assets), len(assets) // size)][:size]
```

Also, add parameter to WithBcolzDailyBarReader fixture which allows the
test to specify what the threshold count for reading all data should be,
so that the test_us_equity_pricing can be forced into either mode to
make sure that both branches in logic are covered by all test cases.

On a local dev machine this patch improves the read time of `load_raw_array`
for one asset from 100 ms to 96.5 µs. (Roughly a 10^3 improvement.) Reading
only one asset per call is an observed common case when populating the
non-cached values in USEquityHistoryLoader.
2016-04-21 20:43:52 -04:00
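The conditional read described in 66d05aa563 can be sketched as follows. A NumPy array stands in for the on-disk carray, and `load`, `read_all_threshold`, and the row-offset dicts are hypothetical names, not the BcolzDailyBarReader code.

```python
import numpy as np

# Stand-in for one on-disk column; each asset owns a contiguous row range.
column = np.arange(12, dtype=np.uint32)
first_row = {1: 0, 2: 4, 3: 8}
last_row = {1: 3, 2: 7, 3: 11}


def load(sids, read_all_threshold):
    """Return each sid's rows, choosing the read strategy by asset count."""
    if len(sids) >= read_all_threshold:
        # Many assets: one bulk read, then index into the full array.
        full = column[:]
        return {s: full[first_row[s]:last_row[s] + 1] for s in sids}
    # Few assets: read only the slice that belongs to each asset.
    return {s: column[first_row[s]:last_row[s] + 1] for s in sids}


few = load([2], read_all_threshold=2)
many = load([1, 2, 3], read_all_threshold=2)
```

Both branches return the same rows for a given asset, so a fixture can force either mode by changing the threshold and reuse the same assertions.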
Richard Frank 8c92f2d241 TST: What if we don't gc...
Looks like we removed ref cycles elsewhere, so windows builds are
passing without this.
2016-04-21 18:41:57 -04:00
Jean Bredeche 9d1e15ddde BUG: Fetcher wasn't working properly in before_trading_start.
We were trying to use the previous day in before_trading_start because
we were looking for the previous market minute, then normalizing it.  That's
no longer the case, as we want to use today's date for fetcher lookups
in before_trading_start.

Also refactored a bit how dataportal determines if a query should be
routed to the fetcher data structures.
2016-04-21 15:09:14 -04:00
Maya Tydykov 1531568899 ENH: add custom dataset for estimize
MAINT: alphabetize constants

MAINT: remove obsolete column

TST: refactor tests to use common code

MAINT: remove unneeded fields from dataset

MAINT: remove obsolete earnings estimates columns and refactor
2016-04-19 11:29:03 -04:00