Commit Graph

30 Commits

Author SHA1 Message Date
Kunal Gosar 7c9f39241e [DataFrame] Implementing write methods (#1918)
* Add in write methods and functionality

* infer highest available pickle version

* Fix import rebase artifact

* formatting changes to test

* fix lint
2018-04-22 21:25:33 -07:00
Devin Petersohn 8f59546ef2 [DataFrame] Implementing API correct groupby with aggregation methods (#1914) 2018-04-21 17:28:16 -07:00
adgirish 3c48783a16 [DataFrame] Adding read methods and tests (#1712)
* Adding read methods and tests

* Referencing internal partition method so constructors are more canonical with Pandas

* Fixing to reference from_pandas in utils

* Cleaning up unused imports

* rerunning tests

* fixing flake8

* resolving errors

* Added sql and sas test

* updating

* Temporarily phasing out read_csv code for wrapper while diagnosing, added io tests to travis

* Adding travis

* restoring distributed read csv

* resolving rebases

* lint

* Sampling out HD test

* adding dep

* fix pathing

* Flagging out tests

* resolving read_method issues

* fix build issue

* move additional dependencies to extras

* fixing lint

* removing IO dependencies

* updated requirements doc
2018-04-20 18:33:08 -07:00
Patrick Yang f505f0642f [DataFrame] Pass read_csv kwargs to _infer_column (#1894)
* pass kwargs to _infer_column

* adding small test for non-comma delim

* fix lint
2018-04-16 08:47:30 -07:00
Peter Schafhalter 1d605e8f8a [DataFrame] Inherit documentation from Pandas (#1727)
* Added _inherit_docstrings

* DataFrame documentation inherits from Pandas

* Fix formatting

* Replace hasattr and document properties

* Fix rebase

* Override documentation for groupby

* Override documentation for series

* Don't overwrite property docstrings

* Fix property __doc__ for python2
2018-04-12 20:30:19 -07:00
Omkar Salpekar a3ddde398c [DataFrame] Fixed repr, info, and memory_usage (#1874)
* working with dataframes with too many rows and columns

* repr works for jupyter notebooks now

* added comments and test file

* added repr test file to .travis.yml

* added back ray.dataframe as pd to test file

* fixed pandas importing issues in test file

* getting the front and back of df more efficiently

* only keeping dataframe tests in travis

* fixing numpy array for row and col lengths issue

* doesn't add dimensions if df is small enough

* implemented memory_usage()

* completed memory_usage - still failing 2 tests

* only failing one test for memory_usage

* all repr and dataframes tests passing now

* fixing error related to python2 in info()

* fixing python2 errors

* fixed linting errors

* using _arithmetic_helper in memory_usage()

* fixed last lint error

* removed testing-specific code

* adding back travis test

* removing extra tests from travis

* re-added concat test

* fixes with new indexing scheme

* code cleanup

* fully working with new indexing scheme

* added tests for info and memory_usage

* removed test file
2018-04-11 08:07:07 -07:00
Peter Schafhalter 405b05d58a [DataFrame] Implemented __getattr__ (#1753)
* __getattr__ accesses columns

* Added test
2018-04-10 10:19:33 -07:00
adgirish efeaacbedc Adding support for concat (#1739)
adding tests

fixing flake8

adding init

flake8 on test

fixing tests, imports, and flake8

handling for index

adding tests for row, index

added more robust error handling for axis

fixing test failures

cleaning up errors for 2.7

updating travis

resolving import

fixing flake8

moved import order

Fixing to refactor and delaying implementing ray-pd inner concat

resolving ray-pd concat and from_pandas mutation

Revert "resolving ray-pd concat and from_pandas mutation"

This reverts commit 5db43e4e89e328286532f3ef98a4526575c5d08d.
2018-04-09 21:36:24 -07:00
Devin Petersohn 0d9a7a3c19 [DataFrame] Update architecture to be more flexible and performant (#1821) 2018-04-05 15:14:33 -07:00
Rohan Singh 1f027344f1 [Dataframes] Implemented .describe() (#1696)
* added describe methods

* mean updates and added truediv func

* updates

* updated truediv test

* porting stocks to ubuntu

* hacky solution for describe, mean, median, quantile by transposing df

* removed data file

* removed faulty truediv implementation

* flake8 and documentation updates

* updated mean, median, var, std to handle mixed values

* added describe methods

* mean updates and added truediv func

* updates

* updated truediv test

* porting stocks to ubuntu

* hacky solution for describe, mean, median, quantile by transposing df

* removed data file

* removed faulty truediv implementation

* flake8 and documentation updates

* fixed quantile to drop object typed columns

* syntax improvements

* fixed flatten issue

* fixing flatten issue

* minor updates

* added describe methods

* mean updates and added truediv func

* updates

* updated truediv test

* porting stocks to ubuntu

* hacky solution for describe, mean, median, quantile by transposing df

* removed data file

* removed faulty truediv implementation

* flake8 and documentation updates

* updated mean, median, var, std to handle mixed values

* added describe methods

* mean updates and added truediv func

* updates

* updated truediv test

* porting stocks to ubuntu

* hacky solution for describe, mean, median, quantile by transposing df

* removed data file

* removed faulty truediv implementation

* flake8 and documentation updates

* fixed quantile to drop object typed columns

* syntax improvements

* fixed flatten issue

* fixing flatten issue

* improved describe syntax
2018-03-15 21:16:59 -07:00
Devin Petersohn 8c1066cdba [DataFrame] Implemented cummax, cummin, cumsum, cumprod (#1705)
* cummax, cummin, cumsum, cumprod

* added remote function

* Fix lint

* Fixing tests and linting

* Fix lint
2018-03-13 10:06:34 -07:00
Jae Min Kim 737120952e [Dataframes] Reorganization (#1676)
* moved helper functions for dataframes into df_utils

* Updating base on review comments

* fixed bug with from_pandas

* Updating formatting

* Fix lint
2018-03-12 19:13:33 -07:00
Peter Veerman 6455ec934b [DataFrame] Implements DataFrame.rename, DataFrame.rename_axis, and Index.set_names (#1573)
* Index update

* Fixed transpose bug with nan values

* Fix lint

* Add rename tests

* Implement DataFrame.rename, DataFrame.rename_axis, and Index.set_names

* Temp

* Fixing rename for new index implementation

Fix rebase merges

* Fix rename and rename_axis to work with new index.

Re-add pytest fixture

Clean up rebase artifacts

Remove index.py file

* Addressing minor points

* Addressing comments
2018-03-12 19:05:32 -07:00
Peter Veerman 2b747ba46c [DataFrame] Implement .fillna(), .ffill(), .bfill(), .eval(), and .drop() (#1544)
* Implement ray.DataFrame.drop w/ tests

* Implement ray.DataFrame.eval w/ tests

Fix flake8 issues

* Fix flake8 issues in dataframe.py

* Implement fillna

* Implement fillna

* Implement ffill and bfill

* Define helper functions outside of method invocation

* Implement ray.DataFrame.eval w/ tests

* Index update

* Fixed transpose bug with nan values

* Fix lint

* Implement fillna

* Use ray index to check if labels exist in df

* Fix ValueError catching

* Remove duplicate test methods

* Add documentation for .fillna(), .ffill(), .bfill(), .eval(), and .drop()

Fix flake8 errors

* Remove notebook files

* Change fillna, eval, drop to use new index type

* Fix documentation for fillna, eval and drop

temp

Temp

temp

temp

temp

* Update drop to work with new type of ray index

* Fix flake8 errors

* Refactor fillna fix for index
2018-03-09 07:37:27 -08:00
Rohan Singh 0abebb0975 [Dataframes] Implement .__len__(), .__contains__(), .first_valid_index(), and .last_valid_index() (#1664)
* added len, contains, first_valid_index, last_valid_index

* fixed contains test cases

* test files updated for PR
2018-03-06 23:56:11 -08:00
Robert Nishihara 1222d09224 Fix dataframe test linting and test. (#1629) 2018-02-28 15:21:49 -08:00
Devin Petersohn e7df293946 [DataFrames] Updating Error messages to encourage contribution. (#1623) 2018-02-27 21:44:33 -08:00
Kunal Gosar 4a15c2c65c [Dataframes] Call ray.init() on ray.dataframe import (#1626)
* ray.init on dataframe import

* wrapping ray.init in a try/except

* removing ray.init calls from test code

* resolving flake8
2018-02-27 16:11:23 -08:00
Simon Mo 4ab16d7fb3 [DataFrame] Implement loc, iloc (#1612)
* Add parquet-cpp to gitignore

* Add read_csv and read_parquet

* Gitignore pytest_cache

* Fix flake8

* Add io to __init__

* Changing Index. Currently running tests, but so far untested.

* Removing issue of reassigning DF in from_pandas

* Fixing lint

* Fix bug

* Fix bug

* Fix bug

* Better performance

* Fixing index issue with sum

* Address comments

* Update io with index

* Updating performance and implementation. Adding tests

* Fixing off-by-1

* Fix lint

* Address Comments

* Make pop compatible with new to_pandas

* Format Code

* Cleanup some index issue

* Bug fix: assigned reset_index back

* Implement loc and iloc

* Revert whitespace

* Format code

* Address comments
2018-02-27 01:57:52 -08:00
Kunal Gosar 48bd7b147d [DataFrame] Added Implementations for equals, query, and some other operations (#1610)
* Implemented Dataframe __abs__ and __iter__

* implemented __neg__

* implemented query

* Implemented equals

* Implemented __eq__ and __ne__ operators

* Added method level comments

* resolved flake8 comments

* resolving devin's comments
2018-02-26 18:31:00 -08:00
Simon Mo d78a22f94c [DataFrame] Implement IO for ray_df (#1599)
* Add parquet-cpp to gitignore

* Add read_csv and read_parquet

* Gitignore pytest_cache

* Fix flake8

* Add io to __init__

* Changing Index. Currently running tests, but so far untested.

* Removing issue of reassigning DF in from_pandas

* Fixing lint

* Fix bug

* Fix bug

* Fix bug

* Better performance

* Fixing index issue with sum

* Address comments

* Update io with index

* Updating performance and implementation. Adding tests

* Fixing off-by-1

* Fix lint

* Address Comments

* Make pop compatible with new to_pandas

* Format Code

* Cleanup some index issue

* Bug fix: assigned reset_index back

* Remove unused debug line
2018-02-26 18:26:38 -08:00
Devin Petersohn 1fa59f1887 [DataFrame] Adding insert, set_axis, set_index, reset_index and tests (#1603) 2018-02-26 08:58:15 -08:00
Devin Petersohn 529397b35e [DataFrames] Updating Index implementation, performance improvements (#1598) 2018-02-25 13:39:28 -08:00
Devin Petersohn de6fa02c85 [DataFrame] Fix transpose with nan values and add functionality needed for Index (#1545) 2018-02-21 08:46:37 -08:00
Helen Che fd03fb967f [DataFrame] Implement iteritems, items, itertuples, and iterrows. (#1543)
* items

* Can't pickle generator so return list

* Add iterrows method

* Finish flake8

* Add itertuples

* Some changes

* Add iter tests to mixed types test

* Appease flake8
2018-02-20 10:07:36 -08:00
Hari Subbaraj 8d1a0b0d04 [DataFrame] Dataframe functions (max, min, notnull, notna) (#1500)
* Finished max, min, notna, notnull

* flake8 satisfied

* fixed pytest fixture error

* flake8 sufficed

* post-code review

* added methods to new mixed types test
2018-02-16 14:00:59 -08:00
Helen Che 62680011ee [DataFrame] Add implementation for get method (#1496)
* Add implementation for get method
Add tests for get method
Add implementation/tests for get_dtype_counts method
Add implementation/tests for get_ftype_counts method

* Add test fixtures

* Change method tests to fixtures

* Flake8
2018-02-08 22:12:03 -08:00
Devin Petersohn fa37564511 [DataFrame] Implementation for head, idxmax, idxmin, pop, tail, and Ray Index (#1520)
* Adding head implementation

* Adding idxmax, idxmin, pop, tail

* Adding index skeleton

* Addressing reviewer comments

* Fixing tests to reflect Series constructor changes
2018-02-07 15:43:45 -08:00
Simon Mo 0a79442954 [DataFrame] MVP (1/4) (#1495)
* Implement __{getitem, delitem, copy, deepcopy}__

* Implement all(), any()

* Revert "Implement all(), any()"

This reverts commit 784052414f063662cdb30943297dc9ddfd3ca300.

* Address Comments + Fix axis indexing

* Update syntax for test_axes

* Implement bfill, bool, count

* Implement round

* Resolve bfill inplace issue

* Deimplement bfill; wait for the distributed version

* Fix format

* Copy df for __delitem__
2018-02-03 09:26:18 -08:00
Devin Petersohn 4aca016bff Adding series and a way to validate our API. (#1435)
* Adding series and a way to validate our API.

* Moving partitions into protected status
2018-01-21 19:20:54 -08:00