Commit Graph

24 Commits

Author SHA1 Message Date
Peter Veerman 2b747ba46c [DataFrame] Implement .fillna(), .ffill(), .bfill(), .eval(), and .drop() (#1544)
* Implement ray.DataFrame.drop w/ tests

* Implement ray.DataFrame.eval w/ tests

Fix flake8 issues

* Fix flake8 issues in dataframe.py

* Implement fillna

* Implement fillna

* Implement ffill and bfill

* Define helper functions outside of method invocation

* Implement ray.DataFrame.eval w/ tests

* Index update

* Fixed transpose bug with nan values

* Fix lint

* Implement fillna

* Use ray index to check if labels exist in df

* Fix ValueError catching

* Remove duplicate test methods

* Add documentation for .fillna(), .ffill(), .bfill(), .eval(), and .drop()

Fix flake8 errors

* Remove notebook files

* Change fillna, eval, drop to use new index type

* Fix documentation for fillna, eval and drop

temp

Temp

temp

temp

temp

* Update drop to work with new type of ray index

* Fix flake8 errors

* Refactor fillna fix for index
2018-03-09 07:37:27 -08:00
Rohan Singh 0abebb0975 [Dataframes] Implement .__len__(), .__contains__(), .first_valid_index(), and .last_valid_index() (#1664)
* added len, contains, first_valid_index, last_valid_index

* fixed contains test cases

* test files updated for PR
2018-03-06 23:56:11 -08:00
Devin Petersohn 4af42d5bb6 [DataFrame] Adding error checking for pandas version (#1662)
* Adding error checking for pandas version

* Addressing comments
2018-03-06 09:57:49 -08:00
Kunal Gosar 6685d4c446 fix tail and finish repr and str (#1628) 2018-03-02 15:26:54 -08:00
Robert Nishihara 1222d09224 Fix dataframe test linting and test. (#1629) 2018-02-28 15:21:49 -08:00
Devin Petersohn e7df293946 [DataFrames] Updating Error messages to encourage contribution. (#1623) 2018-02-27 21:44:33 -08:00
Kunal Gosar 4a15c2c65c [Dataframes] Call ray.init() on ray.dataframe import (#1626)
* ray.init on dataframe import

* wrapping ray.init in a try/except

* removing ray.init calls from test code

* resolving flake8
2018-02-27 16:11:23 -08:00
Kunal Gosar 34664dbf76 [DataFrame] Pass lengths to _default_index instead of df (#1621)
* Pass lengths to remote function over DataFrame

* Increasing performance by moving length to remote
2018-02-27 02:38:26 -08:00
Simon Mo 4ab16d7fb3 [DataFrame] Implement loc, iloc (#1612)
* Add parquet-cpp to gitignore

* Add read_csv and read_parquet

* Gitignore pytest_cache

* Fix flake8

* Add io to __init__

* Changing Index. Currently running tests, but so far untested.

* Removing issue of reassigning DF in from_pandas

* Fixing lint

* Fix bug

* Fix bug

* Fix bug

* Better performance

* Fixing index issue with sum

* Address comments

* Update io with index

* Updating performance and implementation. Adding tests

* Fixing off-by-1

* Fix lint

* Address Comments

* Make pop compatible with new to_pandas

* Format Code

* Clean up some index issues

* Bug fix: assigned reset_index back

* Implement loc and iloc

* Revert whitespace

* Format code

* Address comments
2018-02-27 01:57:52 -08:00
Kunal Gosar f43328f332 moved _default_index to remote fn (#1617) 2018-02-26 21:12:04 -08:00
Kunal Gosar 48bd7b147d [DataFrame] Added Implementations for equals, query, and some other operations (#1610)
* Implemented Dataframe __abs__ and __iter__

* implemented __neg__

* implemented query

* Implemented equals

* Implemented __eq__ and __ne__ operators

* Added method level comments

* resolved flake8 comments

* resolving devin's comments
2018-02-26 18:31:00 -08:00
Simon Mo d78a22f94c [DataFrame] Implement IO for ray_df (#1599)
* Add parquet-cpp to gitignore

* Add read_csv and read_parquet

* Gitignore pytest_cache

* Fix flake8

* Add io to __init__

* Changing Index. Currently running tests, but so far untested.

* Removing issue of reassigning DF in from_pandas

* Fixing lint

* Fix bug

* Fix bug

* Fix bug

* Better performance

* Fixing index issue with sum

* Address comments

* Update io with index

* Updating performance and implementation. Adding tests

* Fixing off-by-1

* Fix lint

* Address Comments

* Make pop compatible with new to_pandas

* Format Code

* Clean up some index issues

* Bug fix: assigned reset_index back

* Remove unused debug line
2018-02-26 18:26:38 -08:00
Devin Petersohn 1fa59f1887 [DataFrame] Adding insert, set_axis, set_index, reset_index and tests (#1603) 2018-02-26 08:58:15 -08:00
Devin Petersohn 529397b35e [DataFrames] Updating Index implementation, performance improvements (#1598) 2018-02-25 13:39:28 -08:00
Devin Petersohn de6fa02c85 [DataFrame] Fix transpose with nan values and add functionality needed for Index (#1545) 2018-02-21 08:46:37 -08:00
Helen Che fd03fb967f [DataFrame] Implement iteritems, items, itertuples, and iterrows. (#1543)
* items

* Can't pickle generator so return list

* Add iterrows method

* Finish flake8

* Add itertuples

* Some changes

* Add iter tests to mixed types test

* Appease flake8
2018-02-20 10:07:36 -08:00
Hari Subbaraj 8d1a0b0d04 [DataFrame] Dataframe functions (max, min, notnull, notna) (#1500)
* Finished max, min, notna, notnull

* flake8 satisfied

* fixed pytest fixture error

* flake8 sufficed

* post-code review

* added methods to new mixed types test
2018-02-16 14:00:59 -08:00
Helen Che 62680011ee [DataFrame] Add implementation for get method (#1496)
* Add implementation for get method
Add tests for get method
Add implementation/tests for get_dtype_counts method
Add implementation/tests for get_ftype_counts method

* Add test fixtures

* Change method tests to fixtures

* Flake8
2018-02-08 22:12:03 -08:00
Devin Petersohn fa37564511 [DataFrame] Implementation for head, idxmax, idxmin, pop, tail, and Ray Index (#1520)
* Adding head implementation

* Adding idxmax, idxmin, pop, tail

* Adding index skeleton

* Addressing reviewer comments

* Fixing tests to reflect Series constructor changes
2018-02-07 15:43:45 -08:00
Simon Mo 0a79442954 [DataFrame] MVP (1/4) (#1495)
* Implement __{getitem, delitem, copy, deepcopy}__

* Implement all(), any()

* Revert "Implement all(), any()"

This reverts commit 784052414f063662cdb30943297dc9ddfd3ca300.

* Address Comments + Fix axis indexing

* Update syntax for test_axes

* Implement bfill, bool, count

* Implement round

* Resolve bfill inplace issue

* Deimplement bfill; wait for the distributed version

* Fix format

* Copy df for __delitem__
2018-02-03 09:26:18 -08:00
Robert Nishihara 5acc98e629 Update arrow with better dataframe serialization and get rid of custo… (#1413)
* Update arrow with better dataframe serialization and get rid of custom dataframe serializers.

* Update plasma client API.

* Fix potential bug.

* Bug fix.

* Update arrow to use deduplicated file descriptors and mutable buffers.

* Fix tests.

* Update commit.

* Update commit.

* Update commit.

* Update commit.

* Update commit

* Update commit back to arrow codebase.
2018-01-24 10:03:29 -08:00
Devin Petersohn 4aca016bff Adding series and a way to validate our API. (#1435)
* Adding series and a way to validate our API.

* Moving partitions into protected status
2018-01-21 19:20:54 -08:00
Devin Petersohn 112ef07563 Adding all DataFrame methods with NotImplementedErrors (#1403)
* Adding all DataFrame methods with NotImplementedErrors

* Moving dataframe creation into function call
2018-01-07 12:00:16 -08:00
Devin Petersohn a75a473d7f Add a distributed Dataframe API to Ray (#1330)
* Adding dataframe object and minor APIs

* Adding reduce functionality

* Adding some print and making reduce work on current Ray

* Cleanup

* Added new functionality and docs.

* Adding more functionality.

* New functionality with older cleanup

* Complying with flake8 formatting

* Added tests and addressed reviewer comments

* Complying with flake8.

* Adding pandas to travis and requirements doc

* Fixing flake8 failures

* Fixing flake8 errors from imports

* Fixing import error

* Fixing import errors

* Addressing reviewer comments

* Addressing lint error
2017-12-20 09:31:22 -08:00