BUG: string methods on empty series (GH7241) #7242

Closed
wants to merge 1 commit into from

wabu
Contributor

@wabu wabu commented May 27, 2014

closes #7241

@jreback jreback added this to the 0.14.1 milestone May 27, 2014
@jreback
Contributor

jreback commented May 27, 2014

this will need a release note (though the 0.14.1 section is not there yet)

@jreback
Contributor

jreback commented May 31, 2014

can you squash to a single commit?

@wabu
Contributor Author

wabu commented Jun 1, 2014

ok

@hayd
Contributor

hayd commented Jun 1, 2014

This is good.

I wonder if, for consistency, these should return empty Series/DataFrames with object dtype?

In [8]: s.str.findall('a')
Out[8]: Series([], dtype: object)
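A quick illustration of the empty-result case, written against a recent pandas release rather than 0.14 (an assumption: repr formatting and defaults have changed since then). `findall` yields a list per element, so its result is object dtype even when there is nothing to search:

```python
import pandas as pd

# Assumes a recent pandas release; 0.14 output may differ.
# An empty Series created explicitly with object dtype.
s = pd.Series([], dtype=object)

# findall returns a list per element, so the result is object dtype even when empty.
result = s.str.findall('a')
print(result)
```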

@@ -578,6 +578,19 @@ def check_index(index):
tm.makeDateIndex, tm.makePeriodIndex ]:
check_index(index())

# GH7241
# empty series with one group
s = Series(dtype=str)
Contributor

FYI this is equivalent to Series(dtype=object).
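A minimal check of this equivalence, assuming pandas 1.x/2.x with default options. Newer releases may map `dtype=str` to a dedicated string dtype instead, so the comparison is only expected to hold on those versions:

```python
import pandas as pd

# Assumes pandas 1.x/2.x with default options: dtype=str is stored as object dtype,
# so both empty Series end up with the same dtype.
# Newer releases may map dtype=str to a dedicated string dtype instead.
s_str = pd.Series(dtype=str)
s_obj = pd.Series(dtype=object)
print(s_str.dtype, s_obj.dtype)
```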

@jreback
Contributor

jreback commented Jun 1, 2014

agreed. @wabu can you put in tests for all string methods on an empty series (there may be some already) and see what breaks if you enforce object dtype?
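One way such a test could be sketched, assuming a recent pandas release. The helper name `empty_result_dtypes` and the choice of methods are hypothetical, not taken from the PR; the idea is to call each method on an empty object Series and record the result dtype:

```python
import pandas as pd


def empty_result_dtypes():
    """Hypothetical helper: apply a few string methods to an empty object Series
    and record the dtype of each (empty) result."""
    empty = pd.Series([], dtype=object)
    # Illustrative selection of methods that return strings or lists per element.
    calls = {
        "upper": lambda s: s.str.upper(),
        "lower": lambda s: s.str.lower(),
        "strip": lambda s: s.str.strip(),
        "findall": lambda s: s.str.findall("a"),
    }
    dtypes = {}
    for name, call in calls.items():
        result = call(empty)
        # Each method should run without error and return an empty result.
        assert len(result) == 0, name
        dtypes[name] = result.dtype
    return dtypes


print(empty_result_dtypes())
```

Adding more entries to `calls` shows which methods return something other than object dtype on empty input.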

@hayd
Contributor

hayd commented Jun 1, 2014

It's not necessarily object for all string methods; I guess it should match the dtype of a non-empty example... For example, str.count and str.len return int64 (actually these could be the only examples).

@jreback
Contributor

jreback commented Jun 1, 2014

right, those are counting methods, so they should be int64 when empty
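Sketched against a recent pandas release (an assumption; the 0.14 behaviour may differ), the counting methods keep their integer dtype whether or not the Series is empty. The object dtype is set explicitly so the example does not depend on newer string-dtype inference:

```python
import pandas as pd

# Assumes a recent pandas release. Object dtype is set explicitly on both Series.
words = pd.Series(["a", "bb", "abca"], dtype=object)
empty = pd.Series([], dtype=object)

# str.len and str.count produce one integer per element.
print(words.str.len().tolist(), words.str.len().dtype)
print(words.str.count("a").tolist(), words.str.count("a").dtype)

# On empty input the counting methods still report an integer dtype.
print(empty.str.len().dtype, empty.str.count("a").dtype)
```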

@wabu
Contributor Author

wabu commented Jun 2, 2014

sounds like the right thing to do, I'll have a look.
I realized some time ago that other methods behave strangely with empty series/dfs, but did not have time to look into this (perhaps it's already fixed).

@wabu wabu changed the title BUG: extract on empty series with >1 groups (GH7241) BUG: string methods on empty series (GH7241) Jun 3, 2014
@wabu
Contributor Author

wabu commented Jun 3, 2014

I changed _na_map. As noted in the release notes, str_extract now returns a series containing only NaN values with dtype=object instead of float. I think it's strange to get object for empty input but float for non-matching strings.

old:

>>> pd.Series(['foo']).str.extract('(bar)')
0 NaN
dtype: float

new:

>>> pd.Series(['foo']).str.extract('(bar)')
0 NaN
dtype: object
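The non-matching case, checked on a recent pandas release (an assumption). Newer versions default `extract` to `expand=True`, which returns a DataFrame, so `expand=False` is passed here to get a Series as in the example above:

```python
import pandas as pd

# Assumes a recent pandas release. Object dtype is set explicitly on the input.
s = pd.Series(["foo"], dtype=object)

# One capture group that never matches; expand=False returns a Series.
result = s.str.extract("(bar)", expand=False)
print(result)
```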

@@ -30,6 +30,9 @@ API changes
- Openpyxl now raises a ValueError on construction of the openpyxl writer
instead of warning on pandas import (:issue:`7284`).

- ``StringMethods.extract`` returns series with only NaN values as
``dtype=object`` instead of ``dtype=float`` (:issue:`7242`)
Contributor

maybe say something about non-matching here (also, do we need to mention this in the str.extract doc section (basics.rst) and/or the docstring?)

@hayd
Copy link
Contributor

hayd commented Jun 3, 2014

thanks for putting the empty tests together!

You have a merge conflict (probably in release notes) so please rebase off master.

- all StringMethods are tested and work on empty series
- moreover extract always returns dtype==object, even when no match is
  found
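The original failure (GH7241) involved `extract` with more than one group on an empty series. A sketch of the corresponding check on a recent pandas release (an assumption; this is not the test added in this PR):

```python
import pandas as pd

# Assumes a recent pandas release; not the test from this PR.
empty = pd.Series([], dtype=object)

# Two capture groups on an empty Series: expect an empty frame with one column per group.
frame = empty.str.extract("(a)(b)", expand=True)
print(frame.shape, frame.dtypes.tolist())
```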
@hayd
Contributor

hayd commented Jun 4, 2014

closed via f24f2e8

(there was another merge conflict on the release notes!)

@hayd hayd closed this Jun 4, 2014
@hayd
Contributor

hayd commented Jun 4, 2014

Thanks!

@wabu wabu deleted the extract-empty-fix branch June 4, 2014 06:15
Labels
Bug Strings String extension data type and string data
Development

Successfully merging this pull request may close these issues.

StringMethods.extract fails on empty series
3 participants