Skip to content

ENH: Implement core/strings/wrap method #6705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 187 commits into from

Conversation

jeffreystarr
Copy link
Contributor

This patch implements the str_wrap function within core/strings. The implementation follows
the behavior of R's stringr library. When a string is 'wrap'ped, the return value will be a
paragraph with lines of max(word length, width) length.

Example:

>>> str_wrap(Series([u'line to be wrapped', u'another line to be wrapped']), width=12)
Series([u'line to be\nwrapped', u'another\nline to be\nwrapped'])

This patch implements the str_wrap function within
core/strings. The implementation follows the behavior
of R's stringr library. When a string is 'wrap'ped,
the return value will be a paragraph with lines of
max(word length, width) length.
@jreback jreback added this to the 0.14.0 milestone Mar 25, 2014
@jreback
Copy link
Contributor

jreback commented Mar 25, 2014

can you show an example in the top fo the PR of using this.

Also need to add to the documentation in basics.rst (mention this function, don't need an example)

also add to comparison_with_r.rst

@dsm054
Copy link
Contributor

dsm054 commented Mar 25, 2014

Couldn't this leverage the textwrap module?

…R's stringr library defaults and semantics (width being exclusive).
Expanded docstring with description of major optional parameters and added an example.
@jeffreystarr
Copy link
Contributor Author

I've changed the implementation to use the textwrap module and expanded the documentation to reference many of the parameters wrap can now use.

I've also added a mention of wrap to basics.rst.

comparison_with_r.rst does not currently discuss any vectorized string operations. I think a new section to that file (to discuss all vectorized string operations) would be a PR by itself.

@@ -710,20 +711,64 @@ def str_rstrip(arr, to_strip=None):
return _na_map(lambda x: x.rstrip(to_strip), arr)


def str_wrap(arr, width=80):
def str_wrap(arr, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should all of the arguments and their defaults be here? is their a reason you did it this other way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since str_wrap is a wrapper around textwrap.TextWrapper.wrap, I did not want to restrict the interface to a subset of the options available. (There have been at least three parameters added to textwrap.TextWrapper from python 2.6 to 3.3.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm...that makes sense, the only issue then is the doc string.... @jorisvandenbossche any idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeffreystarr does the doc-string work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK now, with the docstring explaining the parameters. It's only the signature now that does not include all those.
But what is the reason not including the keyword arguments and its default values in the signature but instead in a dict in the function?

library str_wrap function. Unless overwritten using kwargs, the instance has expand_tabs=False,
replace_whitespace=True, drop_whitespace=True, break_long_words=False, and
break_on_hyphens=False. Since R's stringr str_wrap treats the line width as an exclusive
value, the instance is configured with width=user-supplied width - 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we rather go with python's default value instead of changing the user input? And then putting a note explaining the difference with R?

@@ -195,6 +195,7 @@ Improvements to existing features
- Performance improvement when converting ``DatetimeIndex`` to floating ordinals
using ``DatetimeConverter`` (:issue:`6636`)
- Performance improvement for ``DataFrame.shift`` (:issue: `5609`)
- Arrays of strings can be wrapped to a specified width (str.wrap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the refernce to this PR number

@jreback
Copy link
Contributor

jreback commented Apr 5, 2014

@jeffreystarr pls rebase this

@jtratner
Copy link
Contributor

jtratner commented Apr 6, 2014

@jeffreystarr - I'd also like to make width a positional argument, which lets you do something like: df.str.wrap(30), which has a nice clarity to it vs. df.str.wrap(width=30). I'd suggest width should be required, because I'd say there isn't really a "default" width that really makes sense. This is a pretty minimal change, just needs to be def str_wrap(self, width, **kwargs) in each function.

As an aside, this previously was just NotImplemented right? So there's no issue with forcing width to be passed?

@jeffreystarr
Copy link
Contributor Author

You are correct - it was previously NotImplemented. There is no impact to
existing code by making width required.

On Sat, Apr 5, 2014 at 7:41 PM, Jeff Tratner [email protected]:

@jeffreystarr https://github.com/jeffreystarr - I'd also like to make
width a positional argument, which lets you do something like:
df.str.wrap(30), which has a nice clarity to it vs. df.str.wrap(width=30).
I'd suggest width should be required, because I'd say there isn't really
a "default" width that really makes sense. This is a pretty minimal change,
just needs to be def str_wrap(self, width, **kwargs) in each function.

As an aside, this previously was just NotImplemented right? So there's no
issue with forcing width to be passed?

Reply to this email directly or view it on GitHubhttps://github.com//pull/6705#issuecomment-39655369
.

@jtratner
Copy link
Contributor

jtratner commented Apr 6, 2014

@jeffreystarr - great, then if it's okay with you, I'd suggest changing width to be required.

@jreback
Copy link
Contributor

jreback commented Apr 8, 2014

@jeffreystarr this looks good

can you can squash down to 1 commit would be great

width : int
Maximum line-width
expand_tabs: bool, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a space before the : (like for width above)

@jreback
Copy link
Contributor

jreback commented Apr 28, 2014

hmm...looks like you somehow rebase to a different head

git rebase -i origin/master

should do the trick

if it doesn't, then:

  • pull master
  • create a new branch
  • git cherry-pick <commits> (you must pick 1 by one, starting with the oldest
    the are YOUR commits (I think you have 6)

then this 'starts' you over

you can then

git push yourforck yourbranch -f

and it will erase this and replace with your new

@jorisvandenbossche
Copy link
Member

small note on the advice of @jreback: be sure that origin (if this is your fork) is also updated, or you can rebase on upstream (if this is pydata/pandas):

git fetch upstream
git rebase upstream/master

This patch implements the str_wrap function within
core/strings. The implementation follows the behavior
of R's stringr library. When a string is 'wrap'ped,
the return value will be a paragraph with lines of
max(word length, width) length.
…R's stringr library defaults and semantics (width being exclusive).
Expanded docstring with description of major optional parameters and added an example.
…o str_wrap

Conflicts:
	doc/source/release.rst
	doc/source/v0.14.0.txt
@jreback
Copy link
Contributor

jreback commented Apr 29, 2014

closing in favor of #6999

@jreback jreback closed this Apr 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.