Skip to content

BUG: fix _array2string for structured array (issue #5692) #8160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 18, 2016
Merged

BUG: fix _array2string for structured array (issue #5692) #8160

merged 1 commit into from
Oct 18, 2016

Conversation

skwbc
Copy link
Contributor

@skwbc skwbc commented Oct 15, 2016

The cause of issue #5692 is that _array2string (in numpy/core/arrayprint.py) doesn’t have format function for structured arrays and it uses general purpose format function to format array elements of structured arrays.
I made StructureFormat class to format structured array elements. _get_format_function instantiates StructureFormat by instantiating format function classes for each field of structure recursively and merge them.

@charris charris changed the title BUG: fix _array2string for strustured array (issue #5692) BUG: fix _array2string for structured array (issue #5692) Oct 16, 2016
@charris
Copy link
Member

charris commented Oct 16, 2016

@skwbc You should expand the commit message to contain your comment above. Also include Closes #5692 in the commit message body so that the issue is closed when this PR is merged.

@skwbc
Copy link
Contributor Author

skwbc commented Oct 17, 2016

@charris
I have modified the commit message to include my comment above and Closes #5692.

for descr in dtype_.descr:
field_name = descr[0]
field_values = data[field_name]
if 1 < len(field_values.shape):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this if would be clearer written as

if len(field_values.shape) <= 1:
    blah
else:
    blah

As one expects to handle things in order of increasing size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote it.

field_name = descr[0]
field_values = data[field_name]
if 1 < len(field_values.shape):
format_function = repr_format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is repr_format the correct choice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, yes (low confidence).
By using repr_format, this updated _array2string should be called and it works fine.

@charris
Copy link
Member

charris commented Oct 18, 2016

LGTM. @ahaldane Might take a look.

The cause of issue #5692 is that `_array2string` (in
numpy/core/arrayprint.py) doesn’t have format function for structured
arrays and it uses general purpose format function to format array
elements of structured arrays.
This commit adds `StructureFormat` class to format structured array
elements. `_get_format_function` instantiates `StructureFormat` by
instantiating format function classes for each field of structure
recursively and merge them.

Closes #5692.
@charris charris merged commit 7c5163a into numpy:master Oct 18, 2016
@charris
Copy link
Member

charris commented Oct 18, 2016

Thanks @skwbc . If you want to take a shot at adding standard docstrings to the arrayprint functions that would be awesome.

@ahaldane
Copy link
Member

Sorry, I only just saw this.

Looks like the right idea, but I see two problems:

First, this will raise an exception if the dtype has padding bytes:

>>> dt = np.dtype('u1,i4', align=True)
>>> print np.zeros(3, dtype=dt)
ValueError: no field of name

That's easy to fix though, just do

        for field_name in dtype_.names:

instead of

        for descr in dtype_.descr:
            field_name = descr[0]

This happens because dtype.descr is specifically designed to satisfy the PEP3118 requirements about padding bytes, and is only meant to be used in that context. I almost want to add a warning about that to the docstring of dtype.descr so that people don't use it. I think it confuses many people.

Second, I don't think this treats subarrays quite correctly:

>>> print np.zeros(shape=3, dtype=[('B', 'i4', 5)])
[(array([0, 0, 0, 0, 0], dtype=int32),)
 (array([0, 0, 0, 0, 0], dtype=int32),)
 (array([0, 0, 0, 0, 0], dtype=int32),)]

but I think it should give (like in 1.11)

>>> print np.zeros(shape=3, dtype=[('B', 'i4', 5)])
[([0, 0, 0, 0, 0],) ([0, 0, 0, 0, 0],) ([0, 0, 0, 0, 0],)]

@skwbc if you can make a new PR with to fix those, please do. If you don't in a few days one of us will.

@mhvk
Copy link
Contributor

mhvk commented Oct 18, 2016

@skwbc - also see #8172 - it would be nice if this worked for structured array scalars as well!

@skwbc
Copy link
Contributor Author

skwbc commented Oct 19, 2016

@ahaldane I'll make a new PR in a few days.

@jakirkham
Copy link
Contributor

jakirkham commented Mar 7, 2017

FWIW this broke some doctests I had. So is complicating my ability to test against 1.11 and 1.12. Is there a way to fix this to keeping a trailing zero or force dropping it in 1.11?

@eric-wieser
Copy link
Member

Perhaps write your doctests as

>>> some_result == np.array(...)
True

Instead of relying on the repr. Alternatively, only run your doctests on one version of numpy?

@jakirkham
Copy link
Contributor

Is there no option to specify how I would like to format floats? It seems like this changed how formatting is handled under the surface. Would expect that the users would have some recourse. Is that not the case?

@mhvk
Copy link
Contributor

mhvk commented Mar 7, 2017

@jakirkham - It is indeed quite annoying and I had similar problems adjusting the doctests for astropy; my conclusion was that it was better to mimic new numpy even on old versions, but the problem is that one used to have no control over the formatting. So, I ended up writing code such that it used an updated array2string -- see https://github.com/astropy/astropy/blob/master/astropy/coordinates/representation.py#L34 (ignore the TODO there, but note that this presumes the structured array only contains float).

@jakirkham
Copy link
Contributor

Thanks for the tips, @mhvk. Unfortunately my structured arrays contain a variety of types outside of floats. That said, I think I found a viable workaround by simply overriding the formatter for float to convert to a Python float and then create a str. It's not the prettiest thing, but it seems to do the trick. Ideally one would add this to setup and teardown functions of relevant doctests. Please see commit ( nanshe-org/nanshe@00233bc ) for an example.

@mhvk
Copy link
Contributor

mhvk commented Mar 7, 2017

Indeed, the problem was that prior to 1.12, the structured array formatters simply ignored the numpy formats (and thus deferred to python).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants