Added Stata 13 support #4662

Merged
merged 2 commits into from
Sep 21, 2013

PKEuS
Contributor

@PKEuS PKEuS commented Aug 23, 2013

Added initial Stata 13 .dta file format support. Newly added string features ("strls") and writing are not supported. Fixes #4291.

@jreback
Contributor

jreback commented Aug 23, 2013

@PKEuS you have travis turned on?

@PKEuS
Contributor Author

PKEuS commented Aug 24, 2013

Yes.

Main error shown on travis is "ValueError: Version of given Stata file is not 104, 105, 108, 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12) or 117 (Stata 13)" for all 117 unit tests, but only for Python != 2.7. Does anybody have an idea what is going on?
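
For readers following along, here is a minimal sketch of the kind of version check that raises this error. It assumes that format 117 files open with an XML-like `<stata_dta><header><release>117</release>` header while older formats store the version in the first byte; `sniff_dta_version` is a hypothetical helper, not the code in this PR.

```python
import io
import re
import struct

# Version numbers taken from the error message quoted above.
SUPPORTED_VERSIONS = (104, 105, 108, 113, 114, 115, 117)


def sniff_dta_version(buf):
    """Return the format version of a binary .dta stream (hypothetical helper)."""
    first = buf.read(1)
    if first == b"<":
        # Assumed layout for format 117: a tagged header with a <release> field.
        head = first + buf.read(63)
        match = re.match(br"<stata_dta><header><release>(\d+)</release>", head)
        if match is None:
            raise ValueError("Unrecognized .dta header")
        version = int(match.group(1))
    elif first:
        # Older formats: the very first byte is the version number.
        version = struct.unpack("B", first)[0]
    else:
        raise ValueError("Empty file")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("Unsupported .dta version: %d" % version)
    return version
```

Reading from a binary stream and matching against bytes patterns keeps the check identical on Python 2 and Python 3.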

@jreback
Contributor

jreback commented Aug 24, 2013

the one build that passes runs very few tests (it's for non-English testing) and doesn't run the stata stuff

does this pass locally?

Travis runs on 32-bit Linux FYI

@PKEuS
Contributor Author

PKEuS commented Aug 24, 2013

Yes, it passes locally. But there is something strange about it I noticed:
When I execute the tests a second time, they fail with the same message as on Travis, but they pass again after I change something in stata.py.

@jreback
Contributor

jreback commented Aug 24, 2013

make sure that the multiprocess_can_split is False (at the top of the test_stata)

what did u change?

@jreback
Contributor

jreback commented Aug 25, 2013

I ran this PR locally (64-bit linux) and got this; maybe reading the wrong value for vartypes?

(Pdb) p self.path_or_buf
<open file '/mnt/home/jreback/pandas/pandas/io/tests/data/stata1_v13.dta', mode 'rb' at 0x2d5d150>
(Pdb) p seek_vartypes
1441433355735269392
(Pdb) self.path_or_buf.seek(seek_vartypes)
*** IOError: [Errno 22] Invalid argument
(Pdb) p self.byteorder
'>'
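
A quick arithmetic check on the value above, as a sketch. It assumes the `+ 16` adjustment from the `struct.unpack` line quoted later in this thread was already applied when the session was captured. Under that assumption, the huge offset is what a small little-endian integer looks like when it is decoded as big-endian:

```python
import struct

observed = 1441433355735269392  # seek_vartypes from the pdb session above

# Undo the assumed "+ 16" and recover the 8 raw bytes that were read as '>q'.
raw = struct.pack(">q", observed - 16)

# Decode the same 8 bytes with the opposite byte order.
as_little = struct.unpack("<q", raw)[0]

print(as_little)       # 276
print(as_little + 16)  # 292, a plausible offset near the start of a file
```

This is consistent with the byte order being detected incorrectly, although the session alone does not prove it.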

@PKEuS
Contributor Author

PKEuS commented Sep 4, 2013

I was able to fix some problems, but two of the 5 buildbots (2.6 and 2.7) are still failing, now with:

File "/home/travis/virtualenv/python2.6_with_system_site_packages/lib/python2.6/site-packages/pandas-0.0.0-py2.6-linux-x86_64.egg/pandas/io/stata.py", line 362, in _read_header
    self.path_or_buf.seek(seek_vartypes)

Might the problem be that seek_vartypes is read as an 8-byte integer?

@jreback
Contributor

jreback commented Sep 4, 2013

you should be explicit if you are reading bytes
Travis is a 32 bit build

@PKEuS
Contributor Author

PKEuS commented Sep 9, 2013

seek_vartypes = struct.unpack(self.byteorder + 'q', self.path_or_buf.read(8))[0] + 16

Is that not explicit enough? Where do I have to add something? At the line above, or the following one?

self.path_or_buf.seek(seek_vartypes)
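
On the question of explicitness: with a `<` or `>` prefix, the `struct` module uses standard sizes, so `q` is always 8 bytes regardless of whether the interpreter is a 32-bit or 64-bit build. The sketch below illustrates that, plus a sanity check on the offset before seeking; `read_int64` and `checked_seek` are hypothetical helpers, not part of this PR.

```python
import io
import struct

# Standard sizes: identical on every platform once a byte order prefix is given.
assert struct.calcsize("<q") == 8
assert struct.calcsize(">q") == 8


def read_int64(buf, byteorder):
    """Read one signed 64-bit integer using an explicit byte order ('<' or '>')."""
    data = buf.read(8)
    if len(data) != 8:
        raise ValueError("Unexpected end of file while reading an offset")
    return struct.unpack(byteorder + "q", data)[0]


def checked_seek(buf, offset, file_size):
    """Seek only if the offset is inside the file, so bad offsets fail with a clear message."""
    if not 0 <= offset <= file_size:
        raise ValueError("Offset %d is outside the file (size %d)" % (offset, file_size))
    buf.seek(offset)
```

With a check like `checked_seek`, a wrongly decoded offset would surface as a descriptive `ValueError` instead of `IOError: [Errno 22] Invalid argument`.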

@jreback
Contributor

jreback commented Sep 9, 2013

@PKEuS not sure. can you set up a vagrant 32-bit machine and debug?

@PKEuS
Contributor Author

PKEuS commented Sep 14, 2013

Finally, the problem seems to be solved!
Endianness was detected wrongly - no idea why it worked with Python 3 - apparently Python 2 was correct. My code was simply wrong.
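
A sketch of byte order detection, for illustration only. It assumes the format 117 header carries a `<byteorder>MSF</byteorder>` or `<byteorder>LSF</byteorder>` tag and that older formats store a byte order code (0x01 for big-endian, 0x02 for little-endian) in the second byte; `detect_byteorder` is a hypothetical helper, not the fix from this PR.

```python
import re


def detect_byteorder(header):
    """Return '>' or '<' for use as a struct prefix, given the first bytes of a .dta file."""
    if header.startswith(b"<stata_dta>"):
        # Assumed format 117 layout: the byte order is spelled out in a tag.
        match = re.search(br"<byteorder>(MSF|LSF)</byteorder>", header)
        if match is None:
            raise ValueError("No <byteorder> tag found in header")
        # Compare bytes with bytes: on Python 3, b"MSF" == "MSF" is always False.
        return ">" if match.group(1) == b"MSF" else "<"
    # Assumed layout for older formats: byte 0 is the version, byte 1 the byte order code.
    code = header[1:2]
    if code == b"\x01":
        return ">"
    if code == b"\x02":
        return "<"
    raise ValueError("Unrecognized byte order marker")
```

The bytes-versus-str comparison noted in the comment is a common source of Python 2 versus Python 3 differences; whether it played a role here is not stated in the thread.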

@jreback
Contributor

jreback commented Sep 14, 2013

@PKEuS great, couple of questions

it looks like this is backward compat (as you can still read the new files). Is there a change in the writing behavior?

can you document in doc/source/release.rst and update the docs in io.rst (I would just add a short blurb that in v0.13 you now accept this new Stata format, while remaining backwards compat)

@PKEuS
Contributor Author

PKEuS commented Sep 15, 2013

The StataWriter has not been changed - it still uses the old format.

I will add the new feature to the release notes.

@jreback
Contributor

jreback commented Sep 15, 2013

@PKEuS this looks fine

can you rebase on master, then squish down to a few commits....?

Fixed docstring of read_stata
Removed unused imports
Fixed some bugs in Stata format 117 parsing
Added unit testing for Stata 13 format
@PKEuS
Contributor Author

PKEuS commented Sep 21, 2013

Squashed to two commits and rebased.

@jreback
Contributor

jreback commented Sep 21, 2013

@PKEuS looks fine to me ... @jseabold ?

@jseabold
Contributor

I defer to the test suite ;)

jreback added a commit that referenced this pull request Sep 21, 2013
@jreback jreback merged commit a7eb339 into pandas-dev:master Sep 21, 2013
@jreback
Contributor

jreback commented Sep 21, 2013

@PKEuS thanks again!

Successfully merging this pull request may close these issues.

ENH: update stata for Stata 13 format
3 participants