Description
Hi,
I noticed that when exporting data to Stata, NaN values are not always converted to Stata missing values but are instead left blank. This somehow confuses Stata, which then does not allow using the destring command to solve the problem, nor using replace value=. if value==.
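For context, here is a rough sketch of what I would expect a writer to do with NaN. As far as I understand the .dta format, the system missing value "." for a double is stored as the bit pattern 0x7FE0000000000000 (that is, 2**1023), and anything at or above it counts as missing. The helper names below are made up for illustration and this is not pandas' actual writer code:

```python
import math
import struct

# Stata's system missing value "." for a double, per the .dta format:
# bit pattern 0x7FE0000000000000, which equals 2**1023.
STATA_MISSING_DOUBLE = struct.unpack(">d", bytes.fromhex("7fe0000000000000"))[0]


def to_stata_double(value):
    """Hypothetical helper: map NaN to the Stata missing sentinel."""
    return STATA_MISSING_DOUBLE if math.isnan(value) else value


def is_stata_missing(value):
    """Doubles at or above the '.' sentinel are missing codes in a .dta file."""
    return value >= STATA_MISSING_DOUBLE


row = [21.5, float("nan"), 18.2]  # made-up savings rates
encoded = [to_stata_double(v) for v in row]

# A raw NaN fails the comparison (NaN compares false with everything),
# while the encoded sentinel passes it.
print([is_stata_missing(v) for v in row])
print([is_stata_missing(v) for v in encoded])
```

If a NaN were written to the file without this kind of mapping, it would not fall in the missing-value range, which might explain why Stata does not treat it as ".".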
As an example, I downloaded the World Development Indicators and used the following commands to export national savings to a Stata file and a CSV file:
import pandas as pd
import os
dfwdi=pd.read_excel('WDI.xlsx','Data')
dfwdi.columns
dfout=dfwdi.ix[dfwdi['Indicator Code']=='NY.GDS.TOTL.ZS']
dfout
cols=['savyr'+str(i) for i in xrange(1960,dfwdi.columns.values[-1]+1)]
dfout.reset_index(inplace=True, drop=True)
dfout.to_csv('sav.csv', index=False)
dfout.to_stata('sav.dta', write_index=False)
If you import the data into Stata (I am using v.13) and run the following commands, things fail.
use "sav.dta", clear
* Correct number of missing values
summ savyr2000
reg savyr2010 savyr2000
* Correct countries identified as missing
tab code if savyr==.
* replace missing values to "."
* One cannot replace the missing not presented as "."
replace savyr2010=. if savyr==""
* Use "." to identify
replace savyr2010=. if savyr==.
* Perform analysis again
summ savyr2000
reg savyr2010 savyr2000
* Still fails
As you can see, Stata does not perform the analysis, even though it correctly recognizes the missing values; not all of them are presented as ".", however. If one imports the CSV version into Stata and runs the same initial commands, it works fine.
import delimited "sav.csv"
* Correct number of missing values
summ savyr2000
reg savyr2010 savyr2000
Furthermore, for some reason the index is still present in the Stata file, even though I used the write_index=False option.
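A smaller check that does not need the WDI file might help narrow both problems down. The sketch below assumes a recent pandas on Python 3 (not the 0.13.1 setup described here); the frame, its values and the file name are made up for illustration. It writes a frame containing NaN with write_index=False, reads it back with read_stata, and inspects the columns and missing counts:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up frame standing in for the WDI extract.
df = pd.DataFrame({
    "savyr2000": [21.5, np.nan, 18.2],
    "savyr2010": [np.nan, 24.1, 19.7],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sav_check.dta")
    # write_index=False should keep the index out of the file.
    df.to_stata(path, write_index=False)
    roundtrip = pd.read_stata(path)

# An extra "index" column here would reproduce the second problem.
print(list(roundtrip.columns))
# Missing counts per column should match the NaN counts in df.
print(roundtrip.isna().sum())
```

This only tests the round trip through pandas itself, so a clean result here would not rule out a difference in how Stata reads the same file.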
I am using Enthought's Canopy distribution on OS X Mavericks with pandas '0.13.1'. I haven't tried other Python distributions.