Description
I think there is a very minor issue in `pandas.io.data.DataReader` for pandas 0.13.1, caused by an inconsistent file extension in Ken French's data library: `DataReader` cannot process the `F-F_Momentum_Factor` data set from Ken French's website:
```python
import pandas as pd
import pandas.io.data as web

mom = web.DataReader("F-F_Momentum_Factor", "famafrench")
```
Error:
```
Traceback (most recent call last):
  File "./example.py", line 6, in <module>
    mom = web.DataReader("F-F_Momentum_Factor", "famafrench")
  File "/usr/lib64/python2.7/site-packages/pandas/io/data.py", line 85, in DataReader
    return get_data_famafrench(name)
  File "/usr/lib64/python2.7/site-packages/pandas/io/data.py", line 497, in get_data_famafrench
    data = zf.open(name + '.txt').readlines()
  File "/usr/lib64/python2.7/zipfile.py", line 957, in open
    zinfo = self.getinfo(name)
  File "/usr/lib64/python2.7/zipfile.py", line 905, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'F-F_Momentum_Factor.txt' in the archive"
```
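The `KeyError` comes from `zipfile` itself: archive member names are looked up by exact, case-sensitive string match. The sketch below reproduces this without network access, using a made-up in-memory archive whose only member is `F-F_Momentum_Factor.TXT` (upper-case extension). The helper names `make_archive` and `can_open` are illustrative only, and the member contents are placeholder text, not real factor data.

```python
import io
import zipfile


def make_archive(member_name, text):
    """Build an in-memory ZIP holding a single member with placeholder text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(member_name, text)
    return buf


def can_open(archive, wanted):
    """Return True if member `wanted` opens, False if zipfile raises KeyError."""
    with zipfile.ZipFile(archive, "r") as zf:
        try:
            zf.open(wanted).close()
            return True
        except KeyError:
            return False


# The member uses an upper-case extension, as in F-F_Momentum_Factor.zip.
archive = make_archive("F-F_Momentum_Factor.TXT", "placeholder data\n")
name = "F-F_Momentum_Factor"

lower_ok = can_open(archive, name + ".txt")  # the name get_data_famafrench asks for
upper_ok = can_open(archive, name + ".TXT")  # the name actually stored in the archive
```

Here `lower_ok` is `False` and `upper_ok` is `True`: only the exact spelling of the stored name is found.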
The issue appears to be that when `F-F_Momentum_Factor.zip` is unzipped, the file inside is `F-F_Momentum_Factor.TXT`, while `get_data_famafrench(name)` in `data.py` assumes the extension is lower case. (I believe the lower-case extension holds for all the other data files on Ken's website, but for whatever reason it has never been true for the momentum factor file.) Because `zipfile` matches member names case-sensitively, the lookup of `F-F_Momentum_Factor.txt` fails. Here is the relevant code in `get_data_famafrench(name)`:
```python
with ZipFile(tmpf, 'r') as zf:
    data = zf.open(name + '.txt').readlines()
```
There is probably a better solution, but I changed the snippet above to the following and it seems to work:
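```python
with ZipFile(tmpf, 'r') as zf:
    # Open the archive's first (and presumably only) member, whatever the case of its extension
    data = zf.open(zf.namelist()[0]).readlines()
```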
Example:
```python
mom = web.DataReader("F-F_Momentum_Factor", "famafrench")[1]
print mom.head(10)
```
Output:
|        | 1 Mom |
|--------|------:|
| 192701 |  0.49 |
| 192702 | -0.69 |
| 192703 |  5.41 |
| 192704 |  3.83 |
| 192705 |  3.73 |
| 192706 | -0.65 |
| 192707 |  5.03 |
| 192708 |  1.15 |
| 192709 |  1.55 |
| 192710 | -0.07 |
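The `zf.namelist()[0]` workaround assumes each archive contains exactly one file, so it could pick the wrong file if an archive ever held more than one. A slightly more defensive sketch matches the expected name case-insensitively instead. `open_member_ci` is a hypothetical helper name (not part of pandas), and the in-memory archive below is placeholder data standing in for the real download.

```python
import io
import zipfile


def open_member_ci(zf, filename):
    """Open the member whose name equals `filename`, ignoring case.

    Hypothetical helper, not part of pandas. If several members differ
    only in case, the first one listed is returned.
    """
    target = filename.lower()
    for member in zf.namelist():
        if member.lower() == target:
            return zf.open(member)
    raise KeyError("There is no item named %r in the archive" % filename)


# In-memory archive standing in for the downloaded F-F_Momentum_Factor.zip
# (contents are placeholder text, not real factor data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("F-F_Momentum_Factor.TXT", "line one\nline two\n")

name = "F-F_Momentum_Factor"
with zipfile.ZipFile(buf, "r") as zf:
    # Same call shape as the original code, but tolerant of .TXT vs .txt
    data = open_member_ci(zf, name + ".txt").readlines()
```

Inside `get_data_famafrench`, the line would then read `data = open_member_ci(zf, name + '.txt').readlines()`; unlike `namelist()[0]`, it still raises `KeyError` if the expected file is genuinely missing.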
Also, here is the output of `pd.show_versions()`:
```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.3-201.fc20.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.13.1
Cython: 0.20
numpy: 1.8.0
scipy: 0.12.1
statsmodels: 0.5.0
IPython: 0.13.2
sphinx: 1.1.3
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: 0.8.0
tables: 3.0.0
numexpr: 2.3
matplotlib: 1.3.1
openpyxl: 1.8.3
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.2
sqlalchemy: None
lxml: 3.2.4
bs4: None
html5lib: None
bq: None
apiclient: None
```