famafrench, pandas.io.data.DataReader error for 'F-F_Momentum_Factor' data #6460

Closed
@kdiether

I think there is a very minor issue in pandas.io.data.DataReader in pandas 0.13.1, caused by an inconsistent file extension in Ken French's data library. DataReader can't process the 'F-F_Momentum_Factor' data from Ken French's website:

import pandas as pd
import pandas.io.data as web

mom = web.DataReader("F-F_Momentum_Factor", "famafrench")

Error:

Traceback (most recent call last):
  File "./example.py", line 6, in <module>
    mom = web.DataReader("F-F_Momentum_Factor", "famafrench")
  File "/usr/lib64/python2.7/site-packages/pandas/io/data.py", line 85, in DataReader
    return get_data_famafrench(name)
  File "/usr/lib64/python2.7/site-packages/pandas/io/data.py", line 497, in get_data_famafrench
    data = zf.open(name + '.txt').readlines()
  File "/usr/lib64/python2.7/zipfile.py", line 957, in open
    zinfo = self.getinfo(name)
  File "/usr/lib64/python2.7/zipfile.py", line 905, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'F-F_Momentum_Factor.txt' in the archive"

The issue appears to be that when 'F-F_Momentum_Factor.zip' is unzipped, the underlying file is 'F-F_Momentum_Factor.TXT', while get_data_famafrench(name) in data.py assumes the extension is lower case. I believe the lower-case extension holds for all the other data files on Ken's website, but for whatever reason it has never been true for the momentum factor file. Here is the relevant code in get_data_famafrench(name):

with ZipFile(tmpf, 'r') as zf:
    data = zf.open(name + '.txt').readlines()

There is probably a better solution, but I changed the preceding code to the following and it seems to work:

with ZipFile(tmpf, 'r') as zf:
    data = zf.open(zf.namelist()[0]).readlines()

Example:

mom = web.DataReader("F-F_Momentum_Factor", "famafrench")[1]
print mom.head(10)

Output:

1 Mom
192701 0.49
192702 -0.69
192703 5.41
192704 3.83
192705 3.73
192706 -0.65
192707 5.03
192708 1.15
192709 1.55
192710 -0.07
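
Taking the first entry of zf.namelist() relies on the archive containing exactly one file. A slightly more defensive variant would be to match the expected member name while ignoring case, since ZipFile member lookups are case-sensitive. The following is only a minimal sketch of that idea: read_member_lines is a hypothetical helper (it is not part of pandas), and the in-memory archive just mimics the upper-case extension so the snippet runs without a download.

```python
import io
import zipfile


def read_member_lines(zf, name, ext=".txt"):
    """Return the lines of the member named name + ext, ignoring case.

    Hypothetical helper for illustration; not part of pandas.
    """
    wanted = (name + ext).lower()
    for member in zf.namelist():
        # Compare lower-cased names so '.TXT' and '.txt' both match.
        if member.lower() == wanted:
            with zf.open(member) as fh:
                return fh.readlines()
    raise KeyError("no member matching %r in archive" % (name + ext))


# Build a small in-memory archive whose member has an upper-case extension.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("F-F_Momentum_Factor.TXT", "192701 0.49\n192702 -0.69\n")

# Look the member up by its expected lower-case name.
with zipfile.ZipFile(buf, "r") as zf:
    lines = read_member_lines(zf, "F-F_Momentum_Factor")
```

Unlike zf.namelist()[0], this still raises a KeyError if the archive does not contain the expected file at all, so an unexpected archive layout would not be read silently.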

Also here is the output from pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.3-201.fc20.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.1
Cython: 0.20
numpy: 1.8.0
scipy: 0.12.1
statsmodels: 0.5.0
IPython: 0.13.2
sphinx: 1.1.3
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: 0.8.0
tables: 3.0.0
numexpr: 2.3
matplotlib: 1.3.1
openpyxl: 1.8.3
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.2
sqlalchemy: None
lxml: 3.2.4
bs4: None
html5lib: None
bq: None
apiclient: None
