Closed
Description
collections.Counter and collections.defaultdict both have default values. However, pandas.Series.map does not respect these defaults and instead returns missing values.
The issue is illustrated below:
import pandas
from collections import Counter, defaultdict
input = pandas.Series(range(5))
counter = Counter()
counter[1] += 1
output = input.map(counter)
expected = input.map(lambda x: counter[x])
pandas.DataFrame({
    'input': input,
    'output': output,
    'expected': expected,
})
Here's the output:
   expected  input  output
0         0      0     NaN
1         1      1     1.0
2         0      2     NaN
3         0      3     NaN
4         0      4     NaN
The workaround is rather easy (lambda x: dictionary[x]), and respecting the defaults in Series.map shouldn't be too hard to implement. Are people on board with the change? Is there a performance concern with looking up each key independently?
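
On the performance question, one possible approach is to resolve the defaults once per distinct key instead of once per element. The sketch below is only an illustration of that idea, not how pandas implements map; the helper name resolve_defaults is hypothetical, and it assumes the keys are hashable.

```python
from collections import Counter, defaultdict


def resolve_defaults(mapping, keys):
    """Build a plain dict with one entry per distinct key.

    Hypothetical helper: it uses mapping[key] rather than mapping.get(key),
    so Counter and defaultdict defaults are applied. Each distinct key is
    looked up exactly once, however often it repeats in `keys`.
    """
    return {key: mapping[key] for key in set(keys)}


counter = Counter()
counter[1] += 1
resolved = resolve_defaults(counter, range(5))

# Note: indexing a defaultdict with a missing key also inserts that key
# into the defaultdict itself; Counter does not do this.
names = defaultdict(lambda: "unknown", {1: "one"})
resolved_names = resolve_defaults(names, [0, 1, 1, 2])
```

The resulting plain dict contains every key that occurs, so something like input.map(resolve_defaults(counter, input)) should no longer depend on how map treats defaults. Whether this is faster than the lambda workaround depends on how many values repeat, and it has not been benchmarked here.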