Python | pandas.factorize()

Last Updated : 27 Sep, 2018
pandas.factorize() encodes an array as integers: each distinct value is assigned an integer code, and the distinct values are returned alongside the codes. The method is available both as pandas.factorize() and as Series.factorize().
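Parameters:
values : 1-D sequence to factorize.
sort : [bool, default False] If True, sort the uniques and remap the labels so that each label still points to its value.
na_sentinel : [int, default -1] Value used in the labels to mark missing values. This parameter was deprecated in pandas 1.5 and removed in pandas 2.0 in favor of use_na_sentinel.

Return: A tuple (labels, uniques). labels is an integer array that indexes into uniques; uniques holds the distinct values, in order of first appearance unless sort is True.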
Code: Explaining the working of factorize() method
# importing libraries
import pandas as pd

labels, uniques = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'])

print("Numeric Representation : \n", labels)
print("Unique Values : \n", uniques)
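The relationship between the two returned arrays can be checked directly. The following is a minimal sketch, assuming the same input list as above; the variable names values, codes, and rebuilt are illustrative and not part of the original example. Because each code is an index into uniques, indexing uniques with the codes should rebuild the input.

```python
import pandas as pd

values = ['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b']
codes, uniques = pd.factorize(values)

# Each code is a position in uniques, so indexing uniques with the
# codes reconstructs the original values (no missing values here).
rebuilt = uniques[codes]

print(codes.tolist())              # [0, 1, 1, 2, 3, 2, 3, 0]
print(uniques.tolist())            # ['b', 'd', 'c', 'a']
print(rebuilt.tolist() == values)  # True
```

Note that the uniques appear in order of first occurrence ('b' first, 'a' last), not in alphabetical order.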

# Sorting the uniques; the labels are remapped to match
label1, unique1 = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'],
                               sort=True)

print("\n\nNumeric Representation : \n", label1)
print("Unique Values : \n", unique1)
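To make the effect of sort concrete, here is a small sketch under the same assumptions (same input list; the names codes_sorted and uniques_sorted are illustrative). With sort=True the uniques come back in sorted order and the codes are renumbered accordingly, so the mapping from code to value is preserved.

```python
import pandas as pd

values = ['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b']
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)

print(uniques_sorted.tolist())  # ['a', 'b', 'c', 'd']
print(codes_sorted.tolist())    # [1, 3, 3, 2, 0, 2, 0, 1]

# The codes still point at the right values after sorting.
print(uniques_sorted[codes_sorted].tolist() == values)  # True
```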
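
# Missing values are marked in the labels with the na_sentinel value.
# Note: na_sentinel was deprecated in pandas 1.5 and removed in 2.0;
# on newer versions omit it and missing values are coded as -1.
label2, unique2 = pd.factorize(['b', None, 'd', 'c', None, 'a'],
                               na_sentinel=-101)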

print("\n\nNumeric Representation : \n", label2)
print("Unique Values : \n", unique2)
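A version-independent variant of the missing-value example is sketched below. It relies only on the default sentinel of -1 rather than a custom na_sentinel, so it should behave the same on older and newer pandas releases; the names values, codes, and rebuilt are illustrative. Missing values are excluded from uniques, and a sentinel code must not be used as an index (-1 would silently select the last unique), so the reconstruction handles it explicitly.

```python
import pandas as pd

values = ['b', None, 'd', 'c', None, 'a']
codes, uniques = pd.factorize(values)

print(codes.tolist())    # [0, -1, 1, 2, -1, 3]
print(uniques.tolist())  # ['b', 'd', 'c', 'a']

# Rebuild the input, mapping the sentinel back to None explicitly.
rebuilt = [uniques[c] if c != -1 else None for c in codes]
print(rebuilt == values)  # True
```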

# When factorizing a pandas object, uniques keeps that object's type
# (here a Categorical) instead of being a NumPy array
a = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])

label3, unique3 = pd.factorize(a)

print("\n\nNumeric Representation : \n", label3)
print("Unique Values : \n", unique3)
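The same type-preserving behavior applies to the Series.factorize() form mentioned at the start. The sketch below assumes a small illustrative Series s that is not part of the original examples: the codes are still a NumPy integer array, while the uniques come back as a pandas Index.

```python
import pandas as pd

s = pd.Series(['b', 'a', 'b', 'c'])
codes, uniques = s.factorize()

print(codes.tolist())              # [0, 1, 0, 2]
print(list(uniques))               # ['b', 'a', 'c']
print(isinstance(uniques, pd.Index))  # True
```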
