Channel: Basics Category Page - PythonForBeginners.com

Pandas Apply Function to Dataframe or Series


The pandas apply() and applymap() methods are used to apply a function to the values in a dataframe or a series. In this article, we will discuss the syntax and use of the pandas apply() method in Python.

The apply() Method

The apply() method has the following syntax.

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
  • The func parameter takes the function that is executed on the series or dataframe. When the apply() method is invoked on a series, a function that takes a single value as input and provides a single value as output, such as the square root function, is executed on each value in the series. When the apply() method is invoked on a dataframe, the function receives each column or row as a series, so it should take a series as its input.
    • If the function is an aggregate function such as the sum function, the function is executed with the entire row or column as the input and returns a single value for it.
  • The axis parameter is used to specify whether rows or columns are passed to the input function. By default, it has the value 0 or ‘index’, which means that the input function is applied to each column. To apply the function on each row, you can set the axis parameter to 1 or ‘columns’.
  • The raw parameter is used to determine if a row or column is passed as a Series or ndarray object to the input function. By default, it is set to False, which means that the apply() method passes each row or column as a Series to the input function. If the input function works directly on NumPy arrays, you can set the raw parameter to True to improve the performance of the code. The input function will then receive ndarray objects as its input.
  • The result_type parameter is used only when the axis parameter is set to 1. It can take one of four values.
    • When the result_type parameter is set to ‘expand’, list-like results will be turned into columns.
    • When the result_type parameter is set to ‘reduce’, the apply() method returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
    • When the result_type parameter is set to ‘broadcast’, results will be broadcast to the original shape of the DataFrame, and the original index and columns will be retained.
    • If the result_type parameter is set to None, which is its default value, the return value of the apply() method depends on the return value of the input function. If the input function returns list-like results, the apply() method returns a Series of those lists. However, if the input function returns a Series, the results are expanded to columns.
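The parameters described above can be tried out on a small dataframe. The following is only a minimal sketch: the marks dataframe and the low_high() function are made up for illustration and are not part of the examples in this article.

```python
import pandas as pd

# Hypothetical data used only for this illustration.
marks = pd.DataFrame({"Maths": [100, 80, 90],
                      "Physics": [80, 100, 80],
                      "Chemistry": [90, 90, 70]})

# axis=0 (default): the function receives each column as a Series.
column_totals = marks.apply(lambda column: column.sum())

# axis=1: the function receives each row as a Series.
row_totals = marks.apply(lambda row: row.sum(), axis=1)

# raw=True: the function receives plain ndarray objects instead of Series.
row_max = marks.apply(lambda values: values.max(), axis=1, raw=True)

# A function that returns a list-like result for every row.
def low_high(row):
    return [row.min(), row.max()]

# result_type=None (default): one list per row, collected in a Series.
as_lists = marks.apply(low_high, axis=1)

# result_type="expand": the list items are turned into columns.
as_columns = marks.apply(low_high, axis=1, result_type="expand")

print(column_totals)
print(row_totals)
print(row_max)
print(as_lists)
print(as_columns)
```

Comparing as_lists with as_columns shows the effect of result_type: the first is a series whose values are lists, while the second is a dataframe with one column per list item.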

After execution, the apply() method returns a new series or dataframe containing the results. The original series or dataframe is not modified.
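A quick way to check what apply() gives back is to compare the result with the object it was called on. This is a small sketch with made-up values; it suggests that the result is a separate object and that the input keeps its values.

```python
import numpy as np
import pandas as pd

series = pd.Series([100, 90, 80])
roots = series.apply(np.sqrt)

# The result is a separate Series object.
print(roots is series)

# The input series still holds its original values.
print(series.tolist())
print(roots.tolist())
```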

Pandas Apply a Function to a Series

To apply a function to a pandas series, you can simply pass the function as an input argument to the apply() method as shown below.

import pandas as pd
import numpy as np
numbers=[100,90,80,90,70,100,60]
series=pd.Series(numbers)
print("The series is:")
print(series)
newSeries=series.apply(np.sqrt)
print("The updated series is:")
print(newSeries)

Output:

The series is:
0    100
1     90
2     80
3     90
4     70
5    100
6     60
dtype: int64
The updated series is:
0    10.000000
1     9.486833
2     8.944272
3     9.486833
4     8.366600
5    10.000000
6     7.745967
dtype: float64

In the above example, the apply() method, when invoked on the series, takes the numpy.sqrt function as its input argument. The function is executed on every element of the series and we get the output series.

Here, we have passed the inbuilt numpy.sqrt function to the apply() method. You can also pass a custom function to the apply() method as shown below.

import pandas as pd
import numpy as np
def fun1(x):
    nameDict={100:"Hundred", 90:"Ninety", 80:"Eighty", 70:"Seventy", 60:"Sixty"}
    if x in nameDict:
        return nameDict[x]
    else:
        return x

numbers=[100,90,80,90,70,100,60]
series=pd.Series(numbers)
print("The series is:")
print(series)
newSeries=series.apply(fun1)
print("The updated series is:")
print(newSeries)

Output:

The series is:
0    100
1     90
2     80
3     90
4     70
5    100
6     60
dtype: int64
The updated series is:
0    Hundred
1     Ninety
2     Eighty
3     Ninety
4    Seventy
5    Hundred
6      Sixty
dtype: object

In this example, we have created a function fun1() that takes a number as input and returns its name in words. Numbers that are not present in the dictionary are returned unchanged. When we pass fun1() to the apply() method, you can observe that the function is executed on all the elements of the series.
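The args and **kwargs entries in the syntax are forwarded to the input function after the value itself, which helps when a custom function needs more than one input. The following is a minimal sketch; the add_bonus() function and its bonus and cap parameters are hypothetical names chosen for this illustration.

```python
import pandas as pd

# Hypothetical function: add a bonus to a mark without exceeding a cap.
def add_bonus(mark, bonus, cap=100):
    return min(mark + bonus, cap)

series = pd.Series([100, 90, 80, 70])

# args supplies extra positional arguments; cap is passed as a keyword argument.
boosted = series.apply(add_bonus, args=(5,), cap=100)
print(boosted)
```

Each element is passed as the first argument (mark), the tuple given to args fills bonus, and the keyword argument fills cap.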

Pandas Apply Function to a Dataframe

You can also apply a function to a pandas dataframe instead of a series. Depending on the use case, we use either the apply() method or the applymap() method.

If you want to apply a vectorized function such as numpy.sqrt, which can process an entire series at once, you can invoke the apply() method on the dataframe and pass the function as an input argument. After execution, the apply() method returns a new dataframe as shown below.

import pandas as pd
import numpy as np
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
newDf=df.apply(np.sqrt)
print("The updated dataframe is:")
print(newDf)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The updated dataframe is:
       Roll      Maths    Physics  Chemistry
0  1.000000  10.000000   8.944272   9.486833
1  1.414214   8.944272  10.000000   9.486833
2  1.732051   9.486833   8.944272   8.366600
3  2.000000  10.000000  10.000000   9.486833
4  2.236068   9.486833   9.486833   8.944272
5  2.449490   8.944272   8.366600   8.366600

If you have defined a custom function that works on individual values, the apply() method doesn’t work with it on a dataframe, because the method passes an entire column (a series) to the function instead of a single value. When we pass such a function to the apply() method as input, the program runs into a Python TypeError exception as shown below.

import pandas as pd
import numpy as np
def fun1(x):
    nameDict={100:"Hundred", 90:"Ninety", 80:"Eighty", 70:"Seventy", 60:"Sixty"}
    if x in nameDict:
        return nameDict[x]
    else:
        return x
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
newDf=df.apply(fun1)
print("The updated dataframe is:")
print(newDf)

Output:

TypeError: unhashable type: 'Series'

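To avoid the above error saying TypeError: unhashable type: 'Series', you can use the applymap() method instead of the apply() method to use a custom function on the dataframe values. In pandas 2.1 and later versions, the applymap() method is deprecated in favor of the DataFrame.map() method, which is used in the same way.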
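The reason for the error can be made visible by recording what the input function actually receives. The sketch below uses a hypothetical record_type() function and a small made-up dataframe. It suggests that apply() hands over whole Series objects, and that checking a Series for membership in a dictionary is what raises the TypeError.

```python
import pandas as pd

df = pd.DataFrame({"Maths": [100, 80], "Physics": [80, 100]})

received = []

# Hypothetical function that only records the type of its input.
def record_type(x):
    received.append(type(x).__name__)
    return x

df.apply(record_type)
print(set(received))

# A Series cannot be hashed, so it cannot be looked up in a dictionary.
names = {100: "Hundred", 80: "Eighty"}
error_raised = False
try:
    df["Maths"] in names
except TypeError as err:
    error_raised = True
    print("TypeError:", err)
```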

Apply Custom Function to Pandas Dataframe Values

You can pass a user-defined function to the applymap() method to apply a custom function on the pandas dataframe as shown below.

import pandas as pd
import numpy as np
def fun1(x):
    nameDict={100:"Hundred", 90:"Ninety", 80:"Eighty", 70:"Seventy", 60:"Sixty"}
    if x in nameDict:
        return nameDict[x]
    else:
        return x
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
newDf=df.applymap(fun1)
print("The updated dataframe is:")
print(newDf)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The updated dataframe is:
   Roll    Maths  Physics Chemistry
0     1  Hundred   Eighty    Ninety
1     2   Eighty  Hundred    Ninety
2     3   Ninety   Eighty   Seventy
3     4  Hundred  Hundred    Ninety
4     5   Ninety   Ninety    Eighty
5     6   Eighty  Seventy   Seventy

In this example, we have used the applymap() method instead of the apply() method to apply a custom function to a dataframe. Hence, the program doesn’t run into any errors.
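There are other ways to run a function on every value of a dataframe. The following is a sketch under stated assumptions, not a replacement for the example above: elementwise() is a hypothetical helper that uses DataFrame.map() where it exists (pandas 2.1 and later) and falls back to applymap() otherwise. The second approach uses apply() with Series.map() on each column. The small dataframe is made up for illustration.

```python
import pandas as pd

def fun1(x):
    name_dict = {100: "Hundred", 90: "Ninety", 80: "Eighty", 70: "Seventy", 60: "Sixty"}
    # Return the name if the value is known, otherwise the value itself.
    return name_dict.get(x, x)

# Hypothetical helper: apply func to every value of a dataframe.
def elementwise(frame, func):
    if hasattr(frame, "map"):
        # DataFrame.map() is available in pandas 2.1 and later.
        return frame.map(func)
    return frame.applymap(func)

df = pd.DataFrame({"Maths": [100, 80, 90], "Physics": [80, 100, 80]})

via_helper = elementwise(df, fun1)

# apply() passes each column as a Series; Series.map() then works per value.
via_columns = df.apply(lambda column: column.map(fun1))

print(via_helper)
print(via_columns)
```

Both results contain the same values, so the choice mostly depends on the pandas version in use.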

Pandas Apply a Function to a Column in a Dataframe

Instead of the entire pandas dataframe, you can also apply any function on a column in the dataframe. For this, you just need to invoke the apply() method on the given column and pass the input function to the apply() method as shown below.

import pandas as pd
import numpy as np
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Maths"]=df["Maths"].apply(np.sqrt)
print("The updated dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The updated dataframe is:
   Roll      Maths  Physics  Chemistry
0     1  10.000000       80         90
1     2   8.944272      100         90
2     3   9.486833       80         70
3     4  10.000000      100         90
4     5   9.486833       90         80
5     6   8.944272       70         70

A column in a pandas dataframe is essentially a series object. Hence, the apply() method works on a column of pandas dataframe in the same manner it works on a series.
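This can be checked with isinstance(). In the short sketch below, which uses a made-up dataframe, selecting one column by its name gives a Series, whereas selecting with a list of names gives a DataFrame even if the list has only one name.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2], "Maths": [100, 80]})

single_column = df["Maths"]      # selected with a column name
column_subset = df[["Maths"]]    # selected with a list of column names

print(isinstance(single_column, pd.Series))
print(isinstance(column_subset, pd.DataFrame))
```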

Apply Custom Function to One Column in a Dataframe

You can also apply a user-defined function to a column in a dataframe using the apply() method as shown in the following example.

import pandas as pd
import numpy as np
def fun1(x):
    nameDict={100:"Hundred", 90:"Ninety", 80:"Eighty", 70:"Seventy", 60:"Sixty"}
    if x in nameDict:
        return nameDict[x]
    else:
        return x
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df["Maths"]=df["Maths"].apply(fun1)
print("The updated dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The updated dataframe is:
   Roll    Maths  Physics  Chemistry
0     1  Hundred       80         90
1     2   Eighty      100         90
2     3   Ninety       80         70
3     4  Hundred      100         90
4     5   Ninety       90         80
5     6   Eighty       70         70

Pandas Apply Function to Multiple Columns in a Dataframe

Instead of a single column, you can also apply a function to multiple columns in a dataframe. For this, you need to select the required columns of the dataframe and then apply the function on them using the apply() method as shown in the following example.

import pandas as pd
import numpy as np
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df[["Maths","Physics", "Chemistry"]]=df[["Maths","Physics", "Chemistry"]].apply(np.sqrt)
print("The updated dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The updated dataframe is:
   Roll      Maths    Physics  Chemistry
0     1  10.000000   8.944272   9.486833
1     2   8.944272  10.000000   9.486833
2     3   9.486833   8.944272   8.366600
3     4  10.000000  10.000000   9.486833
4     5   9.486833   9.486833   8.944272
5     6   8.944272   8.366600   8.366600

The above approach works because numpy.sqrt is a vectorized function that can process an entire column at once. A custom function written for single values doesn’t work this way.
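Because the apply() method passes a whole row or column to the input function, the selected columns can also be combined row by row by setting the axis parameter to 1. The following is a minimal sketch; the Total column is a hypothetical addition that is not part of the example above.

```python
import pandas as pd

myDicts = [{"Roll": 1, "Maths": 100, "Physics": 80, "Chemistry": 90},
           {"Roll": 2, "Maths": 80, "Physics": 100, "Chemistry": 90},
           {"Roll": 3, "Maths": 90, "Physics": 80, "Chemistry": 70}]
df = pd.DataFrame(myDicts)

subjects = ["Maths", "Physics", "Chemistry"]

# axis=1: each row of the selected columns is passed to the function as a Series.
df["Total"] = df[subjects].apply(lambda row: row.sum(), axis=1)
print(df)
```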

Apply Custom Function to Multiple Columns in a Dataframe

If you want to apply a custom function on pandas dataframe values, you can use the applymap() method instead of the apply() method as shown below.

import pandas as pd
import numpy as np
def fun1(x):
    nameDict={100:"Hundred", 90:"Ninety", 80:"Eighty", 70:"Seventy", 60:"Sixty"}
    if x in nameDict:
        return nameDict[x]
    else:
        return x
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df[["Maths","Physics", "Chemistry"]]=df[["Maths","Physics", "Chemistry"]].applymap(fun1)
print("The updated dataframe is:")
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The updated dataframe is:
   Roll    Maths  Physics Chemistry
0     1  Hundred   Eighty    Ninety
1     2   Eighty  Hundred    Ninety
2     3   Ninety   Eighty   Seventy
3     4  Hundred  Hundred    Ninety
4     5   Ninety   Ninety    Eighty
5     6   Eighty  Seventy   Seventy

Conclusion

In this article, we have discussed different ways to apply a function to a dataframe using the apply() method. We also discussed how to apply a custom function to a pandas dataframe using the applymap() method.

To learn more about Python programming, you can read this article on how to sort a pandas dataframe. You might also like this article on how to drop columns from a pandas dataframe.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Pandas Apply Function to Dataframe or Series appeared first on PythonForBeginners.com.

