Channel: Basics Category Page - PythonForBeginners.com

Pandas Map vs Apply Method in Python


Pandas provides various methods for data manipulation. Two of them are the map() method and the apply() method. This article compares pandas map vs apply and explains how the two methods differ.

The map() Method 

The pandas map method is used to execute a function on a pandas series or a column in a dataframe. When invoked on a series, the map() method takes a function, another series, or a Python dictionary as its input argument.

  • If we pass a function as input to the map() method, the function is executed on each element of the series, and a new series is created with the output. 
  • When we pass a dictionary to the map() method, the keys of the dictionary should be the current values of the series, and the values of the dictionary should be the desired new values. After execution of the map() method, each element of the series is replaced according to the dictionary, and a new series is created. Elements that have no matching key become NaN. 
  • If we pass another series to the map() method, the index of the input series should contain the current values of the series, and its elements should be the desired new values. After execution of the map() method, each element of the series is replaced according to the input series, and a new series is created. Elements that have no matching index label also become NaN. 

You can observe this in the following example.

import pandas as pd
import numpy as np
series=pd.Series([1,2,3,4,5,6,7])
print("The series is:")
print(series)
series=series.map(np.sqrt)
print("The modified series is:")
print(series)

Output:

The series is:
0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64
The modified series is:
0    1.000000
1    1.414214
2    1.732051
3    2.000000
4    2.236068
5    2.449490
6    2.645751
dtype: float64

In the above example, we first created a series using the pd.Series() constructor. Then, we passed the numpy.sqrt function to the map() method. You can observe that the function is applied to each element of the input series, and the results form the output series.
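The example above only passes a function. The dictionary and series inputs described in the list earlier can be sketched as follows. The animal names are made-up sample data, and "rabbit" is deliberately left out of the lookup to show that an unmatched value becomes NaN.

```python
import pandas as pd

# Hypothetical sample data for illustration
animals = pd.Series(["cat", "dog", "rabbit"])

# Mapping with a dictionary: keys are current values, values are replacements
by_dict = animals.map({"cat": "kitten", "dog": "puppy"})

# Mapping with a series: the index holds current values, the elements hold replacements
lookup = pd.Series(["kitten", "puppy"], index=["cat", "dog"])
by_series = animals.map(lookup)

# "rabbit" has no entry in either lookup, so it becomes NaN in both results
print(by_dict)
print(by_series)
```

Both calls produce the same series, so the choice between a dictionary and a series mostly depends on which form your lookup data is already in.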

Because the map() method calls the function on one element at a time, it is not suited to aggregate functions. If we pass an aggregate function such as the built-in sum() to the map() method, sum() receives a single integer instead of a collection, and the program runs into an error. You can observe this in the following example.

import pandas as pd
series=pd.Series([1,2,3,4,5,6,7])
print("The series is:")
print(series)
series=series.map(sum)
print("The modified series is:")
print(series)

Output:

The series is:
0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64
TypeError: 'int' object is not iterable

In this example, we passed the built-in sum() function to the map() method. The program raises a TypeError exception because each element of the series is a single integer, which is not iterable.
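Not every aggregate function raises an error, and one that doesn't makes the element-at-a-time behavior of map() even easier to see. numpy.sum() also accepts a single number, so passing it to map() simply returns each element unchanged. To aggregate a series as a whole, call the aggregation on the series itself, for example with its sum() method. A small sketch:

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5, 6, 7])

# np.sum is called on one element at a time, so each value is "summed" on its own
per_element = series.map(np.sum)
print(per_element.tolist())

# Aggregating the whole series uses the Series.sum() method instead
total = series.sum()
print(total)
```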

The apply() Method

We use the pandas apply method to apply functions on a series or a dataframe. The apply() method takes a function as its input argument. When invoked on a series, it calls the function on each element of the series. When invoked on a dataframe, it passes each column (or each row, if we set axis=1) to the function as a series.

  • If the input function works on a whole column at once, as the square root function does, it produces a transformed value for every element, so the output looks as if the function had been executed on each value of the dataframe. Here, the function must support broadcasting (vectorized operation on a series) for this to work.
  • If the function is an aggregate function such as the sum function, it reduces each column (or row) to a single value, and the output is a series of those values. 

You can observe the above behavior in the following code. 

import pandas as pd
import numpy as np
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The output dataframe is:")
df=df.apply(np.sqrt)
print(df)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The output dataframe is:
       Roll      Maths    Physics  Chemistry
0  1.000000  10.000000   8.944272   9.486833
1  1.414214   8.944272  10.000000   9.486833
2  1.732051   9.486833   8.944272   8.366600
3  2.000000  10.000000  10.000000   9.486833
4  2.236068   9.486833   9.486833   8.944272
5  2.449490   8.944272   8.366600   8.366600

In this example, we passed the numpy.sqrt() function to the apply() method. You can observe that the function is applied to all the elements of the dataframe to produce the output. This works because apply() passes each column to numpy.sqrt(), which supports broadcasting and returns the square root of every value in the column. A user-defined function that only handles a single value, and therefore doesn't support broadcasting, can run into an error when it receives an entire column.
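The failure case can be sketched with a hypothetical grade() helper that only handles a single number; the cut-off of 90 and the small dataframe are arbitrary example values. The sketch first confirms that DataFrame.apply() hands the function whole columns, then shows that grade() fails because an if statement cannot evaluate an entire column, and finally applies the helper one value at a time by calling map() on each column.

```python
import pandas as pd

df = pd.DataFrame({"Maths": [100, 80, 90], "Physics": [80, 100, 80]})

# apply() passes each column to the function as a Series
print(df.apply(lambda col: type(col).__name__))

# Hypothetical helper that only works on a single number
def grade(mark):
    return "A" if mark >= 90 else "B"

failed = False
try:
    df.apply(grade)  # `if` on a whole column raises ValueError
except ValueError as err:
    failed = True
    print("apply() failed:", err)

# Mapping the helper over each column calls it once per value
grades = df.apply(lambda col: col.map(grade))
print(grades)
```

Wrapping map() inside apply() keeps the example independent of the pandas version; newer releases also offer other ways to apply a function elementwise to a whole dataframe.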

When we pass an aggregate function to the apply() method, it is applied to each column of the dataframe by default, as shown below.

import pandas as pd
import numpy as np
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
print("The column totals are:")
totals=df.apply(sum)
print(totals)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The column totals are:
Roll          21
Maths        540
Physics      520
Chemistry    490
dtype: int64

In this example, we passed the sum() function to the apply() method. You can observe that the output is a series that contains the sum of the values in each column of the input dataframe.
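The default axis=0 passes columns to the function. Passing axis=1 passes each row instead, so the same sum() call produces one total per row. The sketch below reuses the marks from the example above; indexing by Roll and leaving that column out of the total are choices made here for illustration.

```python
import pandas as pd

myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df = pd.DataFrame(myDicts)

# Use Roll as the index so it labels the rows instead of being summed
subjects = df.set_index("Roll")[["Maths", "Physics", "Chemistry"]]

# axis=1 sends each row (as a Series) to sum(), giving one total per student
row_totals = subjects.apply(sum, axis=1)
print(row_totals)
```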

Pandas Map vs Apply in Python

Although you might think that the pandas map and apply methods work in the same way, they differ in several important ways. Following are some of the differences between the pandas map vs apply methods.

  • Supported objects: The map() method is defined for Series objects. (Since pandas 2.1, dataframes also have an elementwise map() method, which replaces applymap().) The apply() method is defined for Series as well as dataframes.
  • Input argument: The map() method works with a function, a series, or a dictionary as its input argument. The apply() method expects a function as its input argument.
  • How the function is applied: The map() method applies the function to one element at a time. On a dataframe, the apply() method passes each column (or row) to the function, so it operates elementwise only with functions that support broadcasting. On a series, it operates elementwise.
  • Aggregate functions: If you pass an aggregate function such as sum() as input, the map() method throws an error saying that the elements of the series are not iterable. When used with the apply() method on a dataframe, aggregate functions work on a column or row as a whole to produce the output. On a series, apply() calls the function on each element, so aggregate functions such as sum() don't work there either.
Pandas Map vs Apply Table
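The difference in input arguments can be illustrated with a small, hypothetical lookup of subject codes. map() accepts the dictionary directly, while apply() expects a function, so the sketch wraps the dictionary lookup in a lambda. The codes and names are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data for illustration
codes = pd.Series(["M", "P", "C"])
names = {"M": "Maths", "P": "Physics", "C": "Chemistry"}

# map() can take the dictionary itself
via_map = codes.map(names)

# apply() expects a function, so the lookup is wrapped in one
via_apply = codes.apply(lambda code: names[code])

print(via_map.equals(via_apply))
```

One practical difference: with the lambda, a code missing from the dictionary raises a KeyError, whereas map() quietly produces NaN for it.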

Conclusion

In this article, we discussed the differences between the pandas map() and apply() methods in Python. To learn more about Python programming, you can read this article on the tuple index out of range error in Python. You might also like this article on string manipulation in Python.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy learning!

The post Pandas Map vs Apply Method in Python appeared first on PythonForBeginners.com.

