Python for Beginners: Pandas Where Method With Series and DataFrame

While working with pandas dataframe, we often filter data using different conditions. In this article, we will discuss how we can use the pandas where method to filter and replace data from a series or dataframe.

The Pandas where() Method

We use the pandas where() method to replace a value based on a condition. The where() method has the following syntax.

DataFrame.where(cond, other=_NoDefault.no_default, *, inplace=False, axis=None, level=None)

Here, 

  • The cond parameter takes a condition or multiple conditional statements as input arguments. The conditional statements must evaluate to a series of True and False values. If the cond parameter is True for a row, the data is preserved in that row. All the values are set to None for the rows where the cond parameter evaluates to False. 
  • The other parameter takes a function, series, dataframe, or scaler value as its input argument. All the entries where the cond parameter evaluates to False are replaced with the corresponding value from the other parameter. If we pass a function to the other parameter, it is computed on the DataFrame and should return scalar or Series/DataFrame. The function must not change the input DataFrame. If we don’t specify the other parameter, all the values are set to None for the rows where the cond parameter evaluates to False. 
  • By default, the where() method returns a new dataframe after execution. If you want to modify the existing dataframe using the where() method, you can set the inplace parameter to True. After this, the original dataframe will be modified to store the output.
  • The axis parameter is used to set the alignment axis if needed. For Series, the axis parameter is unused. For dataframes, it has a default value of 0.
  • The level parameter is used to set the alignment level if required.

Now, let us discuss how we can use the where() method with a series or a dataframe.

Pandas Where() Method With Series in Python

When we invoke the where() method on a pandas series, it takes a condition as its input argument. After execution, it returns a new series. In the output series, the values that fulfill the condition in the input argument and unchanged while the rest of the values are set to None. You can observe this in the following example.

import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50)
print("The output series is:")
print(output)

Output:

The input series is:
0      1
1     23
2     12
3    423
4      4
5     53
6    231
7    234
8      1
dtype: int64
The output series is:
0      NaN
1      NaN
2      NaN
3    423.0
4      NaN
5     53.0
6    231.0
7    234.0
8      NaN
dtype: float64

In the above example, we passed the condition series>50 to the where() method. In the output series, you can observe that the where() method preserves the numbers greater than 50. On the other hand, values less than 50 are set to None.

Replace a Value Based on a Condition Using The where() Method

Instead of None, we can also set a replacement value for the values in the series that don’t fulfill the condition given in the input to the where() method. For this, we will pass the replacement value as the second input argument to the where() method. After execution, it returns a series in which the values that fulfill the condition remain unchanged while the other values are replaced using the replacement value. You can observe this in the following example.

import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50,-1)
print("The output series is:")
print(output)

Output:

The input series is:
0      1
1     23
2     12
3    423
4      4
5     53
6    231
7    234
8      1
dtype: int64
The output series is:
0     -1
1     -1
2     -1
3    423
4     -1
5     53
6    231
7    234
8     -1
dtype: int64

In the above example, we have set the other parameter to -1. Hence, the numbers less than 50 are set to -1 in the output dataframe.

Replace a Value Using a Function Based on a Condition Using The where() Method

Instead of a value, we can also pass a function for replacing the values in the series using the where() method. For instance, consider the following example.

def myFun(x):
    return x**2
import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50,other=myFun)
print("The output series is:")
print(output)

Output:

The input series is:
0      1
1     23
2     12
3    423
4      4
5     53
6    231
7    234
8      1
dtype: int64
The output series is:
0      1
1    529
2    144
3    423
4     16
5     53
6    231
7    234
8      1
dtype: int64

In the above code, we have defined a function myFun() that takes a number and returns its square. Then, we passed the function to the other parameter in the where() method. After this, the values less than 50 are first passed to the function myFun(). The where() method then gives the output of myFun() function in the output series in all the positions where the cond parameter is False.

Pandas Where Method With DataFrame

Instead of a series, we can also use the where() method on a dataframe. When we invoke the where() method on a dataframe, it takes a condition as its input argument. After execution, it returns a dataframe created from the input dataframe.

Here, the rows that fulfill the condition given as input to the where() method remain unchanged. All the other rows are filled with a None value. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where(df["Maths"]>80)
print("The output dataframe is:")
print(df1)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The output dataframe is:
   Roll  Maths  Physics  Chemistry
0   1.0  100.0     80.0       90.0
1   NaN    NaN      NaN        NaN
2   3.0   90.0     80.0       70.0
3   4.0  100.0    100.0       90.0
4   5.0   90.0     90.0       80.0
5   NaN    NaN      NaN        NaN

Instead of the None value, we can also give a replacement value to the where() method as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where(df["Maths"]>80,"LOW")
print("The output dataframe is:")
print(df1)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The output dataframe is:
  Roll Maths Physics Chemistry
0    1   100      80        90
1  LOW   LOW     LOW       LOW
2    3    90      80        70
3    4   100     100        90
4    5    90      90        80
5  LOW   LOW     LOW       LOW

In the above examples, you can observe that the where() method works in a similar manner it works with a series. The only difference is that the results are applied to the entire row instead of a single value.

Pandas where() Method With Multiple Conditions

We can also use multiple conditions in a single where method. For this, we will operate all the conditions with AND/OR logical operator. After the execution of each condition, the logical operations are performed and we get a mask containing True and False values. The mask is then used to create the output dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where((df["Maths"]>80) & (df["Chemistry"]>80))
print("The output dataframe is:")
print(df1)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The output dataframe is:
   Roll  Maths  Physics  Chemistry
0   1.0  100.0     80.0       90.0
1   NaN    NaN      NaN        NaN
2   NaN    NaN      NaN        NaN
3   4.0  100.0    100.0       90.0
4   NaN    NaN      NaN        NaN
5   NaN    NaN      NaN        NaN

Conclusion

In this article, we discussed different ways to use the pandas where method with a series or dataframe in Python. To learn more about Python programming, you can read this article on how to read excel into pandas dataframe. You might also like this article on how to map functions to a pandas series in Python.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Pandas Where Method With Series and DataFrame appeared first on PythonForBeginners.com.

Planet Python