The Pandas DataFrame has several methods concerning Computations and Descriptive Stats. When applied to a DataFrame, these methods evaluate the elements and return the results.
- Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
- Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), and cumsum().
- Part 3 focuses on the DataFrame methods describe(), diff(), eval(), and kurtosis().
- Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
- Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
Getting Started
Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
Required Starter Code
import pandas as pd
import numpy as np
Before any data manipulation can occur, two new libraries will require installation.
- The pandas library enables access to/from a DataFrame.
- The numpy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install numpy
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
DataFrame pct_change()
The pct_change() method calculates and returns the percentage change between the current and prior element(s) in a DataFrame. The return value has the same type as the caller.
The syntax for this method is as follows:
DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
Parameter | Description |
---|---|
periods | The number of rows to shift by when calculating the percentage change. The default is 1 (the previous row). |
fill_method | This determines how NaN values are filled before the percentage change is calculated. |
limit | This sets how many consecutive NaN values to fill in the DataFrame before stopping. |
freq | Used for a time series: the increment (offset) to shift by. |
**kwargs | Additional keywords passed into DataFrame.shift() or Series.shift(). |
This example calculates and returns the percentage change of three (3) fictitious stocks over three (3) months.
df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
                   'DBL': [39.91, 41.46, 40.99],
                   'UXL': [44.01, 43.67, 41.98]},
                  index=['2021-10-01', '2021-11-01', '2021-12-01'])
result = df.pct_change(axis='rows', periods=1)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] uses the pct_change() method with a selected axis and period to calculate the change. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output:
 | ASL | DBL | UXL |
2021-10-01 | NaN | NaN | NaN |
2021-11-01 | -0.100370 | 0.038837 | -0.007726 |
2021-12-01 | -0.126835 | -0.011336 | -0.038699 |
Note: The first row contains NaN values as there is no previous row.
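The calculation behind pct_change() can be reproduced by hand, which may make the periods parameter easier to follow. The sketch below assumes the same fictitious stock data; the names manual, builtin, and two_back are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
                   'DBL': [39.91, 41.46, 40.99],
                   'UXL': [44.01, 43.67, 41.98]},
                  index=['2021-10-01', '2021-11-01', '2021-12-01'])

# Percentage change by hand: divide each row by the previous row, then subtract 1.
manual = df / df.shift(1) - 1

# The built-in method should agree with the manual formula.
builtin = df.pct_change(periods=1)
print(manual.round(6).equals(builtin.round(6)))

# periods=2 compares each row with the row two steps back,
# so the first two rows have nothing to compare against and are NaN.
two_back = df.pct_change(periods=2)
print(two_back)
```

With periods=2, only the last row holds values: the change from the first month to the third.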
DataFrame quantile()
The quantile() method returns the values from a DataFrame/Series at the specified quantile and axis.
The syntax for this method is as follows:
DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
Parameter | Description |
---|---|
q | The quantile(s) to calculate: a single value or a list of values, each satisfying 0 <= q <= 1. The default is 0.5. |
axis | If zero (0) or index, apply the function to each column. Default is 0. If one (1) or columns, apply the function to each row. |
numeric_only | Only include columns that contain integers, floats, or boolean values. |
interpolation | Determines which value to return when the requested quantile falls between two data points: linear, lower, higher, midpoint, or nearest. |
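To see what the interpolation parameter does, it can help to pick a quantile that falls between two data points and compare the options side by side. This is a minimal sketch with made-up values; the column name x and the results dictionary are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]})

# q=0.4 lands between the second and third values (20 and 30),
# so each interpolation option resolves it differently.
results = {}
for method in ['linear', 'lower', 'higher', 'nearest', 'midpoint']:
    results[method] = df.quantile(0.4, interpolation=method)['x']

for method, value in results.items():
    print(method, value)
```

Here linear gives roughly 22, lower gives 20, higher gives 30, nearest gives 20, and midpoint gives 25.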
This example uses the same stock DataFrame as noted above to determine the quantile(s).
df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
                   'DBL': [39.91, 41.46, 40.99],
                   'UXL': [44.01, 43.67, 41.98]})
result = df.quantile(0.15)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] uses the quantile() method with the q (quantile) parameter set to 0.15. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output:
ASL | 15.518 |
DBL | 40.234 |
UXL | 42.487 |
Name: 0.15, dtype: float64 |
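The q parameter also accepts a list, in which case the result is a DataFrame with one row per quantile. A short sketch using the same stock data (the name quartiles is illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
                   'DBL': [39.91, 41.46, 40.99],
                   'UXL': [44.01, 43.67, 41.98]})

# Passing a list of quantiles returns a DataFrame indexed by the quantile values.
quartiles = df.quantile([0.25, 0.5, 0.75])
print(quartiles)

# The 0.5 row is the median of each column.
print(df.median())
```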
DataFrame rank()
The rank() method returns a DataFrame/Series with the values ranked in order. The return value has the same type as the caller.
The syntax for this method is as follows:
DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
Parameter | Description |
---|---|
axis | If zero (0) or index, apply the function to each column. Default is 0. If one (1) or columns, apply the function to each row. |
method | Determines how to rank identical values, such as: – average: The average rank of the group. – min: The lowest rank of the group. – max: The highest rank of the group. – first: Ranks are assigned in the order the values appear in the array. – dense: Like min, but the rank increases by one (1) between groups. |
numeric_only | Only include columns that contain integers, floats, or boolean values. |
na_option | Determines how NaN values rank, such as: – keep: Assigns NaN as the rank of NaN values. – top: Assigns the lowest rank to any NaN values found. – bottom: Assigns the highest rank to any NaN values found. |
ascending | Determines if the elements/values rank in ascending or descending order. |
pct | If set to True, the results will return in percentile form. By default, this value is False. |
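The method and na_option parameters only matter when the data contains ties or NaN values, so a small example with both may help. This sketch uses made-up scores; the column name score and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [50, 70, 70, 90, np.nan]})

# Rank the same column with each tie-breaking method for comparison.
methods = ['average', 'min', 'max', 'first', 'dense']
ranks = pd.DataFrame({m: df['score'].rank(method=m) for m in methods})
print(ranks)

# na_option='top' gives NaN the lowest rank instead of leaving it as NaN.
top = df['score'].rank(na_option='top')
print(top)

# pct=True expresses each rank as a fraction of the number of ranked values.
pct = df['score'].rank(pct=True)
print(pct)
```

The two tied values of 70 receive 2.5 with average, 2 with min, 3 with max, 2 and 3 with first, and 2 with dense.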
For this example, a CSV file is read in, ranked on Population, and sorted. The countries.csv file must be in the current working directory.
df = pd.read_csv("countries.csv")
df["Rank"] = df["Population"].rank()
df.sort_values("Population", inplace=True)
print(df)
- Line [1] reads in the countries.csv file and saves it to df.
- Line [2] ranks the Population column and appends the result as a new Rank column at the end of the DataFrame (df).
- Line [3] sorts the DataFrame by Population in ascending order.
- Line [4] outputs the result to the terminal.
Output:
 | Country | Capital | Population | Area | Rank |
4 | Poland | Warsaw | 38383000 | 312685 | 1.0 |
2 | Spain | Madrid | 47431256 | 498511 | 2.0 |
3 | Italy | Rome | 60317116 | 301338 | 3.0 |
1 | France | Paris | 67081000 | 551695 | 4.0 |
0 | Germany | Berlin | 83783942 | 357021 | 5.0 |
5 | Russia | Moscow | 146748590 | 17098246 | 6.0 |
6 | USA | Washington | 328239523 | 9833520 | 7.0 |
8 | India | Dheli | 1352642280 | 3287263 | 8.0 |
7 | China | Beijing | 1400050000 | 9596961 | 9.0 |
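If the CSV file is not at hand, the same idea can be tried on an in-memory DataFrame. The sketch below is an assumption-laden stand-in: it copies three rows from the output above rather than reading countries.csv, and it ranks in descending order so that the largest population gets rank 1.

```python
import pandas as pd

# Hypothetical in-memory stand-in for countries.csv (three rows from the output above).
df = pd.DataFrame({'Country': ['Poland', 'Spain', 'Italy'],
                   'Population': [38383000, 47431256, 60317116]})

# ascending=False gives the largest population rank 1.
# There are no ties or NaN values here, so the ranks convert cleanly to integers.
df['Rank'] = df['Population'].rank(ascending=False).astype(int)

# sort_values() returns a new DataFrame, so inplace=True is not required.
ranked = df.sort_values('Rank')
print(ranked)
```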
DataFrame round()
The round() method rounds the DataFrame output to a specified number of decimal places.
The syntax for this method is as follows:
DataFrame.round(decimals=0, *args, **kwargs)
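Parameter | Description |
---|---|
decimals | The number of decimal places to round the value(s) to. This can be an integer, or a dictionary or Series that sets the number of places per column. |
*args | Additional positional arguments. These have no effect but are accepted for compatibility with NumPy. |
**kwargs | Additional keyword arguments. These have no effect but are accepted for compatibility with NumPy. |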
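The decimals parameter can also be a dictionary, which rounds each listed column to its own number of places and leaves unlisted columns untouched. A minimal sketch with made-up column names and values:

```python
import pandas as pd

df = pd.DataFrame({'rate': [2.3457, 1.7487],
                   'balance': [1234.5678, 987.6543],
                   'fee': [0.129, 0.251]})

# Round 'rate' to three places and 'balance' to whole numbers.
# 'fee' is not listed in the dictionary, so it is left unchanged.
result = df.round({'rate': 3, 'balance': 0})
print(result)
```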
For this example, the Bank of Canada’s mortgage rates over three (3) months are displayed and rounded to three (3) decimal places.
Code Example 1:
df = pd.DataFrame([(2.3455, 1.7487, 2.198)],
                  columns=['Month 1', 'Month 2', 'Month 3'])
result = df.round(3)
print(result)
- Line [1] creates a DataFrame complete with column names and saves it to df.
- Line [2] rounds the mortgage rates to three (3) decimal places. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output:
 | Month 1 | Month 2 | Month 3 |
0 | 2.346 | 1.749 | 2.198 |
Another way to perform the same task is with a Lambda!
Code Example 2:
df = pd.DataFrame([(2.3455, 1.7487, 2.198)],
                  columns=['Month 1', 'Month 2', 'Month 3'])
result = df.apply(lambda x: round(x, 3))
print(result)
- Line [1] creates a DataFrame complete with column names and saves it to df.
- Line [2] rounds the mortgage rates to three (3) decimal places using a Lambda. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Note: The output is identical to that of the above.
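One behavior worth knowing: pandas relies on NumPy for rounding, and NumPy rounds exact halves to the nearest even digit rather than always rounding up. The sketch below uses made-up values that sit exactly on a half to show the effect.

```python
import pandas as pd

df = pd.DataFrame({'x': [0.5, 1.5, 2.5, 3.5]})

# Exact halves round to the nearest even number: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, 3.5 -> 4.
result = df.round(0)
print(result)
```

This only affects values that are exactly halfway between two candidates. Most decimal fractions are not stored exactly as floats, so they round toward whichever side the stored value is closer to.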
DataFrame prod() and product()
The prod() and product() methods are identical. Both return the product of the values of a requested axis.
The syntax for these methods is as follows:
DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
Parameter | Description |
---|---|
axis | If zero (0) or index, apply the function to each column. Default is None. If one (1) or columns, apply the function to each row. |
skipna | If set to True, this parameter excludes NaN/NULL values when calculating the result. |
level | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value, then None is assumed. |
numeric_only | Only include columns that contain integers, floats, or boolean values. |
min_count | The required number of valid (non-NaN) values on which to perform the calculation. If fewer are present, the result is NaN. |
**kwargs | Additional keywords passed into a DataFrame/Series. |
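The skipna and min_count parameters only show their effect when NaN values are present. The following sketch uses a made-up DataFrame with one partially missing column and one entirely missing column; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.0, np.nan, 4.0],
                   'B': [np.nan, np.nan, np.nan]})

# Default: NaN values are skipped. An all-NaN column yields 1.0 (the empty product).
default = df.prod()
print(default)

# skipna=False: any NaN in a column makes that column's product NaN.
strict = df.prod(skipna=False)
print(strict)

# min_count=1: at least one valid value is required, so column B becomes NaN.
at_least_one = df.prod(min_count=1)
print(at_least_one)
```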
For this example, a DataFrame of small integers is created and the product along the selected axis is returned.
Code:
df = pd.DataFrame({'A': [2, 4, 6],
                   'B': [7, 3, 5],
                   'C': [6, 3, 1]})
index_ = ['A', 'B', 'C']
df.index = index_
result = df.prod(axis=0)
print(result)
Line [1] creates a DataFrame of integers and saves it to df.
Lines [2-3] create and set the DataFrame index.
Line [4] calculates the product along axis 0. This output saves to the result variable.
Line [5] outputs the result to the terminal.
Output:
Formula Example: 2*4*6=48
A | 48 |
B | 105 |
C | 18 |
dtype: int64 |
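For comparison, the same DataFrame can be multiplied across each row with axis=1, and product() can be checked against prod(). This is a short sketch reusing the data above; the names by_row and same are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 6],
                   'B': [7, 3, 5],
                   'C': [6, 3, 1]},
                  index=['A', 'B', 'C'])

# axis=1 multiplies across each row instead of down each column.
# Row A: 2*7*6=84, row B: 4*3*3=36, row C: 6*5*1=30.
by_row = df.prod(axis=1)
print(by_row)

# product() is an alias of prod(), so both return the same result.
same = df.product(axis=0).equals(df.prod(axis=0))
print(same)
```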
Finxter