The Pandas DataFrame has several methods concerning Computations and Descriptive Stats. When applied to a DataFrame, these methods evaluate the elements and return the results.
- Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
- Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), cumsum().
- Part 3 focuses on the DataFrame methods describe(), diff(), eval(), kurtosis().
- Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
- Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
Getting Started
Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
Required Starter Code
import pandas as pd
import numpy as np
Before any data manipulation can occur, two new libraries will require installation.
- The pandas library enables access to/from a DataFrame.
- The numpy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. The terminal used in this example shows a dollar sign ($) as its prompt; yours may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install numpy
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
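As a quick sanity check after installing, the snippet below imports both libraries and prints their versions. It is only a sketch for confirming the setup; the version numbers printed will depend on your environment.

```python
import pandas as pd
import numpy as np

# If either import fails, the corresponding library is not installed
# in the Python environment that is running this script.
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
```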
DataFrame pct_change()
The pct_change() method calculates and returns the percentage change between the current and prior element(s) in a DataFrame. The change is expressed as a fraction (for example, -0.10 means a 10% drop). The return value is the same type as the caller.
The syntax for this method is as follows:
DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
| Parameter | Description |
|---|---|
| periods | The number of periods (rows) to shift when calculating the percentage change. |
| fill_method | Determines how NaN values are filled before the percentage change is calculated. |
| limit | The number of consecutive NaN values to fill before stopping. |
| freq | The increment to use for a time series index. |
| **kwargs | Additional keyword arguments passed to DataFrame.shift() or Series.shift(). |
This example calculates and returns the percentage change of three (3) fictitious stocks over three (3) months.
df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
'DBL': [39.91, 41.46, 40.99],
'UXL': [44.01, 43.67, 41.98]},
index= ['2021-10-01', '2021-11-01', '2021-12-01'])
result = df.pct_change(axis='rows', periods=1)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] uses the pct_change() method with a selected axis and period to calculate the change. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output:
| | ASL | DBL | UXL |
|---|---|---|---|
| 2021-10-01 | NaN | NaN | NaN |
| 2021-11-01 | -0.100370 | 0.038837 | -0.007726 |
| 2021-12-01 | -0.126835 | -0.011336 | -0.038699 |
Note: The first row contains NaN values as there is no previous row to compare against.
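To see what pct_change() is doing, the sketch below reproduces the calculation by hand with shift() and then changes the periods parameter to compare each row with the row two positions earlier. The variable names (manual, auto, two_period) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
                   'DBL': [39.91, 41.46, 40.99],
                   'UXL': [44.01, 43.67, 41.98]},
                  index=['2021-10-01', '2021-11-01', '2021-12-01'])

# Manual version: divide each row by the previous row and subtract 1.
manual = df / df.shift(1) - 1

# Built-in version with the default period of 1.
auto = df.pct_change()

# Compare each row with the row two positions earlier.
# The first two rows are NaN because they have no such predecessor.
two_period = df.pct_change(periods=2)

print(manual.round(6))
print(two_period.round(6))
```

With periods=2, only the December row has a value, since it is the only row with a row two positions before it.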
DataFrame quantile()
The quantile() method returns the values from a DataFrame/Series at the specified quantile and axis.
The syntax for this method is as follows:
DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
| Parameter | Description |
|---|---|
| q | This is a value 0 <= q <= 1 (or a list of such values) and is the quantile(s) to calculate. |
| axis | If zero (0) or index, apply the function to each column. Default is 0. If one (1) or columns, apply the function to each row. |
| numeric_only | Only include columns that contain integers, floats, or boolean values. |
| interpolation | The method used to estimate the value when the requested quantile falls between two data points: linear, lower, higher, midpoint, or nearest. |
This example uses the same stock DataFrame as noted above to determine the quantile(s).
df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
'DBL': [39.91, 41.46, 40.99],
'UXL': [44.01, 43.67, 41.98]})
result = df.quantile(0.15)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] uses the quantile() method to calculate by setting the q (quantile) parameter to 0.15. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output:
| ASL | 15.518 |
| DBL | 40.234 |
| UXL | 42.487 |
| Name: 0.15, dtype: float64 |
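The following sketch shows two variations on the same stock DataFrame: passing a list to q to get several quantiles at once, and switching the interpolation parameter so the result snaps to an actual data point instead of a linear estimate. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ASL': [18.93, 17.03, 14.87],
                   'DBL': [39.91, 41.46, 40.99],
                   'UXL': [44.01, 43.67, 41.98]})

# A list of quantiles returns a DataFrame with one row per quantile.
quartiles = df.quantile([0.25, 0.5, 0.75])

# 'lower' and 'higher' pick the neighboring data point below or above
# the position where the 0.15 quantile falls.
lower = df.quantile(0.15, interpolation='lower')
higher = df.quantile(0.15, interpolation='higher')

print(quartiles)
print(lower)
print(higher)
```

For ASL, the default linear result of 15.518 lies between the two neighbors that 'lower' (14.87) and 'higher' (17.03) return.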
DataFrame rank()
The rank() method returns a DataFrame/Series with the values ranked in order. The return value is the same type as the caller.
The syntax for this method is as follows:
DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
| Parameter | Description |
|---|---|
| axis | If zero (0) or index, apply the function to each column. Default is 0. If one (1) or columns, apply the function to each row. |
| method | Determines how to rank identical values: average, the average rank of the group; min, the lowest rank of the group; max, the highest rank of the group; first, ranks assigned in the order the values appear in the array; dense, like min, but the rank always increases by one (1) between groups. |
| numeric_only | Only include columns that contain integers, floats, or boolean values. |
| na_option | Determines how NaN values rank: keep, assigns NaN as the rank of NaN values; top, assigns the lowest rank to NaN values; bottom, assigns the highest rank to NaN values. |
| ascending | Determines if the elements/values rank in ascending or descending order. |
| pct | If set to True, the ranks are returned in percentile form (as fractions between 0 and 1). By default, this value is False. |
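The effect of the method parameter is easiest to see with tied values. The sketch below uses a small, made-up Score column (not part of the original article) containing one tie and ranks it with each of the five methods.

```python
import pandas as pd

# Hypothetical data: two of the four scores are tied at 85.
scores = pd.DataFrame({'Score': [90, 85, 85, 70]})

# Add one column per tie-breaking method for side-by-side comparison.
for method in ['average', 'min', 'max', 'first', 'dense']:
    scores[method] = scores['Score'].rank(method=method)

print(scores)
```

The tied scores occupy positions 2 and 3, so average gives both 2.5, min gives 2, max gives 3, first gives 2 and then 3, and dense gives 2 while letting the next group continue at 3.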
For this example, a CSV file is read in, ranked on Population, and sorted. To follow along, download countries.csv and move it to the current working directory.
df = pd.read_csv("countries.csv")
df["Rank"] = df["Population"].rank()
df.sort_values("Population", inplace=True)
print(df)
- Line [1] reads in the countries.csv file and saves it to df.
- Line [2] appends a Rank column, containing the rank of each Population value, to the end of the DataFrame (df).
- Line [3] sorts the DataFrame by Population in ascending order.
- Line [4] outputs the result to the terminal.
Output:
| | Country | Capital | Population | Area | Rank |
|---|---|---|---|---|---|
| 4 | Poland | Warsaw | 38383000 | 312685 | 1.0 |
| 2 | Spain | Madrid | 47431256 | 498511 | 2.0 |
| 3 | Italy | Rome | 60317116 | 301338 | 3.0 |
| 1 | France | Paris | 67081000 | 551695 | 4.0 |
| 0 | Germany | Berlin | 83783942 | 357021 | 5.0 |
| 5 | Russia | Moscow | 146748590 | 17098246 | 6.0 |
| 6 | USA | Washington | 328239523 | 9833520 | 7.0 |
| 8 | India | Dheli | 1352642280 | 3287263 | 8.0 |
| 7 | China | Beijing | 1400050000 | 9596961 | 9.0 |
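If the CSV file is not at hand, the ascending and pct parameters can still be tried on a small inline DataFrame. This sketch copies three rows from the output above rather than reading the file; the column names Rank_desc and Rank_pct are made up for illustration.

```python
import pandas as pd

# Three rows taken from the output above, built inline instead of read from CSV.
df = pd.DataFrame({'Country': ['Poland', 'Spain', 'Italy'],
                   'Population': [38383000, 47431256, 60317116]})

# ascending=False gives rank 1 to the largest population.
df['Rank_desc'] = df['Population'].rank(ascending=False)

# pct=True returns each rank divided by the number of ranked values.
df['Rank_pct'] = df['Population'].rank(pct=True)

print(df)
```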
DataFrame round()
The round() method rounds the DataFrame output to a specified number of decimal places.
The syntax for this method is as follows:
DataFrame.round(decimals=0, *args, **kwargs)
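| Parameter | Description |
|---|---|
| decimals | The number of decimal places to round the value(s) to. This can be a single integer or a dictionary/Series mapping column names to decimal places. |
| *args | Additional positional arguments. These have no effect but are accepted for compatibility with NumPy. |
| **kwargs | Additional keyword arguments. These have no effect but are accepted for compatibility with NumPy. |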
For this example, the Bank of Canada’s mortgage rates over three (3) months are rounded to three (3) decimal places and displayed.
Code Example 1:
df = pd.DataFrame([(2.3455, 1.7487, 2.198)],
                  columns=['Month 1', 'Month 2', 'Month 3'])
result = df.round(3)
print(result)
- Line [1] creates a DataFrame complete with column names and saves to df.
- Line [2] rounds the mortgage rates to three (3) decimal places. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Output:
| | Month 1 | Month 2 | Month 3 |
|---|---|---|---|
| 0 | 2.346 | 1.749 | 2.198 |
Another way to perform the same task is with a Lambda!
Code Example 2:
df = pd.DataFrame([(2.3455, 1.7487, 2.198)],
columns=['Month 1', 'Month 2', 'Month 3'])
result = df.apply(lambda x: round(x, 3))
print(result)
- Line [1] creates a DataFrame complete with column names and saves to df.
- Line [2] rounds the mortgage rates to three (3) decimal places using a Lambda. This output saves to the result variable.
- Line [3] outputs the result to the terminal.
Note: The output is identical to the output of Code Example 1.
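The decimals parameter also accepts a dictionary, which allows a different precision per column. The sketch below reuses the mortgage-rate DataFrame; the chosen precisions are arbitrary, and any column left out of the dictionary is not rounded.

```python
import pandas as pd

df = pd.DataFrame([(2.3455, 1.7487, 2.198)],
                  columns=['Month 1', 'Month 2', 'Month 3'])

# Round 'Month 1' to one decimal place and 'Month 2' to two.
# 'Month 3' is not listed, so it keeps its original value.
result = df.round({'Month 1': 1, 'Month 2': 2})
print(result)
```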
DataFrame prod() and product()
The prod() and product() methods are identical. Both return the product of the values of a requested axis.
The syntax for these methods is as follows:
DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
| Parameter | Description |
|---|---|
| axis | If zero (0) or index, apply the function to each column. Default is None. If one (1) or columns, apply the function to each row. |
| skipna | If set to True, this parameter excludes NaN/NULL values when calculating the result. |
| level | Set the appropriate parameter if the DataFrame/Series is multi-level. If no value, then None is assumed. |
| numeric_only | Only include columns that contain integers, floats, or boolean values. |
| min_count | The minimum number of valid (non-NaN) values required to perform the calculation. If fewer are present, the result is NaN. |
| **kwargs | Additional keyword arguments passed to the function. |
For this example, a DataFrame of small integers is created and the product along the selected axis is returned.
Code:
df = pd.DataFrame({'A': [2, 4, 6],
                   'B': [7, 3, 5],
                   'C': [6, 3, 1]})
index_ = ['A', 'B', 'C']
df.index = index_
result = df.prod(axis=0)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Lines [2-3] create and set the DataFrame index.
- Line [4] calculates the product along axis 0. This output saves to the result variable.
- Line [5] outputs the result to the terminal.
Output:
Formula Example: 2*4*6=48
| A | 48 |
| B | 105 |
| C | 18 |
| dtype: int64 |
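Two parameters not exercised above are axis=1 and min_count. The sketch below first multiplies across each row of the same DataFrame, then uses a second, made-up DataFrame (with_gap, containing NaN values) to show how min_count changes the result for a column with no valid values.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [2, 4, 6],
                   'B': [7, 3, 5],
                   'C': [6, 3, 1]},
                  index=['A', 'B', 'C'])

# axis=1 multiplies across each row, e.g. 2*7*6 for the first row.
by_row = df.prod(axis=1)
print(by_row)

# Hypothetical data: column B contains only NaN values.
with_gap = pd.DataFrame({'A': [2, np.nan], 'B': [np.nan, np.nan]})

# NaN values are skipped by default; the product of "nothing" is 1.0.
default = with_gap.prod()

# min_count=1 requires at least one valid value, otherwise the result is NaN.
strict = with_gap.prod(min_count=1)

print(default)
print(strict)
```

Setting min_count=1 is one way to avoid mistaking an all-NaN column for a column whose product happens to be 1.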