CodersLegacy: 10 Most Important Functions in BeautifulSoup

Beautiful Soup is a Python library that is commonly used for web scraping purposes. It is a very powerful tool for extracting and parsing data from HTML and XML files. Beautiful Soup provides several functions that make web scraping a lot easier. In this article, we will look at the 10 most important BeautifulSoup functions and how to use them to parse data.


1. BeautifulSoup()

The BeautifulSoup() function is used to create a Beautiful Soup object. This object represents the parsed HTML/XML document. It takes two arguments: the HTML/XML document as a string and the name of the parser to use. The parser is optional, but if it is not specified, Beautiful Soup picks the best parser installed on your system and issues a warning, so the same script can behave differently on different machines. Naming the parser explicitly avoids this.

from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>The Title</title>
  </head>
  <body>
    <p class='text'>Some text.</p>
  </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')

In this example, we are creating a Beautiful Soup object from an HTML string using the html.parser parser. Printing the soup object shows all the HTML it currently holds.


2. find()

The find() function is used to find the first occurrence of a tag in the HTML/XML document. It takes two arguments: the name of the tag and any attributes associated with it. The attributes are optional.

from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>The Title</title>
  </head>
  <body>
    <p class='text'>Some text.</p>
  </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')
p_tag = soup.find('p', {'class': 'text'})
print(p_tag)
<p class="text">Some text.</p>

In this example, we are finding the first occurrence of the p tag with the class attribute set to 'text'.
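
One detail worth knowing is that find() returns None when nothing matches. The following is a minimal sketch of guarding against that case; the shortened HTML string and the variable names are only illustrative. It also uses the class_ keyword, which is an alternative to passing a dictionary.

```python
from bs4 import BeautifulSoup

html_doc = "<html><body><p class='text'>Some text.</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

# class_ (with a trailing underscore) is the keyword form of {'class': 'text'}
p_tag = soup.find("p", class_="text")

# find() returns None when no tag matches, so check before using the result
missing = soup.find("table")
table_text = missing.get_text() if missing is not None else ""

print(p_tag)
print(missing)
```

Without the None check, calling get_text() on the missing tag would raise an AttributeError.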


3. find_all()

The find_all() function is used to find all occurrences of a tag in the HTML/XML document. It takes the same arguments as find().

from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
        <title>The Title</title>
    </head>
    <body>
        <p class='text'>Some text.</p>
        <p class='text'>More text.</p>
    </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')
p_tags = soup.find_all('p', {'class': 'text'})
print(p_tags)
[<p class="text">Some text.</p>, <p class="text">More text.</p>]

In this example, we are finding all occurrences of the p tag with the class attribute set to 'text'.


4. get_text()

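The get_text() function is used to get the text content of a tag, including the text of all its descendants. It can be called with no arguments, and it also accepts an optional separator string and a strip flag.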

from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>The Title</title>
  </head>
  <body>
    <p class='text'>Some text.</p>
  </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser')
p_tag = soup.find('p', {'class': 'text'})
text = p_tag.get_text()

print(text)
Some text.

In this example, we are getting the text content of the p tag we found earlier.
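
When a tag contains several child tags, the optional separator and strip arguments of get_text() can tidy up the result. A small sketch, using a made-up HTML fragment:

```python
from bs4 import BeautifulSoup

html_doc = "<div><p> Some text. </p><p> More text. </p></div>"
soup = BeautifulSoup(html_doc, "html.parser")
div_tag = soup.find("div")

# Default call: the text pieces are concatenated exactly as they appear
raw_text = div_tag.get_text()

# strip=True trims each piece; separator is placed between the pieces
clean_text = div_tag.get_text(separator=" | ", strip=True)

print(repr(raw_text))
print(clean_text)
```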


5. get()

The get() function is used to get the value of an attribute of a tag. It takes one argument, which is the name of the attribute.

from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>The Title</title>
  </head>
  <body>
    <p class='text'>Some text.</p>
  </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser') 
p_tag = soup.find('p', {'class': 'text'}) 
class_attribute = p_tag.get('class')

print(class_attribute)
['text']

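In this example, we are getting the value of the class attribute of the p tag we found earlier. Because a class attribute can hold several values, Beautiful Soup returns it as a list. This works for other attributes like "href" and "id" as well, which are returned as plain strings.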
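
The sketch below shows how get() behaves for single-valued attributes, multi-valued attributes, and attributes that are missing. The anchor tag and its attribute values are invented for the example.

```python
from bs4 import BeautifulSoup

html_doc = "<a id='home' class='nav link' href='/index.html'>Home</a>"
soup = BeautifulSoup(html_doc, "html.parser")
a_tag = soup.find("a")

href = a_tag.get("href")        # single-valued attribute: a string
classes = a_tag.get("class")    # multi-valued attribute: a list of strings
title = a_tag.get("title")      # missing attribute: None instead of an error
title_or_default = a_tag.get("title", "untitled")  # optional fallback value

print(href, classes, title, title_or_default)
```

Indexing with a_tag["title"] would raise a KeyError here, which is why get() is the safer choice when an attribute may be absent.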


6. find_parent()

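The find_parent() function is used to find an ancestor of a given tag. It accepts the same filter arguments as find(), such as a tag name. Called with no arguments, it returns the immediate parent.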

from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
        <title>The Title</title>
    </head>
    <body>
        <div>
            <p class='text'>Some text.</p>
        </div>
    </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser') 
p_tag = soup.find('p', {'class': 'text'}) 
div_tag = p_tag.find_parent('div')

print(div_tag)
<div>
<p class="text">Some text.</p>
</div>

In this example, we are finding the parent div tag of the p tag we found earlier.
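
To illustrate how the filter affects the result, here is a short sketch with two nested div tags. The id values are assumptions made for the example. It also shows find_parents(), which returns every matching ancestor, closest first.

```python
from bs4 import BeautifulSoup

html_doc = (
    "<body><div id='outer'><div id='inner'>"
    "<p class='text'>Some text.</p>"
    "</div></div></body>"
)
soup = BeautifulSoup(html_doc, "html.parser")
p_tag = soup.find("p")

immediate = p_tag.find_parent()                # closest enclosing tag
outer = p_tag.find_parent("div", id="outer")   # first ancestor matching the filter
all_divs = p_tag.find_parents("div")           # every matching ancestor, closest first

print(immediate["id"], outer["id"], [d["id"] for d in all_divs])
```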


7. find_next_sibling()

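The find_next_sibling() function is used to find the next sibling tag of a given tag, that is, the next tag at the same level of the tree. It accepts the same filter arguments as find(), and all of them are optional.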

from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
        <title>The Title</title>
    </head>
    <body>
        <p class='text'>Some text.</p>
        <p class='text'>More text.</p>
    </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser') 
p_tag = soup.find('p', {'class': 'text'}) 
next_p_tag = p_tag.find_next_sibling('p')

print(next_p_tag)
<p class="text">More text.</p>

In this example, we are finding the next p tag that comes after the p tag we found earlier.
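
It may help to compare find_next_sibling() with the plain next_sibling attribute, which often returns the whitespace between two tags rather than a tag. A minimal sketch, with the HTML shortened for the example:

```python
from bs4 import BeautifulSoup

html_doc = """<body>
<p class='text'>Some text.</p>
<p class='text'>More text.</p>
</body>"""
soup = BeautifulSoup(html_doc, "html.parser")
p_tag = soup.find("p")

raw_sibling = p_tag.next_sibling           # the newline between the two <p> tags
next_p = p_tag.find_next_sibling("p")      # skips text nodes, returns the next <p>
last = next_p.find_next_sibling("p")       # no further <p> sibling, so None

print(repr(raw_sibling), next_p, last)
```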


8. find_all_next()

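The find_all_next() function is used to find all the tags that come after a given tag in the HTML/XML document. Called with no arguments, it returns every following tag. It also accepts the same filter arguments as find_all() to narrow the results.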

from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>The Title</title>
  </head>
  <body>
    <div>
      <p class='text'>Some text.</p>
      <p class='text'>More text.</p>
      <span class='text'>Some more text.</span>
    </div>
  </body>
</html>"""

soup = BeautifulSoup(html_doc, 'html.parser') 
p_tag = soup.find('p', {'class': 'text'}) 
next_tags = p_tag.find_all_next()

print(next_tags)
[<p class="text">More text.</p>, <span class="text">Some more text.</span>]

In this example, we are finding all the tags that come after the p tag we found earlier.


9. select()

The select() function is one of the most important functions in BeautifulSoup, used to select tags based on CSS selectors. It takes one argument, which is the CSS selector.

from bs4 import BeautifulSoup

html_doc = """

<html>
    <head>
        <title>The Title</title>
    </head>
    <body>
        <div>
            <p class='text'>Some text.</p>
        </div>
        <div>
            <p class='text'>More text.</p>
        </div>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser') 
p_tags = soup.select('div > p.text')
print(p_tags)
[<p class="text">Some text.</p>, <p class="text">More text.</p>]

In this example, we are selecting all the p tags with the class attribute set to 'text' that are direct children of a div tag. The > in the selector matches direct children only.
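
The difference between the child combinator (>) and the descendant combinator (a space) is easy to miss, so here is a small sketch of both, along with select_one(). The HTML fragment is invented for the example.

```python
from bs4 import BeautifulSoup

html_doc = """
<div><p class='text'>Direct child.</p></div>
<div><section><p class='text'>Nested deeper.</p></section></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

child_only = soup.select("div > p.text")    # p must be a direct child of div
descendants = soup.select("div p.text")     # p may be nested at any depth
first = soup.select_one("div p.text")       # first match only
no_match = soup.select_one("div > span")    # None when nothing matches

print(len(child_only), len(descendants), first, no_match)
```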


10. prettify()

The prettify() function is used to make the HTML/XML document more human-readable by placing each tag and string on its own indented line. It can be called with no arguments.

from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Title</title></head><body><p class='text'>Some text.</p></body></html> """

soup = BeautifulSoup(html_doc, 'html.parser') 
prettified_html = soup.prettify()
print(prettified_html)
<html>
 <head>
  <title>
   The Title
  </title>
 </head>
 <body>
  <p class="text">
   Some text.
  </p>
 </body>
</html>

In this example, we are making the HTML document more human-readable using the prettify() function.


Conclusion

Beautiful Soup is a powerful tool for web scraping in Python. In this article, we have covered the 10 most important functions of Beautiful Soup and how to use them to parse data from HTML and XML files. These functions are just a few of the many functions provided by Beautiful Soup, and by mastering them, you can become an expert in web scraping with Python.


Planet Python

Python Web Scraping: From URL to CSV in No Time


Setting up the Environment

Before diving into web scraping with Python, set up your environment by installing the necessary libraries.

First, install the following libraries: requests, BeautifulSoup, and pandas. These packages play a crucial role in web scraping, each serving a different purpose.

To install these libraries, run the following commands:

pip install requests
pip install beautifulsoup4
pip install pandas

The requests library will be used to make HTTP requests to websites and download the HTML content. It simplifies the process of fetching web content in Python.

BeautifulSoup is a fantastic library that helps extract data from the HTML content fetched from websites. It makes navigating, searching, and modifying HTML easy, making web scraping straightforward and convenient.

Pandas will be helpful in data manipulation and organizing the scraped data into a CSV file. It provides powerful tools for working with structured data, making it popular among data scientists and web scraping enthusiasts.

Fetching and Parsing URL

Next, you’ll learn how to fetch and parse URLs using Python to scrape data and save it as a CSV file. We will cover sending HTTP requests, handling errors, and utilizing libraries to make the process efficient and smooth.

Sending HTTP Requests

When fetching content from a URL, Python offers a powerful library known as the requests library. It allows users to send HTTP requests, such as GET or POST, to a specific URL, obtain a response, and parse it for information.

We will use the requests library to help us fetch data from our desired URL.

For example:

import requests
response = requests.get('https://example.com/data.csv')

The variable response will store the server’s response, including the data we want to scrape. From here, we can access the content using response.content, which will return the raw data in bytes format.

Handling HTTP Errors

Handling HTTP errors while fetching data from URLs ensures a smooth experience and prevents unexpected issues. The requests library makes error handling easy by providing methods to check whether the request was successful.

Here’s a simple example:

import requests
response = requests.get('https://example.com/data.csv')
response.raise_for_status()

The raise_for_status() method will raise an exception if there’s an HTTP error, such as a 404 Not Found or 500 Internal Server Error. This helps us ensure that our script doesn’t continue to process erroneous data, allowing us to gracefully handle any issues that may arise.
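
To handle the error rather than let it stop the script, the call can be wrapped in a try/except block. The sketch below builds Response objects by hand so that it runs without network access. In a real script the response would come from requests.get(). The describe_status() helper is a hypothetical name used only for this example.

```python
import requests


def describe_status(response):
    """Return 'ok' or a short description of the HTTP error."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return f"HTTP error: {exc}"
    return "ok"


# Hand-built responses stand in for requests.get(...) so the sketch runs offline
not_found = requests.Response()
not_found.status_code = 404

ok_response = requests.Response()
ok_response.status_code = 200

print(describe_status(ok_response))
print(describe_status(not_found))
```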

With these tools, you are now better equipped to fetch and parse URLs using Python. This will enable you to effectively scrape data and save it as a CSV file.

Extracting Data from HTML

In this section, we’ll discuss extracting data from HTML using Python. The focus will be on using the BeautifulSoup library and locating elements by their tags and attributes.

Using BeautifulSoup

BeautifulSoup is a popular Python library that simplifies web scraping tasks by making it easy to parse and navigate through HTML. To get started, import the library and request the page content you want to scrape, then create a BeautifulSoup object to parse the data:

from bs4 import BeautifulSoup
import requests

url = "https://example.com"  # placeholder: replace with the page you want to scrape
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

Now you have a BeautifulSoup object and can start extracting data from the HTML.

Locating Elements by Tags and Attributes

BeautifulSoup provides various methods to locate elements by their tags and attributes. Some common methods include find(), find_all(), select(), and select_one().

Let’s see these methods in action:

# Find the first <span> tag
span_tag = soup.find("span")

# Find all <span> tags
all_span_tags = soup.find_all("span")

# Locate elements using CSS selectors
title = soup.select_one("title")

# Find all <a> tags with the "href" attribute
links = soup.find_all("a", {"href": True})

These methods allow you to easily navigate and extract data from an HTML structure.

Once you have located the HTML elements containing the needed data, you can extract the text and attributes.

Here’s how:

# Extract text from a tag
text = span_tag.text

# Extract an attribute value
url = links[0]["href"]

Finally, to save the extracted data into a CSV file, you can use Python’s built-in csv module.

import csv

# Writing extracted data to a CSV file
with open("output.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Index", "Title"])
    for index, link in enumerate(links, start=1):
        writer.writerow([index, link.text])

Following these steps, you can successfully extract data from HTML using Python and BeautifulSoup, and save it as a CSV file.


Organizing Data

This section explains how to create a dictionary to store the scraped data and how to write the organized data into a CSV file.

Creating a Dictionary

Begin by defining an empty dictionary that will store the extracted data elements.

In this case, the focus is on quotes, authors, and any associated tags. Each extracted element should have its key, and the value should be a list that contains individual instances of that element.

For example:

data = {
    "quotes": [],
    "authors": [],
    "tags": []
}

As you scrape the data, append each item to its respective list. This approach makes the information easy to index and retrieve when needed.
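
As a rough sketch of that appending step, the example below fills the dictionary from a small inline HTML fragment. The markup and the class names ("quote", "text", "author", "tag") are assumptions made for illustration, so adapt them to the structure of the page you are scraping.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names below are assumptions for this example
html_doc = """
<div class="quote">
  <span class="text">Quote one.</span>
  <small class="author">Author A</small>
  <a class="tag">life</a><a class="tag">books</a>
</div>
<div class="quote">
  <span class="text">Quote two.</span>
  <small class="author">Author B</small>
  <a class="tag">humor</a>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
data = {"quotes": [], "authors": [], "tags": []}

for quote in soup.find_all("div", class_="quote"):
    data["quotes"].append(quote.find("span", class_="text").get_text(strip=True))
    data["authors"].append(quote.find("small", class_="author").get_text(strip=True))
    # Join the tags of one quote so every list gains exactly one entry per quote
    tags = [tag.get_text(strip=True) for tag in quote.find_all("a", class_="tag")]
    data["tags"].append(", ".join(tags))

print(data)
```

Appending exactly one entry per quote to every list keeps the lists the same length, which matters once the dictionary is turned into a table.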

Working with DataFrames and Pandas

Once the data is stored in a dictionary, it’s time to convert it into a dataframe. Using the Pandas library, it’s easy to transform the dictionary into a dataframe where the keys become the column names and each list supplies the values of its column, one per row.

Simply use the following command:

import pandas as pd

df = pd.DataFrame(data)
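
One caveat is that this conversion expects every list in the dictionary to have the same length. The sketch below, with made-up sample values, shows the resulting columns and what happens when one list is shorter than the others.

```python
import pandas as pd

data = {
    "quotes": ["Quote one.", "Quote two."],
    "authors": ["Author A", "Author B"],
    "tags": ["life", "humor"],
}
df = pd.DataFrame(data)
columns = list(df.columns)   # dictionary keys become column names
shape = df.shape             # (rows, columns)

# Lists of different lengths cannot be aligned into rows
uneven = {"quotes": ["Quote one.", "Quote two."], "authors": ["Author A"]}
try:
    pd.DataFrame(uneven)
    error = None
except ValueError as exc:
    error = str(exc)

print(columns, shape, error)
```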

Exporting Data to a CSV File

With the dataframe prepared, it’s time to write it to a CSV file. Thankfully, Pandas comes to the rescue once again. Using the dataframe’s built-in .to_csv() method, it’s possible to create a CSV file from the dataframe, like this:

df.to_csv('scraped_data.csv', index=False)

This command will generate a CSV file called 'scraped_data.csv' containing the organized data with columns for quotes, authors, and tags. The index=False parameter ensures that the dataframe’s index isn’t added as an additional column.


And there you have it—a neat, organized CSV file containing your scraped data!

Handling Pagination

This section discusses how to handle pagination while scraping data from multiple URLs with Python and saving the extracted content in CSV format. Managing pagination well is essential because many websites spread their content across several pages.

Looping Through Web Pages

Looping through web pages requires the developer to identify a pattern in the URLs, which can assist in iterating over them seamlessly. Typically, this pattern would include the page number as a variable, making it easy to adjust during the scraping process.

Once the pattern is identified, you can use a for loop to iterate over a range of page numbers. For each iteration, update the URL with the page number and then proceed with the scraping process. This method allows you to extract data from multiple pages systematically.

For instance, let’s consider that the base URL for every page is "https://www.example.com/listing?page=", where the page number is appended to the end.

Here is a Python example that demonstrates handling pagination when working with such URLs:

import requests
from bs4 import BeautifulSoup
import csv

base_url = "https://www.example.com/listing?page="

with open("scraped_data.csv", "w", newline="") as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(["Data_Title", "Data_Content"])  # Header row

    for page_number in range(1, 6):  # Loop through page numbers 1 to 5
        url = base_url + str(page_number)
        response = requests.get(url)
        soup = BeautifulSoup(response.text, "html.parser")
        
        # TODO: Add scraping logic here and write the content to the CSV file.

In this example, the script iterates through the first five pages of the website and writes the header row to a CSV file. Note that you will need to implement the actual scraping logic (e.g., extracting the desired content using Beautiful Soup) based on the website’s structure.

Handling pagination with Python allows you to collect more comprehensive data sets, improving the overall success of your web scraping efforts. Make sure to respect the website’s robots.txt rules and rate limits to ensure responsible data collection.
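
One possible way to fill in the scraping step is sketched below. It is an illustration under several assumptions. The markup (div tags with class "item" containing an h2 and a p) is hypothetical, and the helper names extract_rows(), scrape_pages(), and fake_fetch() are invented for this example. The page download is passed in as a function so that the sketch runs offline. In a real script you might pass a function that calls requests.get(url).text and pauses between requests.

```python
import csv
import io

from bs4 import BeautifulSoup

BASE_URL = "https://www.example.com/listing?page="


def extract_rows(html):
    """Return (title, content) pairs from one page of hypothetical markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="item"):
        title = item.find("h2").get_text(strip=True)
        content = item.find("p").get_text(strip=True)
        rows.append((title, content))
    return rows


def scrape_pages(fetch, first_page, last_page, csvfile):
    """Download each page with fetch(url) and write its rows to csvfile."""
    writer = csv.writer(csvfile)
    writer.writerow(["Data_Title", "Data_Content"])  # Header row
    for page_number in range(first_page, last_page + 1):
        html = fetch(BASE_URL + str(page_number))
        writer.writerows(extract_rows(html))


def fake_fetch(url):
    """Stand-in for a real download so the sketch runs without a network."""
    page = url.rsplit("=", 1)[1]
    return f'<div class="item"><h2>Title {page}</h2><p>Content {page}</p></div>'


buffer = io.StringIO()  # an open file object works the same way
scrape_pages(fake_fetch, 1, 3, buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```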

Exporting Data to CSV

You can export web scraping data to a CSV file in Python using either the built-in csv module or the Pandas to_csv() method. Both approaches are widely used and handle large amounts of data efficiently.

Python CSV Module

The Python csv module is a built-in library that offers functionality to read from and write to CSV files. It is simple and easy to use. To begin, import the csv module.

import csv

To write the scraped data to a CSV file, open the file in write mode ('w') with a specified file name, create a CSV writer object, and write the data using the writerow() or writerows() methods as required.

with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["header1", "header2", "header3"])
    writer.writerows(scraped_data)

In this example, the header row is written first, followed by the rows of data obtained through web scraping.
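
The snippet above leaves scraped_data undefined, so here is a self-contained sketch with made-up rows. It writes the file to a temporary directory and reads it back, which also shows that the csv module quotes values containing commas automatically.

```python
import csv
import os
import tempfile

# Made-up rows standing in for real scraped data
scraped_data = [
    ("Widget", "9.99", "In stock"),
    ("Gadget, large", "24.50", "Sold out"),
]

path = os.path.join(tempfile.mkdtemp(), "data.csv")

with open(path, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["header1", "header2", "header3"])
    writer.writerows(scraped_data)

# Read the file back to confirm the round trip
with open(path, newline="", encoding="utf-8") as file:
    rows = list(csv.reader(file))

print(rows)
```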

Using Pandas to_csv()

Another alternative is the powerful library Pandas, often used in data manipulation and analysis. To use it, start by importing the Pandas library.

import pandas as pd

Pandas offers the to_csv() method, which can be applied to a DataFrame. If you have web-scraped data and stored it in a DataFrame, you can easily export it to a CSV file with the to_csv() method, as shown below:

dataframe.to_csv('data.csv', index=False)

In this example, the index parameter is set to False to exclude the DataFrame index from the CSV file.

The Pandas library also provides options for handling missing values, date formatting, and customizing separators and delimiters, making it a versatile choice for data export.
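
A brief sketch of those options follows, using an invented three-column DataFrame. When no file path is given, to_csv() returns the CSV text as a string, which makes the effect of each parameter easy to inspect.

```python
import pandas as pd

# Invented sample data with one missing price
df = pd.DataFrame({
    "name": ["A", "B"],
    "price": [1.5, None],
    "scraped_at": pd.to_datetime(["2023-01-01", "2023-01-02"]),
})

csv_text = df.to_csv(
    index=False,
    sep=";",                 # custom separator
    na_rep="N/A",            # text written for missing values
    date_format="%Y-%m-%d",  # format applied to datetime columns
)
lines = csv_text.splitlines()
print(lines)
```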


Be on the Right Side of Change

A Complete Guide to the ChatGPT API


Through the release of its API, OpenAI has opened up the capabilities of ChatGPT to everyone. You can now seamlessly integrate ChatGPT’s power into your application.

Follow through these initial steps to get started, whether you’re looking to integrate ChatGPT into your existing application or develop new applications with it.

Getting Access to the OpenAI API Keys

To start using the ChatGPT API, you first need to obtain the OpenAI API keys. Sign up or log in to the official OpenAI platform.


Once you’re logged in, click on the Personal tab in the top-right section. Select the View API Keys option from the dropdown, and you’ll land on the API keys page. Click on the Create new secret key button to generate the API key.

You won’t be able to view the key again, so store it somewhere safe.

The code used in this project is available in a GitHub repository and is free for you to use under the MIT license.

How to Use the ChatGPT API

The OpenAI API’s gpt-3.5-turbo and gpt-4 models are the same models that ChatGPT and ChatGPT Plus use, respectively. These powerful models are capable of understanding and generating natural language text.

Please note that the ChatGPT API is a general term that refers to OpenAI APIs that use GPT-based models for developing chatbots, including the gpt-3.5-turbo and gpt-4 models.

The ChatGPT API is primarily optimized for chat, but it also works well for text completion tasks. The gpt-3.5-turbo and gpt-4 models are more powerful than the previous GPT-3 models, and gpt-3.5-turbo is also cheaper. However, as of writing, you cannot fine-tune the GPT-3.5 models. You can only fine-tune the GPT-3 base models, i.e., davinci, curie, ada, and babbage.

As of writing, access to the GPT-4 API is granted through a waitlist. The GPT-3.5 models are accessible to everyone, so we will use them in this article. You can, however, use GPT-4 in ChatGPT right now by upgrading to ChatGPT Plus.

Using the ChatGPT API for Chat Completion

You need to configure the chat model to get it ready for the API call. This can be better understood with the help of an example:

import openai

openai.api_key = "YOUR_API_KEY"

completion = openai.ChatCompletion.create(
  model = "gpt-3.5-turbo",
  temperature = 0.8,
  max_tokens = 2000,
  messages = [
    {"role": "system", "content": "You are a funny comedian who tells dad jokes."},
    {"role": "user", "content": "Write a dad joke related to numbers."},
    {"role": "assistant", "content": "Q: How do you make 7 even? A: Take away the s."},
    {"role": "user", "content": "Write one related to programmers."}
  ]
)

print(completion.choices[0].message)

Running this code prints the message object returned by the model, which contains the assistant role and the generated joke.

The above code demonstrates a ChatGPT API call using Python. Note that the model was able to understand the context ("dad joke") and the type of response (Q&A form) that we were expecting even though we didn’t explicitly mention it in the last user prompt.

Thus, when building applications, you can provide the context in advance and the model will adapt to your requirements accordingly.

Here, the most important part is the messages parameter which accepts an array of message objects. Each message object contains a role and content. You can provide three types of roles to the message objects:

  • system: It sets up the context and behavior of the assistant.
  • user: It’s used to give instructions to the assistant. It is typically generated by the end user. But you as a developer can also provide some potential user prompts beforehand.
  • assistant: It holds the model’s replies. You can also supply example replies in advance so that the model follows their style and format in the response.
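
To make the three roles concrete, here is a small sketch that assembles a messages list from a system prompt, earlier exchanges, and a new user prompt. The build_messages() helper is a hypothetical name introduced for this example and is not part of the openai library.

```python
def build_messages(system_prompt, history, user_prompt):
    """Assemble a messages list: system prompt, earlier exchanges, new prompt."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages(
    "You are a funny comedian who tells dad jokes.",
    [("Write a dad joke related to numbers.",
      "Q: How do you make 7 even? A: Take away the s.")],
    "Write one related to programmers.",
)
roles = [message["role"] for message in messages]
print(roles)
```

The resulting list can be passed as the messages argument of the API call shown earlier.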

You can further customize the temperature and max_tokens parameters of the model to get the output according to your requirements.

The higher the temperature, the higher the randomness of the output, and vice-versa. If you want your responses to be more focused and deterministic, go for the lower temperature value. And if you want it to be more creative, go for the higher value. The temperature value ranges between 0 and 2.

Like ChatGPT, its API also has a length limit, which is measured in tokens rather than words. Use the max_tokens parameter to limit the length of responses. However, setting max_tokens too low can cut off the output midway. As of writing, the gpt-3.5-turbo model has a token limit of 4,096, while the gpt-4 model has a limit of 8,192 tokens.

You can further configure the model using the other parameters provided by OpenAI.

Using the ChatGPT API for Text Completion

Apart from the chat completion tasks, the gpt-3.5-turbo model also does a good job with text completion. It outperforms the previous text-davinci-003 model and is priced at only one-tenth of its cost.

The following example demonstrates how you can configure the ChatGPT API for text completion:

import openai

openai.api_key = "YOUR_API_KEY"

completion = openai.ChatCompletion.create(
  model = "gpt-3.5-turbo",
  temperature = 0.8,
  max_tokens = 2000,
  messages = [
    {"role": "system", "content": "You are a poet who creates poems that evoke emotions."},
    {"role": "user", "content": "Write a short poem for programmers."}
  ]
)

print(completion.choices[0].message.content)

You don’t even need to provide the system role and its content. Providing just the user prompt will do the work for you.

messages = [
  {"role": "user", "content": "Write a short poem for programmers."}
]

Running the above code will generate a poem for programmers.

Response Format of the ChatGPT API

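The ChatGPT API sends the response as a JSON object. Alongside metadata such as the model name and token usage, it contains a choices list that holds the generated messages.

To get the assistant’s reply, extract the content field of the message in the first choice.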
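
The sketch below shows the general shape of such a response as a Python dictionary and how to pull the reply out of it. The field names follow the chat completions format, but every value here is made up for illustration.

```python
# Illustrative response: all values below are invented for this example
sample_response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1677858242,
    "model": "gpt-3.5-turbo",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Example reply text."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 7, "total_tokens": 20},
}

choice = sample_response["choices"][0]
reply = choice["message"]["content"]

# A finish_reason of "length" means the reply was cut off by the token limit
was_truncated = choice["finish_reason"] == "length"

print(reply, was_truncated)
```

Checking finish_reason is a simple way to notice when a low max_tokens value has cut a reply short.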

Building Applications Using the ChatGPT API

You can directly use the API endpoint or the openai Python/Node.js library to start building ChatGPT API-powered applications. Apart from the official openai library, you can also develop applications using the community-maintained libraries recommended by OpenAI.

However, OpenAI does not verify the security of these community-maintained libraries, so it’s better to either directly use the API endpoint or use the official openai Python/Node.js library.

Method 1: Using the API Endpoint

You need to use the /v1/chat/completions endpoint to utilize the gpt-3.5-turbo and gpt-4 models.

import requests

# Store the key in a plain variable; the openai package is not needed here
API_KEY = "YOUR_API_KEY"
URL = "https://api.openai.com/v1/chat/completions"

payload = {
  "model": "gpt-3.5-turbo",
  "temperature": 1.0,
  "messages": [
    {"role": "system", "content": "You are an assistant who tells any random and very short fun fact about this world."},
    {"role": "user", "content": "Write a fun fact about programmers."},
    {"role": "assistant", "content": "Programmers drink a lot of coffee!"},
    {"role": "user", "content": "Write one related to the Python programming language."}
  ]
}

headers = {
  "Content-Type": "application/json",
  "Authorization": f"Bearer {API_KEY}"
}

response = requests.post(URL, headers=headers, json=payload)
response.raise_for_status()  # stop early if the API returned an HTTP error
response = response.json()

print(response['choices'][0]['message']['content'])

The above sample code demonstrates how you can directly use the endpoint to make the API call using the requests library.

First, assign the API key to a variable. Next, provide the model name in the model field of the payload object. Then pass the conversation history in the messages field.

Here, we’ve kept a higher temperature value so that our response is more random and thus more creative.

The script prints the fun fact that the model generates about Python.

Note that there are some problems with OpenAI’s ChatGPT, so you may get offensive or biased replies from its API too.

Method 2: Using the Official openai Library

Install the openai Python library using pip:

pip install openai

Now, you’re ready to generate text or chat completions.

import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
  model = "gpt-3.5-turbo",
  temperature = 0.2,
  max_tokens = 1000,
  messages = [
    {"role": "user", "content": "Who won the 2018 FIFA world cup?"}
  ]
)

print(response['choices'][0]['message']['content'])

In this code, we only provided a single user prompt. We’ve kept the temperature value low to keep the response more deterministic rather than creative.

Running the code prints the model’s answer to the question.

The ChatGPT responses may seem magical and can make anyone wonder how ChatGPT works. But behind the scenes, it’s backed by the Generative Pre-trained Transformer (GPT) language model that does all the heavy lifting.

Build Next Generation Apps Using the ChatGPT API

You learned how to configure the ChatGPT API. The ChatGPT API has opened gates for you and developers around the world to build innovative products leveraging the power of AI.

You can use this tool to develop applications like story writers, code translators, email writers, marketing copy generators, text summarizers, and so on. Your imagination is the limit to building applications leveraging this technology.

Apart from the ChatGPT API, you can also use other OpenAI models to develop cool applications.

MakeUseOf

Python List of Tuples to DataFrame


To convert a list of tuples to a Pandas DataFrame, import the pandas library, call the DataFrame constructor, and pass the list of tuples as the data argument such as in pd.DataFrame(tuples_list, columns=['Number', 'Letter']).

Here’s a minimal example:

import pandas as pd
tuples_list = [(1, 'A'), (2, 'B'), (3, 'C')]
df = pd.DataFrame(tuples_list, columns=['Number', 'Letter'])

The output of the given code will be a Pandas DataFrame with two columns, 'Number' and 'Letter', as follows:

   Number Letter
0       1      A
1       2      B
2       3      C

Let’s dive deeper into this conversion technique so you can improve your skills and learn more about Pandas’ awesome capabilities!

I’ll also show you how to convert a list of named tuples, and how to convert the DataFrame back to a list of tuples.

Converting a List of Tuples to DataFrame

First, let’s explore how to convert a list of tuples into a DataFrame using Python’s Pandas library.

Using DataFrame Constructor

The simplest way to convert a list of tuples into a DataFrame is by using the DataFrame() constructor provided by the Pandas library. This method is straightforward and can be achieved in just a few lines of code.

Here’s an example:

import pandas as pd
tuple_list = [('A', 1), ('B', 2), ('C', 3)]
df = pd.DataFrame(tuple_list)
print(df)

Executing this code will create a DataFrame with the following structure:

   0  1
0  A  1
1  B  2
2  C  3

Handling Data with Column Names

When converting a list of tuples to a DataFrame, it’s often useful to include column names to make the data more readable and understandable. To do this, you can add the columns parameter when calling the DataFrame() constructor.

Here’s an example:

import pandas as pd
tuple_list = [('A', 1), ('B', 2), ('C', 3)]
column_names = ['Letter', 'Number']
df = pd.DataFrame(tuple_list, columns=column_names)
print(df)

With the column names specified, the resulting DataFrame will look like this:

  Letter  Number
0      A       1
1      B       2
2      C       3

By using the DataFrame constructor and handling data with column names, you can easily convert a list of tuples into a DataFrame that is more organized and easier to understand. Keep working with these techniques, and soon enough, you’ll be a master of DataFrames!

Examples and Use Cases

When working with Python, one often encounters data stored in lists of tuples. These data structures are lightweight and easy to use, but sometimes, it’s beneficial to convert them into a more structured format, such as a DataFrame. In this section, we will explore some examples and use cases for converting a list of tuples into a DataFrame in Python, using the pandas library.

Here’s a simple example that demonstrates how to create a DataFrame from a list of tuples:

import pandas as pd

data = [('Peter', 18, 7), ('Riff', 15, 6), ('John', 17, 8), ('Michel', 18, 7), ('Sheli', 17, 5)]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Score'])

In this example, we have a list of tuples representing student data, with each tuple containing a name, age, and score. By passing this list to the DataFrame constructor along with the column names, we can easily convert it into a DataFrame.

Consider another use case, where we need to filter and manipulate data before converting it into a DataFrame. For instance, let’s imagine we have a list of sales data, with each tuple representing an item, its price, and the number of sales:

data = [('Item A', 35, 20), ('Item B', 45, 15), ('Item C', 50, 30), ('Item D', 25, 10)]

In this case, we can use list comprehensions to filter items with sales greater than 20 and update the price by applying a 10% discount:

filtered_data = [(item, price * 0.9, sales) for item, price, sales in data if sales > 20]
df = pd.DataFrame(filtered_data, columns=['Item', 'Discounted Price', 'Sales'])

Now, our DataFrame contains only the filtered items with the discounted prices.
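As a quick check of the filter above, the following sketch reruns the same comprehension and inspects the result. Note that the comparison is strict, so an item with exactly 20 sales is excluded; the print calls are only there for illustration.

```python
import pandas as pd

data = [('Item A', 35, 20), ('Item B', 45, 15), ('Item C', 50, 30), ('Item D', 25, 10)]

# Keep items with more than 20 sales and apply a 10% discount to the price.
filtered_data = [(item, price * 0.9, sales) for item, price, sales in data if sales > 20]
df = pd.DataFrame(filtered_data, columns=['Item', 'Discounted Price', 'Sales'])

print(len(df))              # number of rows that passed the filter
print(df['Item'].tolist())  # names of the remaining items
```

With this sample data only 'Item C' passes, because 'Item A' has exactly 20 sales. Use sales >= 20 if such items should be kept.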

Python List of Named Tuples to DataFrame

Converting a list of named tuples to a DataFrame in Python can be done efficiently using the pandas library’s default functions as well.

Info: A named tuple is a subclass of a tuple, which allows you to access elements by name, making it highly readable and practical for data manipulation.

First, create a list of named tuples using Python’s built-in collections module.

Let’s assume we have a list of students with their names, ages, and test scores:

from collections import namedtuple

Student = namedtuple('Student', ['name', 'age', 'score'])
students = [
    Student('Alice', 23, 89),
    Student('Bob', 22, 92),
    Student('Charlie', 24, 85)
]

With the list of named tuples prepared, proceed to import the pandas library and use the pd.DataFrame() method to convert the list to a DataFrame:

import pandas as pd

dataframe = pd.DataFrame(students, columns=Student._fields)

This process creates a DataFrame with columns corresponding to the named tuple fields. The final result appears as follows:

      name  age  score
0    Alice   23     89
1      Bob   22     92
2  Charlie   24     85

In summary, simply define the list with the named tuple structure, and then call the pd.DataFrame() method to create the DataFrame.
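As a small variation on the example above, pandas can also read the column names from the named tuples themselves when no columns argument is given. The sketch below compares both forms; the variable names inferred and explicit are only illustrative.

```python
from collections import namedtuple

import pandas as pd

Student = namedtuple('Student', ['name', 'age', 'score'])
students = [Student('Alice', 23, 89), Student('Bob', 22, 92)]

# No columns argument: pandas takes the field names from the named tuples.
inferred = pd.DataFrame(students)

# Explicit column names, as in the example above.
explicit = pd.DataFrame(students, columns=list(Student._fields))

print(list(inferred.columns))
print(inferred.equals(explicit))
```

Passing the columns explicitly is still useful when you want different labels than the field names.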

Create a List of Tuples From a DataFrame

When working with data in Python, you may need to convert a DataFrame back into a list of tuples.

To begin, import the library in your Python code using import pandas as pd.

Now, let’s say you have a DataFrame, and you want to extract its data as a list of tuples. The simplest approach is to use the itertuples() function, which is a built-in method in Pandas.

To use this method, call the itertuples() function on the DataFrame object, and then pass the output to the list() function to convert it into a list:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Tim'],
        'Age': [28, 22, 27]}
df = pd.DataFrame(data)

# Convert DataFrame to list of tuples
list_of_tuples = list(df.itertuples(index=False, name=None))
print(list_of_tuples)

This code will output:

[('John', 28), ('Alice', 22), ('Tim', 27)]

The itertuples() method has two optional parameters: index and name. Setting index=False excludes the DataFrame index from the tuples, and setting name=None returns regular tuples instead of named tuples.
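To see what the two parameters change, here is a minimal sketch that calls itertuples() with its defaults and with a custom name. The name 'Person' is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Tim'], 'Age': [28, 22, 27]})

# Defaults (index=True, name='Pandas'): named tuples that include the index.
rows = list(df.itertuples())
first = rows[0]
print(first.Index, first.Name, first.Age)

# Named tuples without the index, using a custom class name.
people = list(df.itertuples(index=False, name='Person'))
print(people[1].Name)
```

Named tuples are convenient when you want to access fields by name, while name=None gives plain tuples.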


So there you go! You now know how to convert a DataFrame into a list of tuples using the Pandas library in Python. To keep learning and improving your Python skills, feel free to download our cheat sheets and visit the recommended Pandas tutorial:

⭐ Recommended: 10 Minutes to Pandas (in 5 Minutes)

Be on the Right Side of Change

Python for Beginners: Pandas Where Method With Series and DataFrame

While working with pandas dataframe, we often filter data using different conditions. In this article, we will discuss how we can use the pandas where method to filter and replace data from a series or dataframe.

The Pandas where() Method

We use the pandas where() method to replace a value based on a condition. The where() method has the following syntax.

DataFrame.where(cond, other=_NoDefault.no_default, *, inplace=False, axis=None, level=None)

Here, 

  • The cond parameter takes a condition or a combination of conditions as its input argument. The condition must evaluate to a series or dataframe of True and False values. Where cond is True, the original data is preserved. Where cond is False, the values are replaced with NaN unless the other parameter is given.
  • The other parameter takes a function, series, dataframe, or scalar value as its input argument. All the entries where the cond parameter evaluates to False are replaced with the corresponding value from the other parameter. If we pass a function to the other parameter, it is computed on the DataFrame and should return a scalar or a Series/DataFrame. The function must not change the input DataFrame. If we don’t specify the other parameter, the values are set to NaN wherever the cond parameter evaluates to False.
  • By default, the where() method returns a new dataframe after execution. If you want to modify the existing dataframe using the where() method, you can set the inplace parameter to True. After this, the original dataframe will be modified to store the output.
  • The axis parameter is used to set the alignment axis if needed. For a Series, the axis parameter is unused. Its default value is None, as shown in the syntax above.
  • The level parameter is used to set the alignment level if required.
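As a small illustration of the other parameter described above, the following sketch passes a series instead of a scalar, so each replaced position takes the value at the same index. The series values and the name replaced are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 23, 423, 53])

# Keep values greater than 50; elsewhere take the value from `other`,
# matched by index position.
replaced = s.where(s > 50, other=s * 10)
print(replaced.tolist())
```

Here the values that fail the condition are replaced with ten times their original value, while the rest are unchanged.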

Now, let us discuss how we can use the where() method with a series or a dataframe.

Pandas Where() Method With Series in Python

When we invoke the where() method on a pandas series, it takes a condition as its input argument. After execution, it returns a new series. In the output series, the values that fulfill the condition in the input argument are unchanged, while the rest of the values are set to NaN. You can observe this in the following example.

import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50)
print("The output series is:")
print(output)

Output:

The input series is:
0      1
1     23
2     12
3    423
4      4
5     53
6    231
7    234
8      1
dtype: int64
The output series is:
0      NaN
1      NaN
2      NaN
3    423.0
4      NaN
5     53.0
6    231.0
7    234.0
8      NaN
dtype: float64

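In the above example, we passed the condition series>50 to the where() method. In the output series, you can observe that the where() method preserves the numbers greater than 50. On the other hand, values that are not greater than 50 are set to NaN. Because NaN is a floating-point value, the data type of the output series changes from int64 to float64.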

Replace a Value Based on a Condition Using The where() Method

Instead of NaN, we can also set a replacement value for the values in the series that don’t fulfill the condition given in the input to the where() method. For this, we will pass the replacement value as the second input argument to the where() method. After execution, it returns a series in which the values that fulfill the condition remain unchanged while the other values are replaced using the replacement value. You can observe this in the following example.

import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50,-1)
print("The output series is:")
print(output)

Output:

The input series is:
0      1
1     23
2     12
3    423
4      4
5     53
6    231
7    234
8      1
dtype: int64
The output series is:
0     -1
1     -1
2     -1
3    423
4     -1
5     53
6    231
7    234
8     -1
dtype: int64

In the above example, we have set the other parameter to -1. Hence, the numbers that are not greater than 50 are set to -1 in the output series.

Replace a Value Using a Function Based on a Condition Using The where() Method

Instead of a value, we can also pass a function for replacing the values in the series using the where() method. For instance, consider the following example.

def myFun(x):
    return x**2
import pandas as pd
series=pd.Series([1,23,12,423,4,53,231,234,1])
print("The input series is:")
print(series)
output=series.where(series>50,other=myFun)
print("The output series is:")
print(output)

Output:

The input series is:
0      1
1     23
2     12
3    423
4      4
5     53
6    231
7    234
8      1
dtype: int64
The output series is:
0      1
1    529
2    144
3    423
4     16
5     53
6    231
7    234
8      1
dtype: int64

In the above code, we have defined a function myFun() that returns the square of its input. Then, we passed the function to the other parameter in the where() method. The where() method calls myFun() on the input series and uses the values it returns in all the positions where the cond parameter is False. The values where cond is True are left unchanged.
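To check what the function passed to other actually receives, the following sketch records the type of its argument. The names square and calls are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

calls = []

def square(values):
    # Record the type of object that where() hands to the callable.
    calls.append(type(values).__name__)
    return values ** 2

series = pd.Series([1, 23, 12, 423])
output = series.where(series > 50, other=square)

print(calls)
print(output.tolist())
```

The function is given the series itself rather than individual numbers, so it should be written to work on a whole series.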

Pandas Where Method With DataFrame

Instead of a series, we can also use the where() method on a dataframe. When we invoke the where() method on a dataframe, it takes a condition as its input argument. After execution, it returns a dataframe created from the input dataframe.

Here, the rows that fulfill the condition given as input to the where() method remain unchanged. All the other rows are filled with NaN values. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where(df["Maths"]>80)
print("The output dataframe is:")
print(df1)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The output dataframe is:
   Roll  Maths  Physics  Chemistry
0   1.0  100.0     80.0       90.0
1   NaN    NaN      NaN        NaN
2   3.0   90.0     80.0       70.0
3   4.0  100.0    100.0       90.0
4   5.0   90.0     90.0       80.0
5   NaN    NaN      NaN        NaN

Instead of the NaN value, we can also give a replacement value to the where() method as shown below.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where(df["Maths"]>80,"LOW")
print("The output dataframe is:")
print(df1)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The output dataframe is:
  Roll Maths Physics Chemistry
0    1   100      80        90
1  LOW   LOW     LOW       LOW
2    3    90      80        70
3    4   100     100        90
4    5    90      90        80
5  LOW   LOW     LOW       LOW

In the above examples, you can observe that the where() method works with a dataframe in the same way as it works with a series. The only difference is that, because the condition is a series with one value per row, the result is applied to the entire row instead of a single value.
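For contrast with the row-wise examples above, the condition can also be a boolean dataframe of the same shape, in which case it is applied cell by cell. This is a minimal sketch with a smaller, illustrative dataframe.

```python
import pandas as pd

df = pd.DataFrame({"Maths": [100, 80, 90], "Physics": [80, 100, 80]})

# df > 80 is a boolean dataframe, so each cell is tested on its own.
cells = df.where(df > 80)
print(cells)
```

Only the individual marks that are not greater than 80 become NaN; the other values in the same row are kept.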

Pandas where() Method With Multiple Conditions

We can also use multiple conditions in a single where() method. For this, we combine the conditions using the & (AND) and | (OR) operators, with each condition enclosed in parentheses. Each condition is evaluated first, and the logical operations then produce a mask containing True and False values. The mask is then used to create the output dataframe. You can observe this in the following example.

import pandas as pd
myDicts=[{"Roll":1,"Maths":100, "Physics":80, "Chemistry": 90},
        {"Roll":2,"Maths":80, "Physics":100, "Chemistry": 90},
        {"Roll":3,"Maths":90, "Physics":80, "Chemistry": 70},
        {"Roll":4,"Maths":100, "Physics":100, "Chemistry": 90},
        {"Roll":5,"Maths":90, "Physics":90, "Chemistry": 80},
        {"Roll":6,"Maths":80, "Physics":70, "Chemistry": 70}]
df=pd.DataFrame(myDicts)
print("The input dataframe is:")
print(df)
df1=df.where((df["Maths"]>80) & (df["Chemistry"]>80))
print("The output dataframe is:")
print(df1)

Output:

The input dataframe is:
   Roll  Maths  Physics  Chemistry
0     1    100       80         90
1     2     80      100         90
2     3     90       80         70
3     4    100      100         90
4     5     90       90         80
5     6     80       70         70
The output dataframe is:
   Roll  Maths  Physics  Chemistry
0   1.0  100.0     80.0       90.0
1   NaN    NaN      NaN        NaN
2   NaN    NaN      NaN        NaN
3   4.0  100.0    100.0       90.0
4   NaN    NaN      NaN        NaN
5   NaN    NaN      NaN        NaN
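The same pattern works with the OR operator. The sketch below uses a smaller, illustrative dataframe and keeps a row when either condition holds; the names mask and kept are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"Roll": [1, 2, 3, 4],
                   "Maths": [100, 80, 90, 80],
                   "Chemistry": [90, 90, 70, 70]})

# | keeps a row when at least one condition is True.
# Each condition needs its own parentheses.
mask = (df["Maths"] > 80) | (df["Chemistry"] > 80)
kept = df.where(mask)

print(mask.tolist())
print(kept)
```

Only the last row fails both conditions, so it is the only row filled with NaN values.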

Conclusion

In this article, we discussed different ways to use the pandas where method with a series or dataframe in Python. To learn more about Python programming, you can read this article on how to read excel into pandas dataframe. You might also like this article on how to map functions to a pandas series in Python.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Pandas Where Method With Series and DataFrame appeared first on PythonForBeginners.com.

Planet Python

Understanding Python Functions: A Practical Overview


As a programmer, you will often find yourself performing an action or task repeatedly. This can be tedious and time-consuming, especially when working with a large or complex code base. Automating them with functions is a more effective approach to performing such tasks. Functions allow you to write the code logic once and use it anywhere in your program.

What Is a Python Function?

In Python, a function is a block of code used to perform a specific task. You only need to write a function once, but you can use it multiple times in your code. A function can take in arguments as input and return output values. This simple program shows a function that calculates the sum of three numbers:


def calculate_sum(a, b, c):
    return a+b+c

print(calculate_sum(1,2,3))
print(calculate_sum(1000, 300,44))
print(calculate_sum(12, 4,78))

In the program above, the function returns the sum of three arguments. When the function is called multiple times, it returns a different output for each case. A useful application for this function will be a calculator app.

Defining a Function in Python

Python has many built-in functions available for developers to use. However, these built-in functions are not always enough to meet the demands of most projects. To meet custom demands, you have to define your custom functions. Defining custom functions is common practice in programming.

In Python, you can define a custom function by using the def keyword followed by the name of your function, a pair of parentheses, and a colon. Here is an example:

def function_name():
    pass  # the function body goes here

You should take note of these rules when assigning a function name in Python:

  • Function names should be in lowercase.
  • Function names should be descriptive.
  • Use underscores to separate words in a function name.

After defining the function, you must write the logic to perform your desired task. For example, this function calculates the area of a triangle:

 

def calculate_triangle_area(base, height):
    area = (base * height)/2
    return area

print(calculate_triangle_area(12, 3))

The function above defines two parameters: base and height, divides their product by two, and returns the result as the output. You can write whatever logic you want your function to perform.

Understanding Function Arguments

In previous examples, the functions have taken arguments to perform actions. The arguments in these examples are known as required or positional arguments. In Python, your arguments can be either of the following:

  • Positional arguments
  • Keyword arguments

Positional Arguments

Positional arguments need to be passed in the correct order of definition. For example, if you define a function with parameters a, b, and c, you must pass in values for these parameters accordingly when you call them. Let us examine a previous example:

 

def calculate_sum(a, b, c):
    return a+b+c

print(calculate_sum(1,2,3))
print(calculate_sum(1000, 300,44))
print(calculate_sum(12, 4,78))

In the above program, the calculate_sum() function takes three arguments whenever we call it. Each argument represents a corresponding parameter. In the first function call, numbers 1, 2, and 3 represent a, b, and c accordingly.

A parameter is declared in a function’s definition, while an argument is the value passed when you call the function. This value is a representation of its corresponding parameter.

Positional arguments are compulsory. If you don’t add them, you will get a TypeError. The following example demonstrates this:

def calculate_sum(a, b, c):
    return a+b+c

print(calculate_sum(1,2))

When you run the above program on your machine, Python raises a TypeError reporting that calculate_sum() is missing one required positional argument, c.
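To see the error text without stopping the program, you can catch the exception. This is a minimal sketch; the variable name message is only illustrative.

```python
def calculate_sum(a, b, c):
    return a + b + c

try:
    calculate_sum(1, 2)  # no value is passed for c
except TypeError as error:
    message = str(error)
    print(message)
```

The message names the parameter that did not receive a value.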

Keyword Arguments

Keyword arguments are passed by name when you call a function, for example c=3. Because each value is matched to its parameter by name, keyword arguments don’t need to follow a specific order. Python also lets us use *args to collect any extra positional arguments and **kwargs to collect any extra keyword arguments.

Apart from using *args and **kwargs, it is also possible to specify default values for your parameters. With a default value, you will not get an error if you forget to pass a value when calling the function. This example gives an illustration:

def calculate_sum(a, b, c=3):
    return a+b+c

print(calculate_sum(1,2))

In the above program, when calculate_sum() is called, there is no argument for c; this will not affect the program because c already has a default value. You can specify default values for as many arguments as you want but ensure you do this wisely.
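The following sketch shows keyword arguments passed by name in a different order, along with *args and **kwargs. The describe() function and its arguments are hypothetical and exist only for illustration.

```python
def calculate_sum(a, b, c=3):
    return a + b + c

# Keyword arguments are matched by name, so their order does not matter.
by_name = calculate_sum(c=10, a=1, b=2)

def describe(*args, **kwargs):
    # args is a tuple of the extra positional values;
    # kwargs is a dict of the extra named values.
    return len(args), sorted(kwargs)

counts = describe(1, 2, 3, unit="cm", label="side")
print(by_name, counts)
```

Here by_name is 13, and describe() reports three positional values and the two keyword names it received.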

Use Functions to Organize Your Code

Functions are useful for organizing your code, making it more readable and efficient. With functions, you can break your code into smaller, reusable chunks that are easier to understand and maintain. Additionally, if you need to make changes to your code, you only need to modify the necessary function rather than the entire code base.

MakeUseOf

Customize Laravel pagination views


Laravel is a powerful PHP framework that has gained popularity due to its simplicity and flexibility. One of the features that make it stand out is pagination. Pagination is a technique used to break large datasets into smaller, more manageable pieces, allowing users to navigate through them with ease. By default, Laravel provides a pagination system that works well for most use cases. However, sometimes you may need to customize it to meet specific requirements. In this post, we will explore how to customize pagination in Laravel.

Prerequisites

Laravel utilizes Composer to manage its dependencies. Take a look at how to install Composer on your computer. With Composer on your machine, we can install Laravel by running the following command in the terminal:

composer create-project --prefer-dist laravel/laravel hibit-pagination

This will install the latest version of Laravel and create a new project named hibit-pagination.

Generating the Pagination Views

Before we can customize our pagination views, we need to generate them using the following Artisan command:

php artisan vendor:publish --tag=laravel-pagination


This command will copy Laravel’s pagination views to the resources/views/vendor/pagination directory in our application.

We need to modify the configuration to choose from the available options for the paginator. To do this, we need to open the app/Providers/AppServiceProvider.php file and specify the desired option in the boot method.

Once the paginator is selected, we can customize the blade view to match our application’s design. We can modify the resources/views/vendor/pagination/*.blade.php files to change the look and feel of our pagination links.

Conclusion

Customizing the blade and pagination styling in Laravel is a straightforward process that allows you to match the pagination links to your application’s design. By generating the pagination views, customizing the blade views, and customizing the pagination links, you can create pagination links that are unique to your application.

Laravel News Links