Python BeautifulSoup Examples
Introduction
In this tutorial, we will walk through several examples of using the BeautifulSoup library in Python. To keep things simple and the code consistent, every example below follows the same sequence of steps:
- Inspect the HTML and CSS code behind the website/webpage.
- Import the necessary libraries.
- Create a User Agent (Optional).
- Send a get() request and fetch the webpage contents.
- Check the Status Code after receiving the response.
- Create a Beautiful Soup Object and define the parser.
- Implement your logic.
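The steps above can be sketched as a small reusable helper. This is only a minimal sketch, not part of the original examples: the names `fetch_soup`, `parse_html`, and `HEADERS`, the User-Agent string, and the 10-second timeout are all assumptions made for illustration. The last lines parse an in-memory HTML string so that the snippet runs without a network connection.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent string, chosen for illustration only.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; bs4-tutorial-sketch)"}


def fetch_soup(url, headers=HEADERS, timeout=10):
    """Send a get() request, check the status code, and build the soup.

    Returns None when the server does not answer with status code 200.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    if response.status_code != 200:
        return None
    return BeautifulSoup(response.content, "html.parser")


def parse_html(html):
    """Build the soup from HTML that is already in memory (no request needed)."""
    return BeautifulSoup(html, "html.parser")


# Offline demonstration on a made-up HTML snippet.
soup = parse_html("<html><body><h1>Hello</h1><p class='intro'>Hi there</p></body></html>")
print(soup.h1.text)                         # Hello
print(soup.find("p", class_="intro").text)  # Hi there
```

Returning None on a failed request is just one possible design; calling `response.raise_for_status()` instead would raise an exception for error status codes.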
❖ Disclaimer: This article assumes that you are familiar with the basic concepts of web scraping. Its sole purpose is to list and demonstrate examples of web scraping, and the examples have been created for educational purposes only. If you want to learn the basic concepts before diving into the examples, please work through an introductory web scraping tutorial first.
Without further delay let us dive into the examples. Let the games begin!
Example 1: Scraping An Example Webpage
Let’s begin with a simple example where we extract data from a table in a webpage. The webpage we are going to scrape is https://shubhamsayon.github.io/python/demo_html.html.
The code to scrape the data from the table in that webpage is given below.
# 1. Import the necessary LIBRARIES
import requests
from bs4 import BeautifulSoup

# 2. Create a User Agent (Optional)
headers = {"User-Agent": "Mozilla/5.0 (Linux; U; Android 4.2.2; he-il; NEO-X5-116A Build/JDQ39) AppleWebKit/534.30 ("
                         "KHTML, like Gecko) Version/4.0 Safari/534.30"}

# 3. Send get() Request and fetch the webpage contents
response = requests.get("https://shubhamsayon.github.io/python/demo_html.html", headers=headers)
webpage = response.content

# 4. Check Status Code (Optional)
# print(response.status_code)

# 5. Create a Beautiful Soup Object
soup = BeautifulSoup(webpage, "html.parser")

# 6. Implement the Logic.
for tr in soup.find_all('tr'):
    topic = "TOPIC: "
    url = "URL: "
    values = [data for data in tr.find_all('td')]
    for value in values:
        print(topic, value.text)
        topic = url
    print()
Output:
TOPIC: __str__ vs __repr__ In Python
URL: https://blog.finxter.com/python-__str__-vs-__repr__/

TOPIC: How to Read a File Line-By-Line and Store Into a List?
URL: https://blog.finxter.com/how-to-read-a-file-line-by-line-and-store-into-a-list/

TOPIC: How To Convert a String To a List In Python?
URL: https://blog.finxter.com/how-to-convert-a-string-to-a-list-in-python/

TOPIC: How To Iterate Through Two Lists In Parallel?
URL: https://blog.finxter.com/how-to-iterate-through-two-lists-in-parallel/

TOPIC: Python Scoping Rules – A Simple Illustrated Guide
URL: https://blog.finxter.com/python-scoping-rules-a-simple-illustrated-guide/

TOPIC: Flatten A List Of Lists In Python
URL: https://blog.finxter.com/flatten-a-list-of-lists-in-python/
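The row-by-row logic of Example 1 can also be tried offline. The sketch below is an assumption-laden variant, not the original code: the HTML string is made up to mimic a two-column table (it is not the real demo page), and the rows are collected into a list of dictionaries instead of being printed directly, which makes the result easier to reuse.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics a two-column table (topic, URL); not the real demo page.
html = """
<table>
  <tr><th>Topic</th><th>URL</th></tr>
  <tr><td>Topic A</td><td>https://example.com/a</td></tr>
  <tr><td>Topic B</td><td>https://example.com/b</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # get_text(strip=True) returns the cell text without surrounding whitespace.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 2:  # skips the header row, which contains only <th> cells
        rows.append({"topic": cells[0], "url": cells[1]})

for row in rows:
    print("TOPIC:", row["topic"])
    print("URL:", row["url"])
    print()
```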
Example 2: Scraping Data From The Finxter Leaderboard
This example shows how to scrape data from the Finxter leaderboard, which lists the Elo ratings (points) of its users. The data is served by https://app.finxter.com.
The code to scrape the data from the leaderboard table is given below.
# import the required libraries
import requests
from bs4 import BeautifulSoup

# create User-Agent (optional)
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}

# get() Request
response = requests.get("https://app.finxter.com/learn/computer/science/", headers=headers)

# Store the webpage contents
webpage = response.content

# Check Status Code (Optional)
print(response.status_code)

# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")

# The logic
for table in soup.find_all('table', class_='w3-table-all', limit=1):
    for tr in table.find_all('tr'):
        name = "USERNAME: "
        elo = "ELO: "
        rank = "RANK: "
        for td in tr.find_all('td'):
            print(name, td.text.strip())
            name = elo
            elo = rank
        print()
Output: Run the code above to view the extracted data. The status code is printed first, followed by one block of values per leaderboard row.
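In Example 2, `find_all(..., limit=1)` restricts the search to the first matching table. As a hedged alternative, the sketch below uses `find()`, which returns the first match directly (or None), and pairs each cell with a label through `zip()` instead of rotating variables. The HTML is invented for illustration and the real leaderboard markup may differ.

```python
from bs4 import BeautifulSoup

# Invented HTML for illustration; the real leaderboard markup may differ.
html = """
<table class="w3-table-all">
  <tr><td> alice </td><td>2100</td><td>1</td></tr>
  <tr><td> bob </td><td>1950</td><td>2</td></tr>
</table>
<table class="w3-table-all">
  <tr><td>ignored</td><td>0</td><td>0</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
labels = ("USERNAME", "ELO", "RANK")

# find() returns only the first matching table, or None if there is no match.
table = soup.find("table", class_="w3-table-all")

records = []
if table is not None:
    for tr in table.find_all("tr"):
        values = [td.get_text(strip=True) for td in tr.find_all("td")]
        if values:  # ignore rows without <td> cells, such as header rows
            records.append(dict(zip(labels, values)))

for record in records:
    for label, value in record.items():
        print(f"{label}: {value}")
    print()
```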
Example 3: Scraping The Free Python Job Board
Data scraping can be extremely handy for automating searches on job websites. The example below is a complete walkthrough of how to scrape data from such a site, here the free Python job board at http://pythonjobs.github.io/.
In the code given below, we extract the job title, location, and company name for each listed job. Feel free to run the code on your system and inspect the output.
import requests
from bs4 import BeautifulSoup

# create User-Agent (optional)
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}

# get() Request
response = requests.get("http://pythonjobs.github.io/", headers=headers)

# Store the webpage contents
webpage = response.content

# Check Status Code (Optional)
# print(response.status_code)

# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")

# The logic
for job in soup.find_all('section', class_='job_list'):
    title = [a for a in job.find_all('h1')]
    for n, tag in enumerate(job.find_all('div', class_='job')):
        company_element = [x for x in tag.find_all('span', class_='info')]
        print("Job Title: ", title[n].text.strip())
        print("Location: ", company_element[0].text.strip())
        print("Company: ", company_element[3].text.strip())
        print()
Output:
Job Title: Software Engineer (Data Operations)
Location: Sydney, Australia / Remote
Company: Autumn Compass

Job Title: Developer / Engineer
Location: Maryland / DC Metro Area
Company: National Institutes of Health contracting company.

Job Title: Senior Backend Developer (Python/Django)
Location: Vienna, Austria
Company: Bambus.io
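The code in Example 3 reads the company name from a fixed position (`company_element[3]`), which raises an IndexError if a listing has fewer `info` elements. The sketch below shows one possible guard. It is an illustration under assumptions: the helper `text_at` is hypothetical, and the HTML is invented and only loosely modeled on the selectors used above, so the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Invented markup loosely modeled on the selectors used above; the real page may differ.
html = """
<section class="job_list">
  <div class="job">
    <h1>Backend Developer</h1>
    <span class="info">Vienna, Austria</span>
    <span class="info">2020-11-01</span>
    <span class="info">Permanent</span>
    <span class="info">Example Corp</span>
  </div>
  <div class="job">
    <h1>Data Engineer</h1>
    <span class="info">Remote</span>
  </div>
</section>
"""


def text_at(elements, index, default="N/A"):
    """Return the stripped text of elements[index], or a default if it is missing."""
    if index < len(elements):
        return elements[index].get_text(strip=True)
    return default


soup = BeautifulSoup(html, "html.parser")

jobs = []
for job in soup.find_all("div", class_="job"):
    info = job.find_all("span", class_="info")
    title = job.find("h1")
    jobs.append({
        "title": title.get_text(strip=True) if title else "N/A",
        "location": text_at(info, 0),
        "company": text_at(info, 3),  # falls back to "N/A" for short listings
    })

for entry in jobs:
    print("Job Title:", entry["title"])
    print("Location:", entry["location"])
    print("Company:", entry["company"])
    print()
```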
Example 4: Scraping Data From An Online Book Store
Web scraping is widely used to extract product information from shopping websites. In this example, we shall see how we can extract data about books/products from alibris.com.
The page we are going to scrape is the Fiction listing at https://www.alibris.com/search/books/subject/Fiction.
The code given below demonstrates how to extract:
- The name of each Book,
- The name of the Author,
- The price of each book.
# import the required libraries
import requests
from bs4 import BeautifulSoup

# create User-Agent (optional)
headers = {"User-Agent": "Mozilla/5.0 (Linux; U; Android 4.2.2; he-il; NEO-X5-116A Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30"}

# get() Request
response = requests.get(
    "https://www.alibris.com/search/books/subject/Fiction", headers=headers)

# Store the webpage contents
webpage = response.content

# Check Status Code (Optional)
# print(response.status_code)

# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")

# The logic
for parent in soup.find_all('ul', {'class': 'primaryList'}):
    for n, tag in enumerate(parent.find_all('li')):
        title = [x for x in tag.find_all('p', class_='bookTitle')]
        author = [x for x in tag.find_all('p', class_='author')]
        price = [x for x in tag.find_all('a', class_='buy')]
        for item in title:
            print("Book: ", item.text.strip())
        for item in author:
            author = item.text.split("\n")
            print("AUTHOR: ", author[2])
        for item in price:
            if 'eBook' in item.text.strip():
                print("eBook PRICE: ", item.text.strip())
            else:
                print("PRICE: ", item.text.strip())
        print()
Output: Run the code above to view the extracted data. For each book, the title, author, and available prices are printed.
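The nested `find_all()` calls of Example 4 can also be written with CSS selectors through `select()` and `select_one()`. The following is only a sketch of that alternative style, not the original approach: the HTML is invented and merely reuses the class names from the code above, so the live page may look different.

```python
from bs4 import BeautifulSoup

# Invented markup that reuses the class names from the code above; the live page may differ.
html = """
<ul class="primaryList">
  <li>
    <p class="bookTitle">Example Novel</p>
    <p class="author">by Jane Doe</p>
    <a class="buy">from $1.99</a>
    <a class="buy">eBook from $4.99</a>
  </li>
  <li>
    <p class="bookTitle">Another Story</p>
    <p class="author">by John Roe</p>
    <a class="buy">from $2.49</a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

books = []
for li in soup.select("ul.primaryList > li"):
    title = li.select_one("p.bookTitle")   # first match or None
    author = li.select_one("p.author")
    prices = [a.get_text(strip=True) for a in li.select("a.buy")]
    books.append({
        "title": title.get_text(strip=True) if title else None,
        "author": author.get_text(strip=True) if author else None,
        # Split the offers the same way the original code does: by the word "eBook".
        "prices": [p for p in prices if "eBook" not in p],
        "ebook_prices": [p for p in prices if "eBook" in p],
    })

for book in books:
    print("Book:", book["title"])
    print("AUTHOR:", book["author"])
    print("PRICE:", book["prices"])
    print("eBook PRICE:", book["ebook_prices"])
    print()
```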
Example 5: Scraping Using Relative Links
Until now we have seen examples where we scraped data directly from a single webpage. Now, we will find out how to extract data that is spread across several pages connected by relative hyperlinks. In this example, we shall extract data from https://codingbat.com/. Let us try and extract all the questions listed under the Python category in codingbat.com.
For each question, we extract the problem statement and the examples that accompany it.
Solution:
# 1. Import the necessary LIBRARIES
import requests
from bs4 import BeautifulSoup

# 2. Create a User Agent (Optional)
headers = {"User-Agent": "Mozilla/5.0 (Linux; U; Android 4.2.2; he-il; NEO-X5-116A Build/JDQ39) AppleWebKit/534.30 ("
                         "KHTML, like Gecko) Version/4.0 Safari/534.30"}

# 3. Send get() Request and fetch the webpage contents
response = requests.get('http://codingbat.com/python', headers=headers)
webpage = response.content

# 4. Check Status Code (Optional)
# print(response.status_code)

# 5. Create a Beautiful Soup Object
soup = BeautifulSoup(webpage, "html.parser")

# The Logic
url = 'https://codingbat.com'
divs = soup.find_all('div', class_='summ')
# Build absolute links to each category page from the relative hrefs
links = [url + div.a['href'] for div in divs]
for link in links:
    # print(link)
    second_page = requests.get(link, headers=headers)
    sub_soup = BeautifulSoup(second_page.content, 'html.parser')
    div = sub_soup.find('div', class_='tabc')
    # Build absolute links to each question page
    question = [url + td.a['href'] for td in div.table.find_all('td')]
    for link in question:
        third_page = requests.get(link)
        third_soup = BeautifulSoup(third_page.content, 'html.parser')
        indent = third_soup.find('div', attrs={'class': 'indent'})
        problem = indent.table.div.string
        siblings_of_statement = indent.table.div.next_siblings
        demo = [sibling for sibling in siblings_of_statement if sibling.string is not None]
        print(problem)
        for example in demo:
            print(example)
        print("\n")
Output: Run the code above to view the extracted data. For each question, the problem statement is printed, followed by its examples.
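Example 5 builds absolute URLs by concatenating a base URL with each `href`. That works when every `href` starts with a slash, but a more general way to resolve relative links is `urljoin()` from the standard library. The sketch below illustrates the difference; the `href` values are hypothetical samples, not values scraped from the site.

```python
from urllib.parse import urljoin

# Page on which the links were found.
base = "https://codingbat.com/python"

# Hypothetical href values of the kinds a page might contain.
hrefs = ["/python/Warmup-1", "prob/p173401", "https://example.com/other"]

# urljoin() resolves each href against the base, as a browser would.
links = [urljoin(base, href) for href in hrefs]
for link in links:
    print(link)

# Plain concatenation only works for hrefs that start with "/".
naive = "https://codingbat.com" + hrefs[1]
print(naive)  # https://codingbat.comprob/p173401 (broken)
```

With `urljoin()`, root-relative links, page-relative links, and already absolute links are all handled by the same call.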
Conclusion
I hope you enjoyed the examples discussed in this article. Please subscribe and stay tuned for more articles and video content in the future!
The post Python BeautifulSoup Examples first appeared on Finxter.
November 24, 2020 at 08:55PM