https://media.notthebee.com/articles/64b5552aa24e564b5552aa24e6.jpg
Many of you saw clips of Tucker Carlson interviewing GOP presidential candidates last Friday, but this one might have escaped your notice:
Not the Bee
https://developer.rockero.cz/images/og_image.png
We have released a collection of technology standards and best practices about Laravel. Most of the tips come from well-known Laravel developers or directly from the way the Laravel framework is written. Visit our Wiki and take your knowledge a step further!
Laravel News Links
https://i0.wp.com/toolguyd.com/blog/wp-content/uploads/2023/07/Festool-Pizza-Cutter.jpg?resize=600%2C529&ssl=1
Festool says that “cordless power meets ultimate pizza precision,” with this new elbow grease-powered pizza cutter.
Styled after their track saws, the Festool Pizz-TS features a food-safe ABS housing and an 85mm, 0.8mm-thick stainless steel blade.
The blade slides out for cleaning.
Cute.
Price: $30 + $7.95 shipping
Not a fan of Festool? You can get a sidewinder circular saw-style pizza cutter for less than half the price.
You won’t be able to boast about your new Festool saw, but it should do the job just as well.
Price: $16-17
ToolGuyd
https://s3.us-east-1.amazonaws.com/images.gearjunkie.com/uploads/2023/07/B84I8567-1-2.jpg
I attended a Honda XR150L ride event expecting to be wholly unimpressed and bored by such a small, low-powered, and low-tech motorcycle. And I predicted a struggle in writing anything positive. I was wrong.
It is true that the Honda XR150L is an air-cooled, low-tech “beginner” dual sport bike. But for 20 years, these attributes have helped the unassuming little bike function in markets like Latin America and Thailand as a super-reliable transporter. And in cultures like these, “suggested maintenance schedules” are probably laughable. But the XR150L has soldiered on for decades and is now available as a 2023 model in the U.S.
Our group of motorcycle journalists and YouTubers rode the bikes hard for a little under 60 miles in beautiful Solvang, Calif. We rode a mixture of rough country blacktop, smooth highway tarmac, and both hardpacked and loose dirt. I had a smile on my face the whole time.
In short: The 2023 Honda XR150L fills many voids, thanks to its easy-to-ride nature and incredibly low MSRP of $2,971. It can hold its own in the garage of both new and experienced riders as a practical and reliable short-distance commuter that can also tackle easier off-road adventuring. All with proven reliability that is hard to beat.
Pros:
Lowest-priced full-size dual-sport motorcycle
Very easy to ride
Incredible reliability history
Cons:
Limited power and suspension
No rider-aid electronics
Bars can be too low for the standing position for some riders
The Honda XR150L stat sheet reads like it was custom-made for new or beginner riders. Nothing on this bike should scare anyone nor impede any part of the learning experience.
First off, it has the lowest seat height in the class at 32.8 inches, which made getting my feet down relaxed and easy.
Next, the 149.2cc air-cooled, carbureted motor revs slowly and peaks out at 12.5 horsepower (according to the EPA). It was tame and tractable by any measure. I’ve found that new riders are usually scared of the bike getting away from them, especially off road. No such fears should exist on the Honda XR150L.
And though the motorcycle is low and small, it comes with a 19-inch diameter front wheel and a 17-inch rear. These allowed it to roll over obstacles easier than smaller wheels, which can be typical on small-displacement bikes.
Finally, the front dual-piston brake caliper clamps down on a 240mm rotor while a 110mm drum brake scrubs speed in the rear. For the bike's size, weight, and power, and for a beginner, I felt this was an appropriate setup.
The Honda XR150L was incredibly easy to ride. The electric start, low seat height, and gentle power delivery made for relaxed riding for anyone experienced, and I felt that beginners couldn’t ask for a better platform on which to learn. Nothing about the bike seemed inaccessible or unpredictable.
The one notable exception was the front brake. I appreciated that it had power, but it was touchy. The initial bite was aggressive compared to the rear brake by a large margin. This could have been due to the bike being new and the disc brake pads and rotor not being bedded in (a procedure that deposits brake pad material into the metal of the rotor, drastically improving braking). But I got used to it, and during our ride, the brake got bedded in.
The bike was obviously smaller than most motorcycles I ride, but the upright seated position was comfortable for my 6-foot frame for the entire day. The standing “attack” position while dirt biking was lacking, though. For my height and arm length, the bars were too low. The solution would be riser bars or taller bar clamps.
Also, the Honda XR150L was ludicrously quiet, which could be a huge plus in some off-road riding areas.
The 31mm conventional fork has 7.1 inches of travel, and a single rear shock services 5.9 inches of boing. These were fine for the roads, as long as I avoided the combination of large obstacles and high speed. For 95% of the on-road travel that the rest of the bike could handle, the suspension was adequate.
And for the new or beginner motorcyclist venturing into easy dirt riding, the suspension and motor were appropriate. Honda chose settings for comfort both on the road and off. The soft suspension and motor output were a match for the slower speeds of newer riders on unfamiliar terrain. For an experienced rider, both were limiters, but it wasn’t all bad.
While respecting the intended use of the Honda XR150L, I pushed the bike on the most challenging trail available. It was a chicken-and-egg situation: was I limited by the motor or the suspension? Either way, I had to downshift often and go full throttle to get up and over some obstacles. Or I had to back off as I bottomed both the fork and the shock on occasion.
But you know what? I was having a blast. At one point in my career, all I tested was dirt bikes or adventure bikes and related gear. My skill or fitness was the limiter. Most of the time, the motorcycle could deliver more power and had more suspension than I could ever use. But with the Honda XR150L, it was the opposite.
I over-rode the bike. Instead of my skills coming up short of the bike's motor or chassis, the bike made me ride within its limits. And man, it was so much fun. I was always relaxed and never had a thought about the bike bucking me off or throwing me down. I was forced to ride precisely and use excellent technique. A bike with more power and suspension allows me to let those things slide.
Carrying momentum around turns was a must, as I didn’t have the power to just crawl through the turn and point and shoot. I had to take smart lines around obstacles because I didn’t have the suspension to charge through “dumb” lines. I had to brake and accelerate early, on the meat of the tire, and in the right line because I lacked the traction, braking ability, and power to do otherwise.
After I got home, I told a very experienced rider friend something I've long known but haven't practiced myself: riding a bike with limited ability makes you a better rider. I vowed to ride my two-stroke 250cc bike more than my four-stroke 501cc, and she, understanding the logic, made the same vow. I want the lessons I learned on the Honda XR150L to keep making me a better rider.
As a side note, swapping the stock tires for DOT-approved knobby tires would make a world of difference on the dirt. And in a full aero tuck, I managed just shy of 70 mph with my 170-pound body, on a flat highway, in still conditions.
The Honda XR150L is ideal for new and beginner riders who want a reliable and easy-to-handle motorcycle that they can take on tamer off-road adventures. It would be stupendous for a college student who doesn’t have the money or time to care for a high-maintenance ride but still wants to be able to “go anywhere.”
For more experienced pilots, the little Honda XR150L can still hold a spot. It's an affordable runabout that can zip to the grocery store or coffee shop, down a country dirt road, or along a meandering rural blacktop. It would provide a relaxed, low-stress way of getting about without the hassle of fully gearing up to ride a much larger and more powerful bike.
And as I stated, taking it out and deliberately over-riding the XR150L was both fun and productive in terms of honing off-road riding skills.
The bottom line: You can get a brand new, super reliable, low-hassle short-distance transporter that can venture on easier off-road missions for less than $3,000 MSRP. This is the lowest-priced full-size dual-sport motorcycle on the U.S. market in 2023.
For new or beginner riders, or street riders new to dirt, the Honda XR150L is such a great option. It’s very difficult for me to think of another dual sport motorcycle that is so appropriate for new riders.
As a final word, my daughter is 10. She has seen motorcycles zipping around her entire life. If she were to ask for one in the near future, not knowing what kind of riding she likes, this is the motorcycle I would get for her. I can’t think of a better recommendation.
The post 2023 Honda XR150L Dual-Sport Review: The Little Motorcycle That Could, for Less Than $3,000 appeared first on GearJunkie.
GearJunkie
https://media.notthebee.com/articles/64aef6495302464aef64953025.jpg
I know some Christians who might be offended by ancient Israel’s methods of tearing down pagan temples and desecrating so-called holy places.
Not the Bee
https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/d3d9600fbaec0ff0fc2d4efff433034f.gif
In 1864, David Goodell revolutionized kitchen food prep with the invention of a hand-cranked device that could peel an apple in a matter of seconds. Fifty-nine years later, Chipotle is introducing the next innovation in automated fruit prep with Autocado: a robot that can halve, peel, and core a mountain of avocados with minimal human assistance.
Chipotle, the US-based chain of “fast-casual” restaurants that also operates in countries like Canada, the UK, Germany, and France, actually refers to its new robot as a “cobotic prototype” because it’s designed to work in collaboration with kitchen staff in order to speed up the preparation of guacamole. Although mashing up and mixing other ingredients with avocados to make guacamole seems like a much easier task for a ‘cobot’ to handle, Autocado instead deals with the most time-consuming step of making guac: prepping the avocados.
Developed as part of a collaboration with a company called Vebu that "works with food industry leaders to co-create intelligent automation and technology solutions," the Autocado can be loaded with up to 25 lbs. of ripe avocados (it has no means to determine which avocados are ready for prep). The cobot slices each avocado in half, removes the pit and peel, and deposits the unwanted parts in a waste bin. The rest of the fruit is dropped into a giant stainless steel bowl that can be transferred directly to a counter and used to finish the final guacamole prep. The company showed off Chippy, a bot that helps make chips, last fall.
According to Chipotle, it takes kitchen staff at its restaurants about 50 minutes on average to turn avocados into a batch of guacamole, including the peeling and coring steps. With Autocado, the process could potentially take half that time, freeing up kitchen staff for other food prep tasks while still ensuring that customers are served freshly made guac. There doesn't seem to be a definitive timeline for when Autocado will be introduced at Chipotle locations across the country; the robot appears to still be in the prototype stage. But if it proves successful, here's hoping the technology can also be miniaturized for home use.
Gizmodo
https://static1.makeuseofimages.com/wordpress/wp-content/uploads/2023/07/boliviainteligente-dcvqmhruihy-unsplash-1-1.jpg
Providing GPT technology in a powerful and easy-to-use chatbot, ChatGPT has become the world's most popular AI tool. Many people use ChatGPT to provide engaging conversations, answer queries, offer creative suggestions, and aid in coding and writing. However, ChatGPT is limited: it cannot store your data for long-term personal use, and its knowledge cutoff is September 2021.
As a workaround, we can use OpenAI’s API and LangChain to provide ChatGPT with custom data and updated info past 2021 to create a custom ChatGPT instance.
Feeding ChatGPT custom data and providing updated information beyond its knowledge cutoff date offers several benefits over using ChatGPT as usual: for example, you can ask questions about your own documents and about products or events that appeared after the cutoff.
Now that you understand the importance of providing custom data to ChatGPT, here’s a step-by-step on how to do so on your local computer.
Please note the following instructions are for a Windows 10 or Windows 11 machine.
To provide custom data to ChatGPT, you’ll need to install and download the latest Python3, Git, Microsoft C++, and the ChatGPT-retrieval script from GitHub. If you already have some of the software installed on your PC, make sure they are updated with the latest version to avoid any hiccups during the process.
Start by installing:
When installing Python3, make sure that you tick the Add python.exe to PATH option before clicking Install Now. This is important as it allows you to access Python in any directory on your computer.
When installing Microsoft C++, you'll want to install the Microsoft Visual Studio Build Tools first. Once installed, you can tick the Desktop development with C++ option and click Install, with all the optional tools automatically ticked on the right sidebar.
Now that you have installed the latest versions of Python3, Git, and Microsoft C++, you can download the Python script to easily query custom local data.
Download: ChatGPT-retrieval script (Free)
To download the script, click on Code, then select Download ZIP. This should download the Python script into your default or selected directory.
Once downloaded, we can now set up a local environment.
To set up the environment, you’ll need to open a terminal in the chatgpt-retrieval-main folder you downloaded. To do that, open chatgpt-retrieval-main folder, right-click, and select Open in Terminal.
Once the terminal is open, copy and paste this command:
pip install langchain openai chromadb tiktoken unstructured
This command uses pip, Python's package manager, to install the packages the script depends on (LangChain, the OpenAI client, ChromaDB, tiktoken, and unstructured). Note that pip installs packages; it does not itself create a virtual environment. If you want one, see the optional sketch below.
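A minimal sketch of that optional virtual-environment setup on Windows, assuming you run it from the chatgpt-retrieval-main folder before installing the packages:

python -m venv venv
venv\Scripts\activate
pip install langchain openai chromadb tiktoken unstructured

If you skip this step, the packages are simply installed into your system-wide Python, which also works.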
After creating the virtual environment, we need to supply an OpenAI API key to access their services. We’ll first need to generate an API key from the OpenAI API keys site by clicking on Create new secret key, adding a name for the key, then hitting the Create secret key button.
You will be provided with a string of characters; this is your OpenAI API key. Copy it by clicking the copy icon beside the key. Note that this API key should be kept secret: do not share it with others unless you really intend for them to use it with you.
Once copied, return to the chatgpt-retrieval-main folder and open the constants file (typically constants.py) with Notepad. Replace the placeholder with your API key, and remember to save the file!
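If you are unsure what the edited file should end up looking like, it is usually just a short Python module that holds the key; the variable name below is a hypothetical example, so keep whatever name your copy of the script actually uses:

# constants.py -- hypothetical example; keep the variable name your script expects
APIKEY = "sk-...your-secret-key..."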
Now that you have set up your environment and added your OpenAI API key, you can provide your custom data to ChatGPT.
To add custom data, place all your custom text data in the data folder within chatgpt-retrieval-main. The format of the text data may be in the form of a PDF, TXT, or DOC.
As you can see from the screenshot above, I’ve added a text file containing a made-up personal schedule, an article I wrote on AMD’s Instinct Accelerators, and a PDF document.
The Python script allows us to query data from the custom data we’ve added to the data folder and the internet. In other words, you will have access to the usual ChatGPT backend and all the data stored locally in the data folder.
To use the script, run the python chatgpt.py script and then add your question or query as the argument.
python chatgpt.py "YOUR QUESTION"
Make sure to put your questions in quotation marks.
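For the curious, here is a rough sketch of the kind of LangChain pipeline a retrieval script like this typically implements. It is not the exact code from the chatgpt-retrieval repository; the DirectoryLoader, VectorstoreIndexCreator, and constants import below are assumptions based on the LangChain APIs available in mid-2023:

import os
import sys

from langchain.document_loaders import DirectoryLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chat_models import ChatOpenAI

import constants  # assumed to hold your OpenAI API key, as set up earlier

# Make the key visible to the OpenAI client that LangChain uses under the hood
os.environ["OPENAI_API_KEY"] = constants.APIKEY

# Load every document in the local data folder and build a vector index over it
loader = DirectoryLoader("data/")
index = VectorstoreIndexCreator().from_loaders([loader])

# Answer the command-line question using GPT-3.5 Turbo plus the indexed documents
question = sys.argv[1]
print(index.query(question, llm=ChatOpenAI(model_name="gpt-3.5-turbo")))

Conceptually, the script embeds your documents into a local Chroma vector store and lets the model pull relevant passages from it when answering your question.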
To test if we have successfully fed ChatGPT our data, I’ll ask a personal question regarding the Personal Sched.txt file.
It worked! This means ChatGPT was able to read the Personal Sched.txt provided earlier. Now let’s see if we have successfully fed ChatGPT with information it does not know due to its knowledge cutoff date.
As you can see, it correctly described the AMD Instinct MI250X, which was released after GPT-3.5's knowledge cutoff date.
Although feeding GPT-3.5 with custom data opens more ways to apply and use the LLM, there are a few drawbacks and limitations.
Firstly, you need to provide all the data yourself. You still have access to all the knowledge GPT-3.5 had up to its knowledge cutoff date; however, you must supply all the extra data. This means that if you want your local model to be knowledgeable about a subject GPT-3.5 doesn't already know, you'll have to gather the data yourself and save it as a text file in the data folder of chatgpt-retrieval-main.
Another issue is that querying ChatGPT like this takes more time to load when compared to asking ChatGPT directly.
Lastly, the only model currently available is GPT-3.5 Turbo. So even if you have access to GPT-4, you won’t be able to use it to power your custom ChatGPT instance.
Providing custom data to ChatGPT is a powerful way to get more out of the model. Through this method, you can feed the model with any text data you want and prompt it just like regular ChatGPT, albeit with some limitations. However, this will change in the future as it becomes easier to integrate our data with the LLM, along with access to the latest GPT-4 model.
MakeUseOf
https://static1.makeuseofimages.com/wordpress/wp-content/uploads/2023/07/datase-schema-diagram.jpg
SQL (Structured Query Language) is a popular query language used in software development. It’s widely used with relational databases to process and store data.
As a data analyst or developer, learning SQL is a vital skill. You can structure and process data using various programming languages. While SQL is easy to learn, mastering it in real-life situations is challenging.
Here are six sites to learn and hone your SQL skills by playing games.
Want to have fun while learning SQL? Welcome to SQL Mystery. This website teaches SQL by having you solve a murder mystery with SQL queries and commands. It offers beginner-friendly mysteries that introduce the core concepts, as well as more challenging games for advanced learners.
The games begin with a history and definition of SQL and its concepts. Once you have a good understanding of the query language, you can start playing the games. This makes it great for beginners with no knowledge of SQL.
To win the game, you have to help a fictitious police department to solve a crime. The crime occurred in SQL City, and the killer is at large. To solve it, you must retrieve a crime report from the police department database.
You have to search for clues in the database by running SQL commands in the interactive code editor. If you get the killer's name right, you solve the murder mystery and win the game.
The SQLPD website takes you on an intriguing mission while teaching SQL. It’s an interactive site suitable for both beginners and advanced SQL learners. It has a simple interface with lots of activities that allow you to have fun while learning SQL.
As a learner, you get a mission brief detailing how to carry out the mission and submit the results. The site has several tabs, each with details that help you carry out the mission. The first tab is the brief section with details about the mission.
The second tab has the code editor which allows you to select the commands and find names in a database. The third tab displays the results from the code editor.
The fourth tab is a guide that reminds you of the SQL concepts that you need for the game. The fifth tab allows you to create a profile to enable the site to track your progress.
This website is a good resource for SQL beginners as it requires little or no coding knowledge. Zi Chong, together with other contributors, created this interactive e-book.
The interface helps learners understand SQL by running queries against real-life datasets. The website is not just a reference page for learning SQL. Learners get to learn by practice. Zi Chong introduces SQL commands and explains when and how to use them to query data.
As a learner, you can access an interactive coding editor. Using the editor, you can practice by running the commands and seeing the output. You get to query real-life datasets which is an important way to get real-life experience in SQL.
Not only is the website free to use, but it’s also free of ads. You’re not required to register or download anything to use the site and start learning.
Sqlzoo is an absolute gem if you want to learn SQL by practicing. It has interactive tutorials that help you learn from basic to advanced concepts.
You can learn in stages with interactive tutorials and real-life data. It also comes with an interactive code editor. You can access a database and a quiz to test your understanding of the subject. You then practice by manipulating the provided data and seeing your results.
If you fail a quiz, they provide the right answer and the chance to try again. When you pass, they explain how they arrived at the answer.
You have free access to all website tools and resources. Once you feel confident in SQL, they provide resources to practice on projects. You also don’t have to register or download any tools.
SQLBOLT has a series of interactive lessons and courses to help you master SQL quickly. You get to learn SQL while interacting with a real-life database.
First, you get a brief history of SQL and relational databases and their correlation. This background helps you understand the language and its origins better. You are then introduced to various parts of the SQL query and shown how to alter the schema using commands.
You can create database tables from scratch and manipulate existing ones. Each lesson introduces a new concept and includes an interactive course at the end of each lesson. Their interactive code editor allows you to practice with various datasets.
You learn at your own pace and experiment with different databases. If you are not a beginner, you can skip the basics and move to the more advanced concepts.
The Schemaverse website lets you play a game implemented within a PostgreSQL database. You use SQL commands to command a fleet of spaceships.
You get to compete against other players, and the winner gets a trophy making it fun and engaging. You also have the option to use AI to command the fleet for you.
Schemaverse has a simple platform with several games for beginners and advanced learners. First, you must register and log in to play their games. On the site, you will have access to several games and other resources that guide you on how to get started.
The site has a tutorial page that details its background and how to use its resources. You will learn about the interface and how to navigate the planets and ships using SQL.
On Schemaverse, you learn SQL, PostgreSQL security details and have fun simultaneously. You can use any programming language that supports PostgreSQL to play the game.
There are 11 code samples that guide you on how to use code to play the games. Schemaverse is open source, so you can contribute to the game through its GitHub page.
SQL is a widely used technology in software development. Transactional and analytical applications use SQL databases to store and process data effectively.
The fastest-growing industries, like social media, finance, and data processing, use SQL databases. It’s, therefore, an essential skill to have in modern software development. You can begin with the games in this article and move on to advanced resources.
MakeUseOf
https://media.babylonbee.com/articles/64ad7d6a5af0064ad7d6a5af01.jpg
WATERTOWN, SD — As mainstream media reports circulated that maintaining physical fitness is a sign of far-right extremism, a spokesperson for the Federal Bureau of Investigation disclosed that the agency already has a surveillance team closely monitoring what is believed to be a local fascist training facility.
“This place just cranks out dangerous right-wing nutjobs,” said one source within the FBI who requested to remain anonymous. “It may be located in a shopping mall, and the people coming in and out look like normal, law-abiding citizens…but with how physically fit they all appear to be, there is every indication that this place is a breeding ground for domestic terrorists.”
One man surveilled entering the alleged training facility was later reached for comment. “What are you talking about? I’m just working out?” said a confused Scott Poppen. “I’ve been coming here to exercise for years. It’s no big deal. Why is there a black SUV that seems like it follows me to and from the gym? Who are those guys in suits and sunglasses that watch me when I’m in the locker room? What is this about?”
As Mr. Poppen finished speaking, a tactical team of FBI agents emerged from nearby bushes outside the building, subdued him with a taser, and hauled him away with zip ties on his wrists and ankles.
“We urge Americans to be on the alert,” said FBI Director Christopher Wray. “We were very disturbed to find there are thousands of these ‘fitness centers’ across the country. Americans can rest assured that we have reallocated all our resources to monitoring and guarding against this dire threat.”
At publishing time, surveillance at the location was ongoing, with FBI sources warning that many patrons of the alleged training facility had also attended 4th of July parties where fireworks were present.
Babylon Bee
The csv file format is one of the most used file formats to store tabular data. In this article, we will discuss different ways to read a csv file in PySpark.
To read a csv file into a pyspark dataframe, we can use the csv() method of the DataFrameReader, available as spark.read.csv(). The csv() method takes the filename of the csv file and returns a pyspark dataframe as shown below.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("sample_csv_file.csv")
print("The input csv file is:")
dfs.show()
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| _c0| _c1| _c2| _c3|
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
| Aditya| 45| 89| 1 |
| Chris| 86| 85| 2|
| Joel| null| 85| 3 |
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
In the above example, we have used the following CSV file.
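Based on the outputs shown throughout this article, sample_csv_file.csv contains roughly the following (the exact formatting of some of the Chemistry values evidently differs slightly, since that column is later inferred as double and some of its entries fail integer conversion):

Name,Maths,Physics,Chemistry
Aditya,45,89,1
Chris,86,85,2
Joel,,85,3
Katrina,49,47,4
Agatha,76,89,5
Sam,76,98,6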
In the output, you can observe that the column names are given as _c0, _c1, _c2, and _c3. However, the actual column names should be Name, Maths, Physics, and Chemistry. Hence, we need to find a way to read the csv with its column names.
To read a csv file with column names, you can use the header parameter in the csv() method. When we set the header parameter to True in the csv() method, the first row of the csv file is treated as the column names. You can observe this in the following example.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("sample_csv_file.csv",header=True)
print("The input csv file is:")
dfs.show()
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| 1 |
| Chris| 86| 85| 2|
| Joel| null| 85| 3 |
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
In this example, we have set the header parameter to True in the csv() method. Hence, the first line of the csv file is read as the column names.
By default, the csv() method reads all the values as strings. For example, if we print the data types using the dtypes attribute of the pyspark dataframe, you can observe that all the columns have the string data type.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("sample_csv_file.csv",header=True)
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| 1 |
| Chris| 86| 85| 2|
| Joel| null| 85| 3 |
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
The data type of columns is:
[('Name', 'string'), ('Maths', 'string'), ('Physics', 'string'), ('Chemistry', 'string')]
In the above output, you can observe that all the columns have string data types irrespective of the values in the columns.
To read a csv file with the correct data types for its columns, we can use the inferSchema parameter in the csv() method. When we set the inferSchema parameter to True, the program scans all the values in the dataframe and assigns the best data type to each column. You can observe this in the following example.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("sample_csv_file.csv",header=True,inferSchema=True)
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| 1.0|
| Chris| 86| 85| 2.0|
| Joel| null| 85| 3.0|
|Katrina| 49| 47| 4.0|
| Agatha| 76| 89| 5.0|
| Sam| 76| 98| 6.0|
+-------+-----+-------+---------+
The data type of columns is:
[('Name', 'string'), ('Maths', 'int'), ('Physics', 'int'), ('Chemistry', 'double')]
In this example, we have set the inferSchema parameter to True. Hence, the columns are given proper data types.
Using the inferSchema parameter to decide the data type of each column in a pyspark dataframe is a costly operation. When we set the inferSchema parameter to True, the program needs to scan all the values in the csv file; only after scanning all the values in a given column is the data type of that column decided. For large datasets this can be expensive, which is why setting the inferSchema parameter to True isn't recommended for them.
Instead of using the inferSchema parameter, we can read csv files with specified schemas.
A schema contains the column names, their data types, and a boolean value, nullable, that specifies whether a particular column can contain null values or not.
To define the schema for a pyspark dataframe, we use the StructType() function and the StructField() function.
The StructField() function is used to define the name and data type of a particular column. It takes the column name as its first input argument and the data type of the column as its second input argument. To specify the data type of the column, we use the StringType(), IntegerType(), FloatType(), DoubleType(), and other functions defined in the pyspark.sql.types module.
In the third input argument to the StructField() function, we pass True or False, specifying whether the column can contain null values or not. If we set the third parameter to True, the column will allow null values. Otherwise, it will not.
The StructType() function is used to create the schema for the pyspark dataframe. It takes a list of StructField objects as its input argument and returns a StructType object that we can use as a schema.
To read a csv file with a schema using pyspark, we will use the following steps.
1. Define the name and data type of each column using the StructField() function.
2. Pass the list of StructField objects to the StructType() function to create a schema.
3. Pass the StructType object to the schema parameter in the csv() function while reading the csv file.
By executing the above steps, we can read a csv file in pyspark with a given schema. You can observe this in the following example.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
list_of_cols=[StructField("Name",StringType(),True),
StructField("Maths",IntegerType(),True),
StructField("Physics",IntegerType(),True),
StructField("Chemistry",IntegerType(),True)]
schema=StructType(list_of_cols)
print("The schema is:")
print(schema)
spark.sparkContext.stop()
Output:
The schema is:
StructType([StructField('Name', StringType(), True), StructField('Maths', IntegerType(), True), StructField('Physics', IntegerType(), True), StructField('Chemistry', IntegerType(), True)])
In the above code, we have defined the schema for the csv file using the StructField() function and the StructType() function.
After defining the schema, you can pass it to the csv() method to read the csv file with a proper data type for each column, as shown in the following example.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
list_of_cols=[StructField("Name",StringType(),True),
StructField("Maths",IntegerType(),True),
StructField("Physics",IntegerType(),True),
StructField("Chemistry",IntegerType(),True)]
schema=StructType(list_of_cols)
dfs=spark.read.csv("sample_csv_file.csv",header=True,schema=schema)
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| null|
| Chris| 86| 85| 2|
| Joel| null| 85| null|
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
The data type of columns is:
[('Name', 'string'), ('Maths', 'int'), ('Physics', 'int'), ('Chemistry', 'int')]
In the above example, we have read a csv using schema. Observe that the values in a column that cannot be converted to the given data type in the schema are replaced with null values.
A csv file need not use the comma character as its delimiter. It might instead use characters like tabs, spaces, colons (:), semicolons (;), pipe characters (|), etc., as delimiters. For example, let us take the following file, demo_file.csv, which uses the pipe character as the delimiter.
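Based on the output below, demo_file.csv presumably contains:

Name|Roll|Language|Extra
Aditya|1|Python|11
Sam|2|Java|12
Chris|3|C++|13
Joel|4|TypeScript|14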
To read a csv file in pyspark with a given delimiter, you can use the sep parameter in the csv() method. The csv() method takes the delimiter as an input argument to the sep parameter and returns the pyspark dataframe as shown below.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("demo_file.csv",header=True,inferSchema=True, sep="|")
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+------+----+----------+-----+
| Name|Roll| Language|Extra|
+------+----+----------+-----+
|Aditya| 1| Python| 11|
| Sam| 2| Java| 12|
| Chris| 3| C++| 13|
| Joel| 4|TypeScript| 14|
+------+----+----------+-----+
The data type of columns is:
[('Name', 'string'), ('Roll', 'int'), ('Language', 'string'), ('Extra', 'int')]
In the above example, the csv file contains the | character as its delimiter. To read the file, we have passed the | character as input to the sep parameter in the csv() method.
To read multiple csv files into a pyspark dataframe at once, you can pass the list of filenames to the csv() method as its first input argument. After execution, the csv() method will return a pyspark dataframe with the data from all the files, as shown below.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv(["demo_file.csv","demo_file2.csv"],header=True,inferSchema=True, sep="|")
print("The input csv files are:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv files are:
+------+----+----------+-----+
| Name|Roll| Language|Extra|
+------+----+----------+-----+
|Aditya| 1| Python| 11|
| Sam| 2| Java| 12|
| Chris| 3| C++| 13|
| Joel| 4|TypeScript| 14|
|George| 12| C#| 15|
| Sean| 13| SQL| 16|
| Joe| 14| PHP| 17|
| Sam| 15|JavaScript| 18|
+------+----+----------+-----+
The data type of columns is:
[('Name', 'string'), ('Roll', 'int'), ('Language', 'string'), ('Extra', 'int')]
In the above example, we have used the following files.
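demo_file.csv is the pipe-delimited file shown earlier. Based on the output above and the header warning in the next example, demo_file2.csv presumably contains:

Name|Roll|Language|Extra
George|12|C#|15
Sean|13|SQL|16
Joe|14|PHP|17
Sam|15|JavaScript|18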
In the output, you can observe that the rows of the files are stacked one below the other, in the order in which the files are passed to the csv() function.
If the files that we pass to the csv() method have the same number of columns but different column names, the output dataframe will contain the column names of the first csv file. The data in the columns is stacked by position to create the output dataframe. You can observe this in the following example.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv(["demo_file.csv","demo_file2.csv"],header=True,inferSchema=True, sep="|")
print("The input csv files are:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv files are:
23/07/09 04:54:17 WARN CSVHeaderChecker: CSV header does not conform to the schema.
Header: Name, Roll, Language, Extra
Schema: Name, Roll, Language, Ratings
Expected: Ratings but found: Extra
CSV file: file:///home/aditya1117/codes/demo_file2.csv
+------+----+----------+-------+
| Name|Roll| Language|Ratings|
+------+----+----------+-------+
|Aditya| 1| Python| 11|
| Sam| 2| Java| 12|
| Chris| 3| C++| 13|
| Joel| 4|TypeScript| 14|
|George| 12| C#| 15|
| Sean| 13| SQL| 16|
| Joe| 14| PHP| 17|
| Sam| 15|JavaScript| 18|
+------+----+----------+-------+
The data type of columns is:
[('Name', 'string'), ('Roll', 'string'), ('Language', 'string'), ('Ratings', 'string')]
In the above example, the first csv file has the column names Name, Roll, Language, and Ratings. The second csv file has Extra as the last column instead of Ratings.
In the output, you can observe that the column names of the first csv file are selected as the schema. Hence, the csv() function prints a warning when it encounters a different column name.
If the input files contain different numbers of columns, the column names in the schema of the output dataframe are selected from the csv file with more columns. Here, the rows from the csv file with fewer columns are filled with null values in the extra columns.
To understand this, let us add an extra column to demo_file.csv. The updated file is as follows.
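Based on the output below, the updated demo_file.csv presumably contains the following (note that its fourth column now appears as Ratings rather than Extra):

Name|Roll|Language|Ratings|Grade
Aditya|1|Python|11|A
Sam|2|Java|12|A
Chris|3|C++|13|A+
Joel|4|TypeScript|14|A+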
Now, let us read both files into a pyspark dataframe using the csv() function.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv(["demo_file2.csv","demo_file.csv"],header=True,inferSchema=True, sep="|")
print("The input csv files are:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv files are:
23/07/09 04:57:08 WARN CSVHeaderChecker: Number of column in CSV header is not equal to number of fields in the schema:
Header length: 4, schema size: 5
CSV file: file:///home/aditya1117/codes/demo_file2.csv
+------+----+----------+-------+-----+
| Name|Roll| Language|Ratings|Grade|
+------+----+----------+-------+-----+
|Aditya| 1| Python| 11| A|
| Sam| 2| Java| 12| A|
| Chris| 3| C++| 13| A+|
| Joel| 4|TypeScript| 14| A+|
|George| 12| C#| 15| null|
| Sean| 13| SQL| 16| null|
| Joe| 14| PHP| 17| null|
| Sam| 15|JavaScript| 18| null|
+------+----+----------+-------+-----+
The data type of columns is:
[('Name', 'string'), ('Roll', 'string'), ('Language', 'string'), ('Ratings', 'string'), ('Grade', 'string')]
In the above code, demo_file.csv contains 5 columns while demo_file2.csv contains 4. Hence, the column names given in demo_file.csv are selected for the schema, despite the fact that we passed it as the second file to the csv() function. You can also observe that the output pyspark dataframe contains the data from demo_file.csv at the top of the dataframe, as the schema is selected from this file.
In this article, we have discussed different ways to read a CSV file in Pyspark. To learn more about pyspark, you can read this article on pyspark vs pandas. You might also like this article on how to create an empty pyspark dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!
The post PySpark Read CSV File With Examples appeared first on PythonForBeginners.com.
Planet Python