https://media.notthebee.com/articles/64aef6495302464aef64953025.jpg
I know some Christians who might be offended by ancient Israel’s methods of tearing down pagan temples and desecrating so-called holy places.
Not the Bee
https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/d3d9600fbaec0ff0fc2d4efff433034f.gif
In 1864, David Goodell revolutionized kitchen food prep with the invention of a hand-cranked device that could peel an apple in a matter of seconds. Fifty-nine years later, Chipotle is introducing the next innovation in automated fruit prep with Autocado: a robot that can halve, peel, and core a mountain of avocados with minimal human assistance.
Chipotle, the US-based chain of “fast-casual” restaurants that also operates in countries like Canada, the UK, Germany, and France, actually refers to its new robot as a “cobotic prototype” because it’s designed to work in collaboration with kitchen staff in order to speed up the preparation of guacamole. Although mashing up and mixing other ingredients with avocados to make guacamole seems like a much easier task for a ‘cobot’ to handle, Autocado instead deals with the most time-consuming step of making guac: prepping the avocados.
Developed in collaboration with a company called Vebu that "works with food industry leaders to co-create intelligent automation and technology solutions," the Autocado can be loaded with up to 25 lbs. of ripe avocados (it has no means to determine which avocados are ready for prep), which the cobot slices in half before removing the pit and peel and depositing the unwanted parts in a waste bin. The rest of the fruit is dropped into a giant stainless steel bowl that can be transferred directly to a counter and used to finish the final guacamole prep. The company showed off a similar bot, Chippy, which helps make chips, last fall.
According to Chipotle, it takes kitchen staff at its restaurants about 50 minutes on average to turn avocados into a batch of guacamole, including the peeling and coring steps. With Autocado, the process could potentially take half that time, freeing up kitchen staff for other food prep tasks while still ensuring that customers are served freshly made guac. There doesn't seem to be a definitive timeline for when Autocado will be introduced at Chipotle locations across the country, as the robot appears to still be in the prototype stage. But if it proves successful, here's hoping the technology can also be miniaturized for home use.
Gizmodo
https://static1.makeuseofimages.com/wordpress/wp-content/uploads/2023/07/boliviainteligente-dcvqmhruihy-unsplash-1-1.jpg
Providing GPT technology in a powerful and easy-to-use chatbot, ChatGPT has become the world's most popular AI tool. Many people use ChatGPT for engaging conversations, answering queries, creative suggestions, and help with coding and writing. However, ChatGPT has two key limitations: it cannot store your data for long-term personal use, and its knowledge cuts off in September 2021.
As a workaround, we can use OpenAI’s API and LangChain to provide ChatGPT with custom data and updated info past 2021 to create a custom ChatGPT instance.
Feeding ChatGPT custom data and information beyond its knowledge cutoff date provides several benefits over using ChatGPT as usual: the model can answer questions about your own documents, and it can discuss products and events from after September 2021.
Now that you understand the importance of providing custom data to ChatGPT, here's a step-by-step guide on how to do so on your local computer.
Please note the following instructions are for a Windows 10 or Windows 11 machine.
To provide custom data to ChatGPT, you’ll need to install and download the latest Python3, Git, Microsoft C++, and the ChatGPT-retrieval script from GitHub. If you already have some of the software installed on your PC, make sure they are updated with the latest version to avoid any hiccups during the process.
Start by installing the latest versions of Python3, Git, and the Microsoft C++ Build Tools from their official download pages.
When installing Python3, make sure that you tick the Add python.exe to PATH option before clicking Install Now. This is important as it allows you to access Python in any directory on your computer.
When installing Microsoft C++, you'll want to install the Microsoft Visual Studio Build Tools first. Once installed, tick the Desktop development with C++ option and click Install, with all the optional tools automatically ticked in the right sidebar.
Now that you have installed the latest versions of Python3, Git, and Microsoft C++, you can download the Python script to easily query custom local data.
Download: ChatGPT-retrieval script (Free)
To download the script, click on Code, then select Download ZIP. This should download the Python script into your default or selected directory.
Once downloaded, we can now set up a local environment.
To set up the environment, you'll need to open a terminal in the chatgpt-retrieval-main folder you downloaded. To do that, open the chatgpt-retrieval-main folder, right-click inside it, and select Open in Terminal.
Once the terminal is open, copy and paste this command:
pip install langchain openai chromadb tiktoken unstructured
This command uses pip, Python's package manager, to install the libraries the script depends on: LangChain, the OpenAI client, ChromaDB, tiktoken, and unstructured.
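If you prefer to keep these packages isolated from your system-wide Python installation, you can optionally create and activate a virtual environment before installing. A minimal sketch for Windows (the folder name venv is just an example):

python -m venv venv
venv\Scripts\activate
pip install langchain openai chromadb tiktoken unstructured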
After creating the virtual environment, we need to supply an OpenAI API key to access their services. We’ll first need to generate an API key from the OpenAI API keys site by clicking on Create new secret key, adding a name for the key, then hitting the Create secret key button.
You will be provided with a string of characters. This is your OpenAI API key. Copy it by clicking the copy icon next to the API key. Keep in mind that this API key should be kept secret; do not share it with others unless you really intend for them to use it with you.
Once copied, return to the chatgpt-retrieval-main folder and open the constants.py file with Notepad. Now replace the placeholder with your API key. Remember to save the file!
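For reference, the constants file is just a tiny Python module that holds the key. A minimal sketch of what it might look like after editing (the variable name may differ slightly in your copy of the script, and the key below is only a placeholder):

# constants.py -- holds the OpenAI API key used by the retrieval script
APIKEY = "sk-your-secret-key-goes-here"  # paste the key you copied here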
Now that you have set up your environment and added your OpenAI API key to the script, you can provide your custom data to ChatGPT.
To add custom data, place all your custom text data in the data folder within chatgpt-retrieval-main. The format of the text data may be in the form of a PDF, TXT, or DOC.
As you can see from the screenshot above, I’ve added a text file containing a made-up personal schedule, an article I wrote on AMD’s Instinct Accelerators, and a PDF document.
The Python script allows us to query both the custom data we've added to the data folder and the model's built-in knowledge. In other words, you will have access to the usual ChatGPT backend plus all the data stored locally in the data folder.
To use the script, run chatgpt.py with Python and pass your question or query as the argument.
python chatgpt.py "YOUR QUESTION"
Make sure to put your questions in quotation marks.
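Under the hood, the retrieval script ties these pieces together with LangChain. The snippet below is a rough sketch of the kind of pipeline such a script builds, not its exact source; the module paths and class names assume an older LangChain release (pre-0.1), which is what this approach targets:

import os
from langchain.document_loaders import DirectoryLoader     # loads every supported file in data/
from langchain.indexes import VectorstoreIndexCreator      # builds a Chroma vector index
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "sk-your-secret-key-goes-here"  # normally read from constants.py

# Index the documents in the data folder.
loader = DirectoryLoader("data/")
index = VectorstoreIndexCreator().from_loaders([loader])

# Answer a question using the indexed documents plus GPT-3.5 Turbo.
answer = index.query("What is on my schedule today?", llm=ChatOpenAI(model="gpt-3.5-turbo"))
print(answer)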
To test if we have successfully fed ChatGPT our data, I’ll ask a personal question regarding the Personal Sched.txt file.
It worked! This means ChatGPT was able to read the Personal Sched.txt provided earlier. Now let’s see if we have successfully fed ChatGPT with information it does not know due to its knowledge cutoff date.
As you can see, it correctly described the AMD Instinct MI250X, which was released after GPT-3.5's knowledge cutoff date.
Although feeding GPT-3.5 with custom data opens more ways to apply and use the LLM, there are a few drawbacks and limitations.
Firstly, you need to provide all the data yourself. You can still access all the knowledge of GPT-3.5 up to its knowledge cutoff date; however, you must supply all the extra data. This means that if you want your local model to be knowledgeable about a subject GPT-3.5 doesn't already know, you'll have to go to the internet, scrape the data yourself, and save it as a text file in the data folder of chatgpt-retrieval-main.
Another issue is that querying ChatGPT like this takes more time to load when compared to asking ChatGPT directly.
Lastly, the only model currently available is GPT-3.5 Turbo. So even if you have access to GPT-4, you won’t be able to use it to power your custom ChatGPT instance.
Providing custom data to ChatGPT is a powerful way to get more out of the model. Through this method, you can feed the model with any text data you want and prompt it just like regular ChatGPT, albeit with some limitations. However, this will change in the future as it becomes easier to integrate our data with the LLM, along with access to the latest GPT-4 model.
MakeUseOf
https://static1.makeuseofimages.com/wordpress/wp-content/uploads/2023/07/datase-schema-diagram.jpg
SQL (Structured Query Language) is a popular query language used in software development. It’s widely used with relational databases to process and store data.
As a data analyst or developer, learning SQL is a vital skill. You can structure and process data using various programming languages. While SQL is easy to learn, mastering it in real-life situations is challenging.
Here are six sites to learn and hone your SQL skills by playing games.
Want to have fun while learning SQL? Welcome to the SQL Murder Mystery. This website makes it fun to learn SQL while solving a murder mystery using SQL queries and commands. It has mysteries for beginners that introduce concepts in a fun way, as well as more challenging games for advanced learners to solve.
The games begin with a history and definition of SQL and its concepts. After a good understanding of the query language, you can start playing the games. This is great for beginners with no knowledge of SQL.
To win the game, you have to help a fictitious police department to solve a crime. The crime occurred in SQL City, and the killer is at large. To solve it, you must retrieve a crime report from the police department database.
You have to search for clues in the database by running SQL commands in the interactive code editor. If you get the killer's name right, you solve the murder mystery and win the game.
The SQLPD website takes you on an intriguing mission while teaching SQL. It’s an interactive site suitable for both beginners and advanced SQL learners. It has a simple interface with lots of activities that allow you to have fun while learning SQL.
As a learner, you get a mission brief detailing how to carry out the mission and submit the results. The site has several tabs, each with details that help you carry out the mission. The first tab is the brief section with details about the mission.
The second tab has the code editor which allows you to select the commands and find names in a database. The third tab displays the results from the code editor.
The fourth tab is a guide that reminds you of the SQL concepts that you need for the game. The fifth tab allows you to create a profile to enable the site to track your progress.
This website is a good resource for SQL beginners as it requires little or no coding knowledge. Zi Chong, together with other contributors, created this interactive e-book.
The interface helps learners understand SQL by running queries against real-life datasets. The website is not just a reference page for learning SQL. Learners get to learn by practice. Zi Chong introduces SQL commands and explains when and how to use them to query data.
As a learner, you can access an interactive coding editor. Using the editor, you can practice by running the commands and seeing the output. You get to query real-life datasets which is an important way to get real-life experience in SQL.
Not only is the website free to use, but it’s also free of ads. You’re not required to register or download anything to use the site and start learning.
SQLZoo is an absolute gem if you want to learn SQL by practicing. It has interactive tutorials that take you from basic to advanced concepts.
You can learn in stages with interactive tutorials and real-life data. It also comes with an interactive code editor. You can access a database and a quiz to test your understanding of the subject. You then practice by manipulating the provided data and seeing your results.
If you fail a quiz, they provide the right answer and the chance to try again. When you pass, they explain how they arrived at the answer.
You have free access to all website tools and resources. Once you feel confident in SQL, they provide resources to practice on projects. You also don’t have to register or download any tools.
SQLBolt has a series of interactive lessons and courses to help you master SQL quickly. You get to learn SQL while interacting with a real-life database.
First, you get a brief history of SQL and relational databases and their correlation. This background helps you understand the language and its origins better. You are then introduced to various parts of the SQL query and shown how to alter the schema using commands.
You can create database tables from scratch and manipulate existing ones. Each lesson introduces a new concept and includes an interactive course at the end of each lesson. Their interactive code editor allows you to practice with various datasets.
You learn at your own pace and experiment with different databases. If you are not a beginner, you can skip the basics and move to the more advanced concepts.
The Schemaverse website lets you play a game implemented within a PostgreSQL database. You use SQL commands to command a fleet of spaceships.
You get to compete against other players, and the winner gets a trophy making it fun and engaging. You also have the option to use AI to command the fleet for you.
Schemaverse has a simple platform with several games for beginners and advanced learners. First, you must register and log in to play their games. On the site, you will have access to several games and other resources that guide you on how to get started.
The site has a tutorial page that details its background and how to use its resources. You will learn about the interface and how to navigate the planets and ships using SQL.
On Schemaverse, you learn SQL, PostgreSQL security details and have fun simultaneously. You can use any programming language that supports PostgreSQL to play the game.
It has 11 code samples that guide you on how to use code to play the games. Schemaverse is open-source, so you can contribute to the game through its GitHub page.
SQL is a widely used technology in software development. Transactional and analytical applications use SQL databases to store and process data effectively.
The fastest-growing industries, like social media, finance, and data processing, use SQL databases. It’s, therefore, an essential skill to have in modern software development. You can begin with the games in this article and move on to advanced resources.
MakeUseOf
https://media.babylonbee.com/articles/64ad7d6a5af0064ad7d6a5af01.jpg
WATERTOWN, SD — As mainstream media reports circulated that maintaining physical fitness is a sign of far-right extremism, a spokesperson for the Federal Bureau of Investigation disclosed that the agency already has a surveillance team closely monitoring what is believed to be a local fascist training facility.
“This place just cranks out dangerous right-wing nutjobs,” said one source within the FBI who requested to remain anonymous. “It may be located in a shopping mall, and the people coming in and out look like normal, law-abiding citizens…but with how physically fit they all appear to be, there is every indication that this place is a breeding ground for domestic terrorists.”
One man surveilled entering the alleged training facility was later reached for comment. “What are you talking about? I’m just working out?” said a confused Scott Poppen. “I’ve been coming here to exercise for years. It’s no big deal. Why is there a black SUV that seems like it follows me to and from the gym? Who are those guys in suits and sunglasses that watch me when I’m in the locker room? What is this about?”
As Mr. Poppen finished speaking, a tactical team of FBI agents emerged from nearby bushes outside the building, subdued him with a taser, and hauled him away with zip ties on his wrists and ankles.
“We urge Americans to be on the alert,” said FBI Director Christopher Wray. “We were very disturbed to find there are thousands of these ‘fitness centers’ across the country. Americans can rest assured that we have reallocated all our resources to monitoring and guarding against this dire threat.”
At publishing time, surveillance at the location was ongoing, with FBI sources warning that many patrons of the alleged training facility had also attended 4th of July parties where fireworks were present.
Babylon Bee
The csv file format is one of the most used file formats to store tabular data. In this article, we will discuss different ways to read a csv file in PySpark.
To read a csv file into a pyspark dataframe, we can use the csv() method of the DataFrameReader returned by spark.read. The csv() method takes the filename of the csv file and returns a pyspark dataframe, as shown below.
import pyspark.sql as ps
# Create a local SparkSession -- the entry point for reading data.
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
# Read the csv file into a pyspark dataframe and display it.
dfs=spark.read.csv("sample_csv_file.csv")
print("The input csv file is:")
dfs.show()
# Stop the underlying SparkContext once we are done.
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| _c0| _c1| _c2| _c3|
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
| Aditya| 45| 89| 1 |
| Chris| 86| 85| 2|
| Joel| null| 85| 3 |
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
In the above example, we have used the following CSV file.
In the output, you can observe that the column names are given as _c0, _c1, _c2, and so on. However, the actual column names should be Name, Maths, Physics, and Chemistry. Hence, we need to find a way to read the csv with its column names.
To read a csv file with column names, you can use the header parameter in the csv() method. When we set the header parameter to True in the csv() method, the first row of the csv file is treated as the column names. You can observe this in the following example.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("sample_csv_file.csv",header=True)
print("The input csv file is:")
dfs.show()
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| 1 |
| Chris| 86| 85| 2|
| Joel| null| 85| 3 |
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
In this example, we have set the header parameter to True in the csv() method. Hence, the first line of the csv file is read as column names.
By default, the csv() method reads all the values as strings. For example, if we print the data types using the dtypes attribute of the pyspark dataframe, you can observe that all the columns have string data types.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("sample_csv_file.csv",header=True)
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| 1 |
| Chris| 86| 85| 2|
| Joel| null| 85| 3 |
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
The data type of columns is:
[('Name', 'string'), ('Maths', 'string'), ('Physics', 'string'), ('Chemistry', 'string')]
In the above output, you can observe that all the columns have string data types irrespective of the values in the columns.
To read a csv file with the correct data types for columns, we can use the inferSchema parameter in the csv() method. When we set the inferSchema parameter to True, the program scans all the values in the dataframe and assigns the best data type to each column. You can observe this in the following example.
import pyspark.sql as ps
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("sample_csv_file.csv",header=True,inferSchema=True)
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| 1.0|
| Chris| 86| 85| 2.0|
| Joel| null| 85| 3.0|
|Katrina| 49| 47| 4.0|
| Agatha| 76| 89| 5.0|
| Sam| 76| 98| 6.0|
+-------+-----+-------+---------+
The data type of columns is:
[('Name', 'string'), ('Maths', 'int'), ('Physics', 'int'), ('Chemistry', 'double')]
In this example, we have set the inferSchema parameter to True. Hence, the columns are given proper data types.
Using the inferSchema parameter to decide the data types of the columns in a pyspark dataframe is a costly operation. When we set the inferSchema parameter to True, the program needs to scan all the values in the csv file; only after scanning all the values in a given column is the data type for that column decided. For large datasets this is expensive, which is why setting the inferSchema parameter to True isn't recommended for them.
Instead of using the inferSchema parameter, we can read csv files with a specified schema.
A schema contains the column names, their data types, and a boolean value, nullable, which specifies whether a particular column can contain null values.
To define the schema for a pyspark dataframe, we use the StructType() function and the StructField() function.
The StructField() function is used to define the name and data type of a particular column. It takes the column name as its first input argument and the data type of the column as its second input argument. To specify the data type of a column, we use StringType(), IntegerType(), FloatType(), DoubleType(), and the other types defined in the pyspark.sql.types module.
In the third input argument to the StructField() function, we pass True or False to specify whether the column can contain null values. If we set the third parameter to True, the column will allow null values. Otherwise, it will not.
The StructType() function is used to create the schema for the pyspark dataframe. It takes a list of StructField objects as its input argument and returns a StructType object that we can use as a schema.
To read a csv file with a schema using pyspark, we will use the following steps. First, define the name and data type of each column using the StructField() function. Next, pass the list of StructField objects to the StructType() function to create a schema. Finally, pass the StructType object to the schema parameter in the csv() function while reading the csv file.
By executing the above steps, we can read a csv file in pyspark with a given schema. You can observe this in the following example.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
list_of_cols=[StructField("Name",StringType(),True),
StructField("Maths",IntegerType(),True),
StructField("Physics",IntegerType(),True),
StructField("Chemistry",IntegerType(),True)]
schema=StructType(list_of_cols)
print("The schema is:")
print(schema)
spark.sparkContext.stop()
Output:
The schema is:
StructType([StructField('Name', StringType(), True), StructField('Maths', IntegerType(), True), StructField('Physics', IntegerType(), True), StructField('Chemistry', IntegerType(), True)])
In the above code, we have defined the schema for the csv file using the StructField() function and the StructType() function.
After defining the schema, you can pass it to the csv() method to read the csv file with the proper data type for each column, as shown in the following example.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
list_of_cols=[StructField("Name",StringType(),True),
StructField("Maths",IntegerType(),True),
StructField("Physics",IntegerType(),True),
StructField("Chemistry",IntegerType(),True)]
schema=StructType(list_of_cols)
dfs=spark.read.csv("sample_csv_file.csv",header=True,schema=schema)
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+-------+-----+-------+---------+
| Name|Maths|Physics|Chemistry|
+-------+-----+-------+---------+
| Aditya| 45| 89| null|
| Chris| 86| 85| 2|
| Joel| null| 85| null|
|Katrina| 49| 47| 4|
| Agatha| 76| 89| 5|
| Sam| 76| 98| 6|
+-------+-----+-------+---------+
The data type of columns is:
[('Name', 'string'), ('Maths', 'int'), ('Physics', 'int'), ('Chemistry', 'int')]
In the above example, we have read a csv using schema. Observe that the values in a column that cannot be converted to the given data type in the schema are replaced with null values.
A csv file need not use the comma character as its delimiter. It might instead use characters like tabs, spaces, colons (:), semicolons (;), pipe characters (|), etc. as delimiters. For example, let us take the following file that uses the pipe character as the delimiter.
To read a csv file in pyspark with a given delimiter, you can use the sep parameter in the csv() method. The csv() method takes the delimiter as an input argument to the sep parameter and returns the pyspark dataframe as shown below.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv("demo_file.csv",header=True,inferSchema=True, sep="|")
print("The input csv file is:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv file is:
+------+----+----------+-----+
| Name|Roll| Language|Extra|
+------+----+----------+-----+
|Aditya| 1| Python| 11|
| Sam| 2| Java| 12|
| Chris| 3| C++| 13|
| Joel| 4|TypeScript| 14|
+------+----+----------+-----+
The data type of columns is:
[('Name', 'string'), ('Roll', 'int'), ('Language', 'string'), ('Extra', 'int')]
In the above example, the csv file contains the | character as its delimiter. To read the file, we have passed the | character as input to the sep parameter in the csv() method.
To read multiple csv files into a pyspark dataframe at once, you can pass the list of filenames to the csv() method as its first input argument. After execution, the csv() method will return a pyspark dataframe containing the data from all the files, as shown below.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv(["demo_file.csv","demo_file2.csv"],header=True,inferSchema=True, sep="|")
print("The input csv files are:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv files are:
+------+----+----------+-----+
| Name|Roll| Language|Extra|
+------+----+----------+-----+
|Aditya| 1| Python| 11|
| Sam| 2| Java| 12|
| Chris| 3| C++| 13|
| Joel| 4|TypeScript| 14|
|George| 12| C#| 15|
| Sean| 13| SQL| 16|
| Joe| 14| PHP| 17|
| Sam| 15|JavaScript| 18|
+------+----+----------+-----+
The data type of columns is:
[('Name', 'string'), ('Roll', 'int'), ('Language', 'string'), ('Extra', 'int')]
In the above example, we have used the following files.
In the output, you can observe that the contents of the files are stacked vertically, in the order in which they are passed to the csv() function.
If the files that we pass to the csv() method have the same number of columns but different column names, the output dataframe will contain the column names of the first csv file. The data in the columns are stacked by their positions to create the output dataframe. You can observe this in the following example.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv(["demo_file.csv","demo_file2.csv"],header=True,inferSchema=True, sep="|")
print("The input csv files are:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv files are:
23/07/09 04:54:17 WARN CSVHeaderChecker: CSV header does not conform to the schema.
Header: Name, Roll, Language, Extra
Schema: Name, Roll, Language, Ratings
Expected: Ratings but found: Extra
CSV file: file:///home/aditya1117/codes/demo_file2.csv
+------+----+----------+-------+
| Name|Roll| Language|Ratings|
+------+----+----------+-------+
|Aditya| 1| Python| 11|
| Sam| 2| Java| 12|
| Chris| 3| C++| 13|
| Joel| 4|TypeScript| 14|
|George| 12| C#| 15|
| Sean| 13| SQL| 16|
| Joe| 14| PHP| 17|
| Sam| 15|JavaScript| 18|
+------+----+----------+-------+
The data type of columns is:
[('Name', 'string'), ('Roll', 'string'), ('Language', 'string'), ('Ratings', 'string')]
In the above example, the first csv file has the column names Name, Roll, Language, and Ratings. The second csv file has Extra as its last column instead of Ratings.
In the output, you can observe that the column names of the first csv file are selected as the schema. Hence, the csv() function prints a warning when it encounters a different column name.
If the input files contain different numbers of columns, the column names in the schema of the output dataframe are selected from the csv file with more columns. Here, the rows from the csv file with fewer columns are filled with null values in the extra columns.
To understand this, let us add an extra column to demo_file.csv. The updated file is as follows.
Now, let us read both files into a pyspark dataframe using the csv() function.
import pyspark.sql as ps
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
spark = ps.SparkSession.builder \
.master("local[*]") \
.appName("readcsv_example") \
.getOrCreate()
dfs=spark.read.csv(["demo_file2.csv","demo_file.csv"],header=True,inferSchema=True, sep="|")
print("The input csv files are:")
dfs.show()
print("The data type of columns is:")
print(dfs.dtypes)
spark.sparkContext.stop()
Output:
The input csv files are:
23/07/09 04:57:08 WARN CSVHeaderChecker: Number of column in CSV header is not equal to number of fields in the schema:
Header length: 4, schema size: 5
CSV file: file:///home/aditya1117/codes/demo_file2.csv
+------+----+----------+-------+-----+
| Name|Roll| Language|Ratings|Grade|
+------+----+----------+-------+-----+
|Aditya| 1| Python| 11| A|
| Sam| 2| Java| 12| A|
| Chris| 3| C++| 13| A+|
| Joel| 4|TypeScript| 14| A+|
|George| 12| C#| 15| null|
| Sean| 13| SQL| 16| null|
| Joe| 14| PHP| 17| null|
| Sam| 15|JavaScript| 18| null|
+------+----+----------+-------+-----+
The data type of columns is:
[('Name', 'string'), ('Roll', 'string'), ('Language', 'string'), ('Ratings', 'string'), ('Grade', 'string')]
In the above code, demo_file.csv contains five columns after adding the extra column, while demo_file2.csv contains only four. Hence, the column names given in demo_file.csv are selected for the schema, despite the fact that we passed it as the second file to the csv() function. You can also observe that the output pyspark dataframe contains the data from demo_file.csv at the top of the dataframe, as the schema is selected from this file.
In this article, we have discussed different ways to read a CSV file in Pyspark. To learn more about pyspark, you can read this article on pyspark vs pandas. You might also like this article on how to create an empty pyspark dataframe.
I hope you enjoyed reading this article. Stay tuned for more informative articles.
Happy Learning!
The post PySpark Read CSV File With Examples appeared first on PythonForBeginners.com.
Planet Python
https://media.notthebee.com/articles/64a9c7048b28764a9c7048b288.jpg
"…a kind of architecture that really hates people, that is designed to oppress the human spirit, and make people feel without value…" ????
Not the Bee
https://www.pewpewtactical.com/wp-content/uploads/2023/06/Eugene-Stoner-AR-10.png
Everybody is familiar with the iconic AR-15, but just where did it come from?
To learn the history of the AR-15, you have to first look at the genius behind it…Eugene Stoner.
So, follow along as we talk about Stoner, his life, and what led him to create one of the most notable rifles in history.
Born in 1922, Stoner graduated from high school right in time for the beginning of World War II. Immediately after graduation, he landed a job at Vega Aircraft Company, installing ordnance. It was here that he would first learn about manufacturing arms.
But then Pearl Harbor happened, leading Stoner to join the Marines soon after.
His background in ordnance resulted in him being shipped to the Pacific Theater, where he was involved in aviation ordnance.
After the war, Stoner hopped around from a few different engineering jobs until he landed a position with a small division of the Fairchild Engine and Airplane Corporation known as Armalite.
Stoner’s first major accomplishment at Armalite was developing a new survival weapon for U.S. Air Force pilots.
This weapon was designed to easily stow away under an airplane’s seat, and in the event of a crash, a pilot would have a rifle at the ready to harvest small game and serve as an acceptable form of self-defense as well.
The result was known as the Armalite Rifle 5 – the AR-5. Though the modern semi-auto version is known as the AR-7, this weapon can still be found in gun cabinets across America.
Eugene Stoner had already left his mark but was far from fading into the shadows. He was just getting started.
Stoner continued his work at Armalite, but it wasn’t long until another opportunity appeared for him to change the course of history…the Vietnam War.
In 1955, the U.S. Army put out a notice that it was looking for a new battle rifle. A year later, the Army further specified that it wanted the new weapon to fire the 7.62 NATO cartridge.
Tinkering in his garage, Stoner emerged with a prototype for a new rifle not long afterward called the AR-10.
The AR-10 was the first rifle of its kind, as never before had a rifle utilized the materials Stoner had incorporated.
Guns had always been made of wood and steel, but Stoner drew from his extensive history in the aircraft industry, using lightweight aluminum alloys and fiberglass instead.
This made his AR-10 a lighter weapon that could better resist weather.
Unfortunately, Stoner was late to the race, and the M14 was chosen as the Army’s battle rifle of choice instead.
The designs for the AR-10 were sold to the Dutch instead. Stoner returned to his day job, focusing on the regular rut of daily life.
But then the Army called again…
As it turned out, the M14 was too heavy with too much recoil and difficult to control while under full auto.
In addition, the 7.62 NATO was overkill within the jungles of Vietnam. Often the enemy couldn’t be seen beyond 50 yards, meaning that a lighter weapon could still accomplish the job and let soldiers carry more ammunition while on patrol.
Adding further urgency to the need was the Soviet development of the AK-47.
Amid the Cold War, the idea that the communists might have a better battle rifle than American soldiers was concerning.
So, the Army needed a new battle rifle.
Returning to his AR-10 plans, Stoner set to scaling things down. The AR-10 was modified to use the .223 Remington, with the new rifle designated the Armalite Rifle–15 or AR-15.
However, Armalite didn’t have the resources to produce weaponry on a mass scale, so they sold the designs to Colt.
Colt presented the design to the Army, but Army officials dismissed the design. It seemed they preferred the traditional look and feel of wood and steel over the AR-15’s aluminum and plastic.
But the story doesn’t end there…
At an Independence Day cookout in 1960, a Colt contract salesman showed Air Force General Curtis LeMay an AR-15. Immediately, LeMay set up a series of watermelons to test the rifle.
LeMay ended up so impressed with the new gun that the very next year – after his promotion to Chief of Staff – he requested 80,000 AR-15s to replace the Air Force’s antiquated M2 rifles.
His request was denied, and the Army kept supplying American soldiers overseas with the M14.
In 1963, the Army and Marines finally ordered 85,000 AR-15s…redesignated as the M16.
Immediately, the Army began to fiddle with Stoner's design. They changed the powder to a type that proved more corrosive and generated much higher pressures.
Also, they added the forward assist (which Stoner hated). Inexplicably, they began to advertise the weapon as “self-cleaning.”
They then shipped thousands of rifles – without manuals or cleaning gear – to men in combat overseas. Men trained on an entirely different weapon system.
As expected, American soldiers began to experience jammed M16s on the battlefield.
By this point, Stoner had left Armalite, served a brief stint as a consultant for Colt, and finally landed a position at Cadillac Gage (now part of Textron).
It was there between the years of 1962-1963 that he began designing one of the most versatile firearms designs of its time: the Stoner 63.
The Stoner 63 was a modular system chambered in 5.56 NATO. Stoner crafted this weapon to be something of a Mr. Potato Head. The lower receiver could be transformed into just about anything.
A carbine, rifle, belt-fed SAW, vehicle-mounted weapon, and top-fed light machine gun were all variations of the Stoner 63, which could easily be crafted from the common receiver.
Interchangeable parts were utilized across the platform, and the barrels didn’t need tools to be swapped out. This was the Swiss Army knife of guns. It was truly a game-changer.
The catch was that it didn’t like to work as well on extended missions. There were so many moving parts, with such fine tolerances, that when spending weeks in the muddy jungle with a Stoner 63, the odds of losing a component or having a dirty, jammed gun were dangerous.
While the system worked wonderfully on quick missions of a few hours, it was deemed too much of a risk for use amongst the basic infantryman.
Despite this, the Stoner 63 still saw widespread use throughout the Special Forces before finally being retired in 1983.
In 1972, Stoner finally left Cadillac Gage to start his own company, co-founding ARES with a friend.
Aside from making improvements on the Stoner 63 — with the new model called the Stoner 86 — he also began working on yet another rifle design that sadly never took off, known as the Future Assault Rifle Concept or FARC.
Stoner would continue designing weapons with ARES until he received an offer from Knight’s Armament Company.
Knight’s Armament Company would be the final company where Stoner would produce his legendary work.
Almost immediately, Stoner developed the SR-25 rifle, a more accurate version of the AR-10.
The Navy SEALs finally adopted the weapon in 2000 as their Mark 11 Mod 0 sniper weapon system. It remained in use until being phased out 17 years later, in 2017.
Another sniper rifle, the KAC SR-50, was also developed but strangely fell by the wayside due to political pressure.
As police departments nationwide began to upgrade their .38 Special revolvers for the new-tech polymer Glock, Stoner jumped into the fray.
He created a polymer-framed, single-stack, striker-fired design that showed great promise.
But the weapon was so unwieldy and inaccurate (engineers had bumped Stoner’s initial 6-pound trigger pull up to 12 pounds) that it was a fiasco. Colt would later pull it from shelves in 1993 over safety issues.
It was yet another frustrating end to what was originally a great design.
Eugene Stoner passed away from brain cancer in 1997 in the garage of his Palm City, Florida home.
By the time of his death, nearly 100 patents had been filed in his name. Not to mention, he had revolutionized both the world of firearms and Americans' ability to defend themselves.
What are your thoughts on Eugene Stoner and his designs? Let us know in the comments below. Want to learn more about other firearms designers? Check out our list of the 5 Most Influential Gun Inventors. Or, for your very own AR-15, check out our list of the top recommended AR-15 models.
The post Eugene Stoner: The Man Behind the AR-15 appeared first on Pew Pew Tactical.
Pew Pew Tactical
https://static1.makeuseofimages.com/wordpress/wp-content/uploads/2023/06/man-works-on-laptop-next-to-networking-equipment.jpg
Linux is commonly preferred among network engineers—so if you’ve thought about installing it for your work, you’re not alone.
If you’re a network engineer, it’s easy to wonder which distributions will have the best features for your work. Here are the six best Linux distributions for network engineering:
Of all the Linux distributions, one of the most highly regarded among network engineers is Fedora—and there’s a simple reason why.
Fedora is an open-source distribution that serves as a community equivalent to Red Hat Enterprise Linux (RHEL). RHEL itself is commonly chosen as the operating system for enterprise-level systems.
As a result, network engineers who use Fedora enjoy a greater level of familiarity with the RHEL systems they may encounter throughout their careers.
Fedora also offers users an incredible arsenal of open-source tools, built-in support for containerized applications, and consistent access to cutting-edge features and software.
Download: Fedora (free)
As one of the most popular enterprise distributions, RHEL is a great option because it is robust and reinforced. Each version of RHEL has a 10-year lifecycle, meaning that you’ll be able to use your chosen version of RHEL (and enjoy little to no compatibility issues) for years.
By using RHEL, you’ll also become familiar with many of the systems you’re likely to encounter on the job.
Many of the qualities of RHEL that make it attractive as an enterprise solution are just as appealing for independent users.
RHEL comes pre-equipped with the SELinux security module, so you will find it easy to get started with managing access controls and system policies. You’ll also have access to tools like Cacti and Snort through the RPM and YUM package managers.
Download: RHEL (free for developers; $179 annually)
Much like Fedora, CentOS Stream is a distribution that stays in line with the development of RHEL. It serves as the upstream edition of RHEL, meaning that the content in the latest edition of CentOS Stream is likely to appear in RHEL’s next release.
While CentOS Stream may not offer the same stability as Fedora, its enticing inclusion of cutting-edge software makes it worth considering.
CentOS Stream also has a distinct advantage over downstream editions of RHEL following Red Hat’s decision to close public access to the source code of RHEL: it will continue to stay in line with the latest experimental changes considered for the next release of RHEL.
In the future, CentOS Stream is likely to become the best option for anyone seeking an RHEL-adjacent distribution.
Download: CentOS Stream (free)
Another powerful and reliable option for network engineers is openSUSE. openSUSE is impressively stable and offers frequent new releases, making it a good option if you prefer to avoid broken packages while still taking advantage of the latest software releases.
Out of the box, you won’t have any issues configuring basic network settings through YaST (Yet another Setup Tool). Many of the packages that come preinstalled with openSUSE can provide you with incredible utility.
Wicked is a powerful network configuration framework, for example, while Samba is perfect for enabling file-sharing between Linux and Windows systems. You won’t have any trouble installing the right tool for a job with openSUSE’s Zypper package manager.
Download: openSUSE (free)
Debian is a widely-renowned Linux distribution known for being incredibly stable and high-performance. Several branches of Debian are available, including Debian Stable (which is extremely secure and prioritizes stability) and Debian Unstable (which is more likely to break but provides access to the newest cutting-edge releases of software).
One of the biggest advantages of using Debian for network engineering is that it has an incredible package-rich repository with over 59,000 different software packages.
If you’re interested in trying out the newest niche and experimental tools in networking and cybersecurity, an installation of Debian will provide you with total access.
Download: Debian (free)
As a distribution designed for penetration testing, Kali Linux comes with a massive variety of preinstalled tools that network engineers are certain to find useful. Wireshark offers tantalizing information about packets moving across a network, Nmap provides useful clues about network security, and SmokePing provides interesting visualizations of network latency.
Not all of the software packaged with Kali Linux is useful for network engineers, but luckily, new Kali installations are completely customizable. You should plan out what packages you intend to use in advance so that you can avoid installing useless packages and keep your Kali system minimally cluttered.
Download: Kali Linux (free)
While some Linux distributions are better suited to network engineers, almost any Linux distribution can be used with the right software and configurations.
You should test out software like Nmap and familiarize yourself with networking on your new Linux distro so that lack of familiarity doesn’t become an obstacle later on.
MakeUseOf
https://static1.makeuseofimages.com/wordpress/wp-content/uploads/2023/06/documents-on-wooden-surface.jpg
Data science is constantly evolving, with new papers and technologies coming out frequently. As such, data scientists may feel overwhelmed when trying to keep up with the latest innovations.
However, with the right tips, you can stay current and remain relevant in this competitive field. Thus, here are eight ways to stay on top of the latest trends in data science.
Data science blogs are a great way to brush up on the basics while learning about new ideas and technologies. Several tech conglomerates produce high-quality blog content where you can learn about their latest experiments, research, and projects. Great examples are Google, Facebook, and Netflix blogs, so waste no time checking them out.
Alternatively, you can look into online publications and individual newsletters. Depending on your experience level and advancement in the field, these blogs may address topics you’d find more relatable. For example, Version Control for Jupyter Notebook is easier for a beginner to digest than Google’s Preference learning for cache eviction.
You can find newsletters by doing a simple search, but we’d recommend Data Elixir, Data Science Weekly, and KDnuggets News, as these are some of the best.
Podcasts are easily accessible and a great option when you’re pressed for time and want to get knowledge on the go. Listening to podcasts exposes you to new data science concepts while letting you carry out other activities simultaneously. Also, using interviews with experts in the field, some podcasts offer a window into the industry and let you learn from professionals’ experiences.
On the other hand, YouTube is a better alternative for audio-visual learners and has several videos at your disposal. Channels like Data School and StatQuest with Josh Starmer cover a wide range of topics for both aspiring and experienced data scientists. They also touch on new trends and methods, so following these channels is a good idea to keep current.
It’s easy to get lost in a sea of podcasts and videos, so carefully select detailed videos and the best podcasts for data science. This way, you can acquire accurate knowledge from the best creators and channels.
Online courses allow learning from data science academics and experts, who condense their years of experience into digestible content. Recent courses cover several data science necessities, from hard-core machine learning to starting a career in data science without a degree. They may not be cheap, but they are well worth their cost in the value they give.
Additionally, books play an important role as well. Reading current data science books can help you learn new techniques, understand real-world data science applications, and develop critical thinking and problem-solving skills. These books explain in-depth data science concepts you may not find elsewhere.
Such books include The Data Science Handbook, Data Science on the Google Cloud Platform, and Think Bayes. You should also check out a few data science courses on sites like Coursera and Udemy.
Attending conferences ushers you into an environment of like-minded individuals you can connect with. Although talking to strangers may feel uncomfortable, you will learn so much from the people at these events. By staying home, you will likely miss out on networking, job opportunities, and modern techniques like deep learning methods.
Furthermore, presentations allow you to observe other projects and familiarize yourself with the latest trends. Seeing what big tech companies are up to is encouraging and educative, and you can always take away something from them to apply in your work.
Data science events can be physical or virtual. Some good data science events to consider are the Open Data Science Conference (ODSC), Data Science Salon, and the Big Data and Analytics Summit.
Data science hackathons unite data scientists to develop models that solve real-world problems within a specified time frame. They can be hosted by various platforms, such as Kaggle, DataHack, or UN Big Data Hackathon.
Participating in hackathons enhances your mastery and accuracy and exposes you to the latest data science tools and popular techniques for building models. Regardless of your results, competing with other data scientists in hackathons offers valuable insights into the latest advancements in data science.
Consider participating in the NERSC Open Hackathon, BNL Open Hackathon, and other virtual hackathons. Also, don’t forget to register for physical hackathons that may be happening near your location.
Contributing to open-source data science projects lets you work with other data scientists in development. From them, you’ll learn new tools and frameworks used by the data science community, and you can study project codes to implement in your work.
Furthermore, you can collaborate with other data scientists with different perspectives in an environment where exchanging ideas, feedback, and insights is encouraged. You can discover the latest techniques data science professionals use, industry standards, best practices, and how they keep up with data science trends.
First, search for repositories tagged with the data science topic on GitHub or Kaggle. Once you discover a project, consider how to contribute, regardless of your skill level, and start collaborating with other data scientists.
Following data science thought leaders and influencers on social media keeps you informed about the latest data science trends. This way, you can learn their views on current subject matters and get up-to-date news on data science. It also allows you to ask them about complicated subjects and get a reply.
You can take it a step further and follow Google, Facebook, Apple, and other big tech companies on Twitter. This gives you early insight into upcoming tech trends, not limited only to data science.
Kirk Borne, Ronald van Loon, and Ian Goodfellow are some of the biggest names in the data science community. Start following them and big tech companies on Twitter and other social media sites to stay updated.
Sharing your work lets you get feedback and suggestions from other data scientists with different experience levels and exposure. Their comments, questions, and critiques can help you stay up-to-date with the latest trends in data science.
You can discover trendy ideas, methods, tools, or resources you may not have known before by listening to their suggestions. For example, a person may unknowingly use an outdated version of Python until he posts his work online and someone points it out.
Sites like Kaggle and Discord have several data science groups through which you can share your work and learn. After signing up and joining a group, start asking questions and interacting with other data scientists. Prioritize knowledge, remember to be humble, and try to build mutually beneficial friendships with other data scientists.
Continuous learning is necessary to remain valuable as a data scientist, but it can be difficult to keep up all by yourself. Consequently, you’ll need to find a suitable community to help you, and Discord is one of the best platforms to find one. Find a server with people in the same field, and continue your learning with your new team.
MakeUseOf