Basic Data Analysis with MySQL Shell Python mode
https://ift.tt/2ZFyT2L
I recently watched a fantastic Python Pandas library tutorial series on YouTube. Without a doubt, Pandas is great for all sorts of data stuff. On the same token, MySQL Shell in Python mode is quite powerful in the sense that Python and the MySQL Shell (version >= 8.0) are somewhat united in the same environment. Although Pandas is in a league all its own when it comes to data analysis, between the power of MySQL and Python, we can also perform some basic analysis easily in MySQL Shell Python mode. In this blog post, I will cover some basic data analysis using Python mode in the MySQL Shell. Continue reading to see examples…
OS, Software, and DB used:
- OpenSuse Leap 15.1
- MySQL 8.0.21
Self-Promotion:
If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!
Data Set Used
You can download the Stack Overflow Developer Survey Results data set used in this post for your own exploration if you would like.
Basic Data Analysis with MySQL Shell Python Mode: Connecting, table row and column count, column names
For starters, we use the available global db
object and get a connection to an existing table in the database/schema by passing in a name to the get_table()
method:
1
|
MySQL localhost:33060+ ssl learning Py > data = db.get_table(‘so_data’)
|
I store the ‘so_data’ table in a shell.Object
variable named ‘data’. We can call the count()
method against this object and get a count of the total rows in the table:
1 2
|
MySQL localhost:33060+ ssl learning Py > data.count() 88883
|
Related: Read the post, Dynamic MySQL CREATE TABLE statement with pandas and pyodbc, I wrote and see how I dynamically created the ‘so_data’ table and populated it with accompanying data.
Calling the select()
method on the ‘data’ object, I can essentially retrieve all rows and columns from the table. However, at this time, I am only interested in the actual column names of the table. I’ll store this result in a ‘rows’ object variable:
1
|
MySQL localhost:33060+ ssl learning Py > rows = data.select().execute()
|
With the ‘rows’ variable, I call the get_column_names()
method on it and am returned a Python list
of all the table’s column names:
1 2 3 4 5 6 7 8 9 10 11 12 13
|
MySQL localhost:33060+ ssl learning Py > cols = rows.get_column_names()
MySQL localhost:33060+ ssl learning Py > cols [
"Respondent", "MainBranch", "Hobbyist", "OpenSourcer", "OpenSource", "Employment",
"Country", "Student", "EdLevel", "UndergradMajor", "EduOther", "OrgSize", "DevType", "YearsCode", "Age1stCode", "YearsCodePro", "CareerSat", "JobSat", "MgrIdiot", "MgrMoney", "MgrWant", "JobSeek", "LastHireDate", "LastInt",
"FizzBuzz", "JobFactors", "ResumeUpdate", "CurrencySymbol", "CurrencyDesc", "CompTotal", "CompFreq", "ConvertedComp",
"WorkWeekHrs", "WorkPlan", "WorkChallenge", "WorkRemote", "WorkLoc", "ImpSyn", "CodeRev", "CodeRevHrs", "UnitTests",
"PurchaseHow", "PurchaseWhat", "LanguageWorkedWith", "LanguageDesireNextYear", "DatabaseWorkedWith","DatabaseDesireNextYear",
"PlatformWorkedWith", "PlatformDesireNextYear", "WebFrameWorkedWith", "WebFrameDesireNextYear", "MiscTechWorkedWith",
"MiscTechDesireNextYear", "DevEnviron", "OpSys", "Containers", "BlockchainOrg", "BlockchainIs", "BetterLife", "ITperson",
"OffOn", "SocialMedia", "Extraversion", "ScreenName", "SOVisit1st", "SOVisitFreq", "SOVisitTo", "SOFindAnswer",
"SOTimeSaved", "SOHowMuchTime", "SOAccount", "SOPartFreq", "SOJobs", "EntTeams", "SOComm", "WelcomeChange", "SONewContent", "Age", "Gender", "Trans", "Sexuality", "Ethnicity", "Dependents", "SurveyLength", "SurveyEase" ]
|
As you can see, there are quite a lot of columns in this table. Instead of counting them myself, I use the Python len()
method and get a count of the ‘cols’ list
object:
1 2
|
MySQL localhost:33060+ ssl learning Py > len(cols) 85
|
Summary: We can use combinations of MySQL Shell methods: get_table()
, count()
, select()
, get_column_names()
and the Python len()
method and determine pertinent table meta-data information.
Basic Data Analysis with MySQL Shell Python Mode: Select specific columns, distinct/unique column values, and constraining rows
We learned in the previous section that table ‘so_data’ has over 80k rows of data, along with 85 total columns. That alone is nothing to scoff at. Instead of loading up our screens with many many rows and columns, I’ll utilize several of the available MySQL Shell methods, choosing certain columns and constraining the number of returned rows (if any).
The select()
method can accept a list of columns names, separated by commas. In this next query, I specify just the ‘SocialMedia’ column and limit the total number of rows to 10, using the limit()
method:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
|
MySQL localhost:33060+ ssl learning Py > data.select(‘SocialMedia’).limit(10).execute() +————-+ | SocialMedia | +————-+ | Twitter | | Instagram | | Reddit | | Reddit | | Facebook | | YouTube | | YouTube | | YouTube | | Twitter | | YouTube | +————-+ 10 rows in set (0.0010 sec)
|
Pro Tip: The limit()
method parameter is the number of rows you want to be returned from theselect()
method.
While the above query does provide good information, suppose we need to know of all the unique values in the ‘SocialMedia’ column. We can easily include the DISTINCT
keyword in the call to select()
with the desired column name:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
|
MySQL localhost:33060+ ssl learning Py > data.select(‘DISTINCT (SocialMedia)’).execute() +————————–+ | SocialMedia | +————————–+ | Twitter | | Instagram | | Reddit | | Facebook | | YouTube | | NA | | VK ВКонта́кте | | WhatsApp | | I don”t use social media| | WeChat | | LinkedIn | | Snapchat | | Weibo | | Hello | | Youku Tudou | +————————–+ 15 rows in set (0.2199 sec)
|
Based on the returned query results, we can see there are 15 unique values for the ‘SocialMedia’ column.
Summary: The select()
method is capable of choosing either all table columns or individual columns depending on your needs and the column names supplied as parameters. If you need specific columns in your query results, provide those columns separated by commas as parameters to select()
. select()
also allows MySQL keywords to be included with columns should you need any of them as in the example query using DISTINCT
.
On the other hand, constrain the number of output rows returned from any query using the limit()
method by specifying the number of desired rows using limit()
‘s number parameter.
Basic Data Analysis with MySQL Shell Python Mode: Counting, group by, and other aggregate functions
In the previous section, we executed a query using select()
and DISTINCT
, retrieving the unique values in the ‘SocialMedia’ column. We have these results from that query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
|
MySQL localhost:33060+ ssl learning Py > data.select(‘DISTINCT (SocialMedia)’).execute() +————————–+ | SocialMedia | +————————–+ | Twitter | | Instagram | | Reddit | | Facebook | | YouTube | | NA | | VK ВКонта́кте | | WhatsApp | | I don”t use social media| | WeChat | | LinkedIn | | Snapchat | | Weibo | | Hello | | Youku Tudou | +————————–+ 15 rows in set (0.2199 sec)
|
What is the total number of non-NULL
rows for each unique ‘SocialMedia’ column value in the ‘so_data’ table? Can we find out with MySQL Shell Python mode? Absolutely.
We can use the COUNT()
aggregate function on the ‘SocialMedia’ column right in the select()
method. However, we need a GROUP BY
clause in the query as well. MySQL Shell has us covered with a same-named method, group_by()
.
In this query, I’ll retrieve a count of the actual values in the ‘SocialMedia’ column (ignoring NULL
‘s in that count) and group those counts by the ‘SocialMedia’ column:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
|
MySQL localhost:33060+ ssl learning Py > data.select(‘SocialMedia’, ‘COUNT(SocialMedia) AS num_medias’).group_by(‘SocialMedia’).execute() +————————–+————+ | SocialMedia | num_medias | +————————–+————+ | Twitter | 11398 | | Instagram | 6261 | | Reddit | 14374 | | Facebook | 13178 | | YouTube | 13830 | | NA | 4446 | | VK ВКонта́кте | 603 | | WhatsApp | 13347 | | I don”t use social media| 5554 | | WeChat | 667 | | LinkedIn | 4501 | | Snapchat | 628 | | Weibo | 56 | | Hello | 19 | | Youku Tudou | 21 | +————————–+————+ 15 rows in set (0.2566 sec)
|
Summary: We can easily query using aggregate functions in the select()
method and group on appropriate columns with the group_by()
method.
Basic Data Analysis with MySQL Shell Python Mode: Filter rows with where and bind query criteria values
The WHERE
clause is responsible for filtering out which rows are returned from a query by way of some predicate test. Without a WHERE
clause, all rows are returned from a SELECT
query. Maybe you want that. Maybe not. If not, use WHERE
to filter rows according to your needs.
The WHERE
clause is not limited to only the SELECT
statement, as it is highly important in the DML commands UPDATE
and DELETE
. Without a WHERE
clause targeting a specific row or rows, all rows are affected – in the case of DML (UPDATE
and DELETE
) – or returned from a SELECT
query. MySQL Shell has a where()
method we can use to filter the rows in a select()
query just the same as in regular MySQL (or any SQL dialect).
It is generally a good practice not to include potential user input values into our query expressions. Most programming languages have some sort of binding mechanism in place that imposes a sort of parameterized query and/or a prepared statement. Using parameterized queries and prepared statements, we can greatly reduce the risk of SQL Injection attacks.
MySQL Shell has a bind()
method we can use instead of directly concatenating values into the query strings of the where()
predicate test(s). bind()
accepts named parameters (which is what I will use in the examples below) or the universal ‘?’ style of parameter binding.
Readers may be interested in the ‘DatabaseWorkedWith’ column of the ‘so_data’ table so let’s filter that column by rows where the ‘SocialMedia’ column is ‘Hello’ using where()
and bind()
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
MySQL localhost:33060+ ssl learning Py > qry_cols = ‘DatabaseWorkedWith’
MySQL localhost:33060+ ssl learning Py > data.select(qry_cols).where(‘SocialMedia = :soc_media’).bind(‘soc_media’, ‘Hello’).execute() +—————————————————————————————————————————————–+ | DatabaseWorkedWith | +—————————————————————————————————————————————–+ | Microsoft SQL Server;MySQL;SQLite | | Cassandra | | MySQL;Redis | | Microsoft SQL Server;MySQL | | NA | | NA | | Cassandra;Elasticsearch;Microsoft SQL Server;Oracle;SQLite | | PostgreSQL | | Cassandra | | NA | | NA | | DynamoDB | | Cassandra | | MongoDB;MySQL;Oracle;Other(s): | | SQLite | | NA | | NA | | NA | | Cassandra;Couchbase;DynamoDB;Elasticsearch;Firebase;MariaDB;MongoDB;Microsoft SQL Server;MySQL;Oracle;PostgreSQL;Redis;SQLite;Other(s): | +—————————————————————————————————————————————–+ 19 rows in set (0.3925 sec)
|
In this query, I use bind()
and include a matching named placeholder (minus the colon : prefix) and corresponding value for what is specified in the where()
method predicate. For example, in the where()
method I used the ‘:soc_media’ named parameter and represented it in bind()
with ‘soc_media’ and the actual accompanying value, ‘Hello’. Pretty straightforward.
Oftentimes, you need to filter a SELECT
query by more than one column or expression using multiple predicates. The MySQL Shell where()
method easily accepts multiple predicate conditions just as a regular MySQL WHERE
clause would using the AND
and OR
logical operators.
But, with multiple where()
predicates, that also means multiple parameterized values right? Yes, it does.
However, multiple calls to bind()
can be chained one after another, for each of the needed bound values. See the following query for a better understanding:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
|
MySQL localhost:33060+ ssl learning Py > data.select(qry_cols).where(‘SocialMedia = :soc_media AND DatabaseWorkedWith <> :na’).bind(‘soc_media’, ‘Hello’).bind(‘na’, ‘NA’).execute() +—————————————————————————————————————————————–+ | DatabaseWorkedWith | +—————————————————————————————————————————————–+ | Microsoft SQL Server;MySQL;SQLite | | Cassandra | | MySQL;Redis | | Microsoft SQL Server;MySQL | | Cassandra;Elasticsearch;Microsoft SQL Server;Oracle;SQLite | | PostgreSQL | | Cassandra | | DynamoDB | | Cassandra | | MongoDB;MySQL;Oracle;Other(s): | | SQLite | | Cassandra;Couchbase;DynamoDB;Elasticsearch;Firebase;MariaDB;MongoDB;Microsoft SQL Server;MySQL;Oracle;PostgreSQL;Redis;SQLite;Other(s): | +—————————————————————————————————————————————–+ 12 rows in set (0.3308 sec)
|
In the above query, we filtered using where()
on columns ‘SocialMedia’ and ‘DatabaseWorkedWith’. For each of the columns, we also include a separate call to bind()
.
The first bind()
method call includes the value ‘Hello’ for the ‘SocialMedia’ column named parameter ‘:soc_media’ and the second bind()
method call has the matching parameters for the ‘DatabaseWorkedWith’ with column and the ‘NA’ value for the ‘:na’ placeholder.
Summary: MySQL Shell provides powerful where()
and bind()
methods for row-filtering needs.
Be sure and check out, X DevAPI User Guide for MySQL Shell in Python Mode, for in-depth information on many of the topics covered in today’s post along with much much more.
Additional MySQL Shell Python mode articles you should read
I have written several blog posts about MySQL Shell Python mode so feel free to check any of those that interest you:
MySQL Shell Python mode is jam-packed with a ton of goodies and features. It opens up new options for working with data in the MySQL ecosystem. If you have not tried MySQL in Python mode, give it a shot. I am quite sure you will really like what you see.
Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!
A Call To Action!
Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.
Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.
To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)
Be sure and visit the “Best Of” page for a collection of my best blog posts.
Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.
Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
The post Basic Data Analysis with MySQL Shell Python mode appeared first on Digital Owl’s Prose.
technology
via Planet MySQL https://ift.tt/2iO8Ob8
September 16, 2020 at 12:02PM