Relationalize Unstructured Data In AWS Athena with GrokSerDe

Managing logs in a centralized repository is one of the most common best practices in the DevOps world. Application logs, system logs, error logs, and database logs all get pushed into that centralized repository. You can use the ELK stack or Splunk to visualize the logs and get better insights from them. But as a SQL guy, I wanted to solve this problem with the big data ecosystem (using SQL). As part of that process, we can relationalize unstructured data in AWS Athena with the help of GrokSerDe.

Here, S3 is my centralized repository. I know it will not scale like Elasticsearch, but why should I miss the fun? For this use case, I’m going to relationalize the SQL Server error log in AWS Athena. Let’s take a look at the SQL Server error log pattern.

2019-09-21 12:53:17.57 Server      UTC adjustment: 0:00
2019-09-21 12:53:17.57 Server      (c) Microsoft Corporation.
2019-09-21 12:53:17.57 Server      All rights reserved.
2019-09-21 12:53:17.57 Server      Server process ID is 4152.

It looks like this:

yyyy-mm-dd space hh:mm:ss.ms space User space message

But sometimes a single entry spans several lines, like below.

2019-09-21 12:53:17.57 Server      Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64) 
  Aug 22 2017 17:04:49 
  Copyright (C) 2017 Microsoft Corporation
  Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2016 Datacenter 10.0 <X64> (Build 14393: ) (Hypervisor)
2019-09-21 12:53:17.57 Server      UTC adjustment: 0:00
2019-09-21 12:53:17.57 Server      (c) Microsoft Corporation.

Look at the 2nd and 3rd lines, for example: they contain only a message, with no timestamp or user. These lines are purely informational, and we won’t get anything useful from them. So, as part of data cleansing, we should remove such unwanted lines before we relationalize the data.
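A minimal sketch of that cleanup step, assuming the log has already been read into a list of lines. Python is used purely for illustration, and the names ENTRY_START and drop_continuation_lines are hypothetical. It keeps only lines that begin with a timestamp and drops the continuation lines.

```python
import re

# A new log entry starts with "yyyy-mm-dd hh:mm:ss.ms" followed by a space.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ ")


def drop_continuation_lines(lines):
    """Keep only lines that start with a timestamp; drop everything else."""
    return [line for line in lines if ENTRY_START.match(line)]


sample = [
    "2019-09-21 12:53:17.57 Server      Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64) ",
    "  Aug 22 2017 17:04:49 ",
    "  Copyright (C) 2017 Microsoft Corporation",
    "2019-09-21 12:53:17.57 Server      UTC adjustment: 0:00",
]

cleaned = drop_continuation_lines(sample)
for line in cleaned:
    print(line)
```

Dropping the continuation lines loses the extra banner text, which is acceptable here because those lines are informational only.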

I can consider the below format for my relational structure.

  • Year – Integer
  • Month – Integer
  • Day – Integer
  • Hour – Integer
  • Minute – Integer
  • Second – Integer
  • User – String
  • Message – String

We can convert this into a Grok pattern. To keep it simple, the pattern captures hour, minute, and second together as a single time field.

%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}\\s*%{TIME:time} %{LOG_LEVEL:user}\\s*( )*%{GREEDYDATA:message}
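To see which fields this pattern pulls out of a line, here is a rough Python approximation. It is not the real Grok definitions of YEAR, MONTHNUM, MONTHDAY, or TIME; the simplified sub-patterns and the names LOG_LINE and parse_line are my own assumptions for illustration.

```python
import re

# Simplified stand-in for the Grok pattern: one named group per column.
LOG_LINE = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})\s*"
    r"(?P<time>\d{1,2}:\d{2}:\d{2}(?:\.\d+)?) "
    r"(?P<user>[a-zA-Z0-9]*)\s*"
    r"(?P<message>.*)"
)


def parse_line(line):
    """Return a dict of columns, or None if the line is not a log entry."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


row = parse_line("2019-09-21 12:53:17.57 Server      Server process ID is 4152.")
print(row)

# A continuation line has no leading timestamp, so it does not match.
print(parse_line("  Aug 22 2017 17:04:49 "))
```

Every captured value is a string, which is why the table columns are declared as string.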

Create the table in Athena:

CREATE EXTERNAL TABLE `sql_errorlog`(
  `year` string , 
  `month` string , 
  `day` string , 
  `time` string , 
  `user` string , 
  `message` string )  
ROW FORMAT SERDE 
  'com.amazonaws.glue.serde.GrokSerDe' 
WITH SERDEPROPERTIES ( 
  'input.format'='%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}\\s*%{TIME:time} %{LOG_LEVEL:user}\\s*( )*%{GREEDYDATA:message}', 
  'input.grokCustomPatterns'='LOG_LEVEL [a-zA-Z0-9]*') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bhuvi-datalake/sql-error-log/'

The table got created. I used a custom pattern for pulling the user column.

Query the data:

SELECT *
FROM "default"."sql_errorlog" limit 10;

SELECT *
FROM "default"."sql_errorlog"
WHERE message LIKE '%shutdown%';

SELECT *
FROM "default"."sql_errorlog"
WHERE message LIKE '%Login failed%';

SELECT concat ('Server started at: ',year,'-',month,'-',day,' ',time) AS StartupTime
FROM "default"."sql_errorlog"
WHERE message LIKE '%Server process ID is%';

This is just a beginner’s guide. You can play around with Windows logs or Linux syslog, and if you are a DBA, you may like to use this for MySQL, PostgreSQL, or MongoDB logs.

BONUS: Regex Serde

If you are a developer, then regex might be easier for you. You can create the table with RegexSerDe instead. Thanks to LeftJoin, who helped write this regex.

CREATE EXTERNAL TABLE `bhuvi`(
  `date` string , 
  `time` string , 
  `user` string , 
  `message` string )
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ( 
  'input.regex'='(.*\\-.*\\-.*)\\s+(\\d+:\\d+:\\d+.\\d+)\\s+(\\S+)\\s+(.*?)$') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bhuvi-datalake/bhuvi-1/'

via Planet MySQL

Why Database Schema Optimization Matters

If you have been around MySQL for any length of time, you are probably aware that choosing the correct data types and optimizing your schema are actually important tasks.  A few years back at Percona Live 2016, I gave an introductory talk on schema review and optimization.

I was thinking about that talk in the context of some of my current clients.  Though I had worked on extremely large database deployments during my earlier tenure at Percona, it was often more of an outlier.  Now, working as a Technical Account Manager with our largest clients, it is much more common.

The Fundamental Problem

I’d like to expand my thoughts on the “choosing the smallest data type you can” principle from my 2016 slides through the lens of a few of my 2019 clients.  I gave an example of two copies of the same table (a simple 4 column, 3 index table with ~4 million rows), one using a bigint for the primary key and one using a regular unsigned int for the primary key:

[root@sample-host plive_2016]# ls -alh action*ibd
-rw-rw---- 1 mysql mysql 908M Apr 7 16:22 action_bigint.ibd 
-rw-rw---- 1 mysql mysql 636M Apr 7 16:23 action.ibd

In this example, there was almost a 30% space savings associated with using the smaller data type.  Obviously, at the scale of 1GB of space, this is trivial. One comment I made during the talk references the adage “storage is cheap”.  While this can be true at a certain scale, I’m seeing this thinking break down more frequently with my largest clients.

The Problem Magnified at Scale

As an illustration, one of my clients is running roughly 10,000 Percona Server instances in their sharded deployment.  These servers are running on bare metal with above-average hardware (for performance concerns). While that sounds like a lot of servers (and it definitely is), you also have to take into consideration other operational concerns such as backups.  For the sake of some easier math, let’s assume the following:

  • 5 servers/shard
  • 500G data directory
  • 5 backups of each shard (various time ranges such as daily, weekly, monthly)

Using those numbers as an estimate, one would be looking at roughly the following for space:

  • ~4.7 petabytes storage for running instances (SSD)
  • ~6 petabytes storage for backups (HDD)
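The running-instance figure can be checked with a quick calculation. This is only a sketch: the binary conversion (1 PB = 1024 × 1024 GB) is my assumption, chosen because it lands on roughly the 4.7 PB quoted above, and the names are hypothetical. The backup figure is taken as given and not recomputed here.

```python
INSTANCES = 10_000        # Percona Server instances in the deployment
DATA_DIR_GB = 500         # size of each data directory, in GB
GB_PER_PB = 1024 * 1024   # assumption: binary units


def running_storage_pb(instances, data_dir_gb):
    """Total storage for live instances, in petabytes."""
    return instances * data_dir_gb / GB_PER_PB


live_pb = running_storage_pb(INSTANCES, DATA_DIR_GB)
print(f"{live_pb:.2f} PB for running instances")
```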

The Business Impact on Large Deployments

Suddenly, at that scale, the 30% space that seemed trivial in my example seems a bit more important.  Let’s run some numbers based on the current $/TB pricing of SSD and spinning HDD with the 30% reduction in space:

  • SSD Savings ~ $140,000
    • $100/TB Cost
    • 30% of 4.7PB = 1.4PB
  • HDD Savings ~ $46,000
    • $25/TB Cost
    • 30% of 6PB = 1.8PB
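The same arithmetic as a small sketch, using the storage totals and $/TB prices quoted above. The function name and the decimal conversion (1 PB = 1000 TB) are assumptions made for round numbers, so the results land close to, but not exactly on, the rounded figures in the list.

```python
def savings_usd(total_pb, reduction, usd_per_tb, tb_per_pb=1000):
    """Raw storage cost avoided by shrinking total_pb by the given fraction."""
    saved_tb = total_pb * reduction * tb_per_pb
    return saved_tb * usd_per_tb


ssd = savings_usd(4.7, 0.30, 100)  # running instances on SSD
hdd = savings_usd(6.0, 0.30, 25)   # backups on HDD

print(f"SSD savings: ${ssd:,.0f}")
print(f"HDD savings: ${hdd:,.0f}")
print(f"Total:       ${ssd + hdd:,.0f}")
```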

Saving 3 petabytes of storage would reduce the raw storage costs by nearly $200,000.  On top of the actual capital costs of 3PB of storage, you have to consider server count, power, and maintenance (among others) which would increase this cost significantly.  Clearly, this is just a theoretical example of the potential impact a small mistake like picking the wrong data type can have on the overall infrastructure cost at scale. Generally, by the time a company reaches this scale, these sorts of optimizations have already been made and we need to start looking deeper and more creatively at other alternatives.

While this is definitely an oversimplification of calculating storage costs based on raw hardware prices, it does raise the question: even though it may not seem important now, why not ensure your schema is optimized before it becomes an expensive problem to fix?

For a more holistic view of how optimizations such as this one can save you money, check out our Database Management Savings Calculator or reach out to us for a more thorough review.

via Planet MySQL

Laravel Fireable Attributes

Fireable

An elegant way to trigger events based on attribute changes.

Installation

Install package through Composer

$ composer require envant/fireable

Usage

  1. Add the FireableAttributes trait to your model
  2. Define the attributes with specified events via the protected $fireableAttributes property on the model

Example

Let’s say we need to trigger specified events when specific model attributes are updated.

For example, you need to notify a user when they get an "approved" status. Instead of observing the model’s "dirty" attributes and firing events manually, we could do it more elegantly by assigning events to attributes or even to certain values of attributes.

class User extends Authenticatable
{
    use FireableAttributes;

    protected $fireableAttributes = [
        'status' => [
            'approved' => UserApproved::class,
            'rejected' => UserRejected::class,
        ],
    ];
}

You may also not need to track specific values. In that case, you can assign an event directly to the attribute itself. In the example below, each time the user’s email is changed, the corresponding event is fired.

class User extends Authenticatable
{
    use FireableAttributes;

    protected $fireableAttributes = [
        'email' => EmailUpdated::class,
    ];
}

Change log

Please see the changelog for more information on what has changed recently.

Contributing

Please see contributing.md for details and a todolist.

Security

If you discover any security related issues, please email author email instead of using the issue tracker.

License

Please see the license file for more information.

via Laravel News Links

How to Reveal Your Saved Wifi Passwords in Windows or macOS

Riddle me this: You’re out somewhere and you need to hop on a wifi network with a new device. You realize you have the wifi password saved on your laptop, but not on whatever device you’re looking to connect. And you’re either too lazy to ask for the password again, or you have no way to acquire it in your present condition.

What do you do? Easy. Pull out your laptop and look it up. Here’s how:

Windows

To find a saved wifi password, you have a few options. First, you can pull up a Command Prompt and type in this somewhat-complicated string:

netsh wlan show profile [NAME OF YOUR WIFI NETWORK] key=clear

You’ll want to replace the [NAME OF YOUR WIFI NETWORK] part with, well, the name of whatever SSID you’ve connected to. When you do, and hit Enter, you’ll see the password for said wifi network in the “Key Content” listing of the Security settings field—it should be pretty apparent.

If you’d like to try another way, you can pull up your passwords via the Windows 10 Settings app. Launch it, click on Network & Internet, scroll down a bit and click on Network and Sharing Center, click on the blue “Wi-Fi” link next to the “Connections:” field, click on “Wireless Properties,” click on the Security tab, and select “Show characters.”

There are other utilities you can try to get your passwords in an even simpler manner, but I should note that Windows Defender might not like them very much. At least, that was the case when I tried downloading Nirsoft’s WirelessKeyView—surely a harmless program, but one that makes Windows Defender freak out once it finishes transferring to your system.

Mac

On macOS, revealing a saved wifi password is simple. First, you can pull up Terminal and type in the following:

security find-generic-password -wa [NAME OF YOUR WIFI NETWORK]

Same deal as before: Replace [NAME OF YOUR WIFI NETWORK] with exactly what it says. You’ll next have to authenticate into your system as an administrator, but once you do that, the password for whatever wifi network you typed in should appear within Terminal.

You can also just dig through Keychain—specifically, the Keychain Access application—to sniff out a saved password. Launch the application and click on the System Keychain in the upper-left corner. Find the wifi network you want to look up and double-click on it.

When you do, you’ll see a box with the details for that network. Click “Show Password” to do just that, after you authenticate yourself, of course.

via Lifehacker

Dealmaster: A bunch of Amazon devices are on sale for Prime members today

Greetings, Arsians! The Dealmaster is back with another round of deals to share. Today’s list is headlined by a new round of discounts on Amazon devices, including the company’s Fire TV Stick streamers, Fire tablets, and Kindle e-readers, among others. The catch? Most of the discounts are for Prime members only.

Still, that covers a whole lot of people, and a few of the discounts here either match or come close to the prices we saw during Amazon’s Prime Day event in July. The company’s latest (and waterproof) Kindle Paperwhite, for one, is down to $90, while the entry-level Kindle is down to $65. Both are $5 more than they were on Prime Day. At $50 and $30, respectively, the Fire HD 8 and Fire 7 are now matching their Prime Day prices and are still generally worthwhile choices for people wanting to spend as little as possible on a tablet. The Fire TV Stick 4K and 1080p Fire TV Stick aren’t as steeply discounted, meanwhile, but they’re both $15 off for those in need of a new streaming stick. The company’s Cloud Cam security camera and Echo Show 5 smart display are significantly discounted for those who don’t have Prime, too.

The big caveat here is that Amazon is announcing new hardware of some sort next week. The company held an event last September where it mainly introduced new Echo devices and other smart home accessories, so it’s not certain that the Fire and Kindle devices here will be replaced, but there’s at least some chance that these discounts are designed to clear out inventory. Still, most of what’s here is a good value all the same. And if you’re not interested in having more Amazon in your life, we also have deals on Roku streamers, Logitech keyboards and mice, external hard drives, and more. Have a look at the full rundown below.

Note: Ars Technica may earn compensation for sales from links on this post through affiliate programs.

Amazon device deals

  • Prime only: Amazon Fire TV Stick 4K HDR media streamer for $34.99 at Amazon (normally $49.99).
  • Prime only: Amazon Fire TV Stick 1080p media streamer for $24.99 at Amazon (normally $39.99).
  • Prime only: Amazon Fire TV Recast (500GB) over-the-air DVR for $169.99 at Amazon (normally $229.99).
  • Prime only: Amazon Fire HD 8 (16GB, ads) tablet – 8-inch 1280×800 for $49.99 at Amazon (normally $79.99).
  • Prime only: Amazon Fire 7 (16GB, ads) tablet – 7-inch 1024×600 for $29.99 at Amazon (normally $49.99).
  • Prime only: Amazon Kindle Paperwhite (8GB, ads) e-reader for $89.99 at Amazon (normally $129.99).
  • Prime only: Amazon Kindle (4GB, ads) e-reader for $64.99 at Amazon (normally $89.99).
  • Amazon Cloud Cam 1080p security camera for $89.99 at Amazon (normally $119.99).
  • Amazon Echo Show 5 smart display for $64.99 at Amazon and Best Buy (normally $89.99).

Laptop and desktop PC deals

  • Apple Mac Mini (late 2018) – Intel Core i3-8100B, 8GB RAM, 128GB SSD for $699 at Amazon (normally $749.99).
  • Samsung Chromebook Plus V2 laptop – Intel Core m3-7Y30, 12.2-inch 1920×1200 touch, 4GB RAM, 128GB eMMC for $399 at Best Buy (normally $449.99).
  • Asus ROG Strix G gaming laptop – Intel Core i5-9300H, 15.6-inch 1080p 120Hz, 8GB RAM, 512GB SSD, GTX 1660 Ti 6GB for $999 at Walmart (normally $1,299).
  • Lenovo ThinkPad P53 mobile workstation laptop – Intel Core i7-9750H, 15.6-inch 3840×2160 OLED touch, 16GB RAM, 512GB SSD, Nvidia Quadro T2000 4GB GPU for $1,484.45 (use code: THINKSEPT - normally $1,899).
  • Dell XPS Tower desktop – Intel Core i7-9700, 16GB RAM, 512GB SSD, GeForce GTX 1050 Ti for $949.99 at Dell (use code: DTXPSAFF1 – normally $1,399.99).
  • AMD Ryzen 7 2700X 8-core/16-thread desktop processor for $197.99 at Amazon and Newegg (normally $239.99).

Accessories and miscellaneous deals

  • LEGO Ideas NASA Apollo Saturn V 21309 kit (1,900 pieces) for $99.99 at Amazon (normally $119.99).
  • Instant Pot Ultra (6qt) electric pressure cooker for $83.77 at Amazon (normally $109).
  • Anker PowerPort Atom III wall charger – 60W, 45W USB-C PD, 1x USB-A for $31.99 at Amazon (normally $42.99).
  • Anker Roav Viva smart car charger – Alexa, 2x USB-A for $19.96 at Amazon (clip 11% coupon – normally $35.99).
  • Anker PowerWave Pad wireless charger – 10W for Galaxy phones, 7.5W for iPhone for $9.99 at Amazon (use code: LABOR2503 - normally $17.99).
  • Anker PowerLine II 3-in-1 charging cable – USB-C, microUSB, Lightning for $11.24 at Amazon (use code: ANKER8436 - normally $17.99).

via Ars Technica

A Peek Into the Soviet Computer Revolution

One of the largest and coolest collections of Soviet computers in the world resides in an apartment complex in Mariupol, Ukraine. Dmitriy Cherepanov started Club 8-bit with a small collection of computers built when the Soviet bloc was crafting its own personal computers.

It’s like looking into an alternative universe. The machines popular with kids growing up in the Soviet bloc look just different enough from what we’re familiar with in the Western world, but still carry that same sense of nostalgia you or I might get for a Commodore 64 or Macintosh II.

Cherepanov has been collecting and restoring these computers for over a decade, and his museum of PCs is a fascinating look at the wide scope of the 80s PC revolution.

via Gizmodo

Behold the power of shotgun

Good news from The Sun UK:

IN HIS SIGHTS SAS hero kills five terrorists in seven seconds with shotgun to stop ‘suicide bombing’

AN SAS hero armed with a shotgun reportedly killed five terrorists in just seven seconds to stop a suicide bombing attack.

The Brit soldier had stormed a bomb factory as part of an SAS raid on an ISIS outpost in Baghdad, Iraq.

As the SAS ‘breach team’ entered a courtyard, they were confronted by a group of heavily armed jihadis.

One of the brave Brits fired at them with his Benelli M4 Super 90 semi-automatic shotgun, killing three would-be bombers.

Another two terrorists appeared from a building and he reportedly shot them dead as well.

Several other jihadis then emerged, but immediately surrendered after seeing two of the bodies “didn’t have heads”, reports claim.

Ha ha ha, fuck yeah!!!

I’m reminded of what the great Clint Smith said about shotguns:

“Shotguns at the right range with the right load will physically remove a chunk of shit off your opponent and throw that shit on the floor.  And you have to get someone to come in and clean this shit up with a shovel.”

Apparently that SAS operator was at the right range with the right load and that chunk of shit was Muhammad Bin Terrorist’s head.

“The terrorists were no more than a few feet away when the SAS team came face to face with them,” the source said.

“They had just finished morning prayers and were loading weapons into a vehicle. We now think they were about to carry out an attack.

“One of the breach team opened fire…it was a case of bang, bang, bang, then bang, bang. It was over in seven seconds.”

Suicide vests filled with slabs of plastic explosives and ball bearings were found on two of the dead jihadis.

The vests are understood to have been designed for mass casualties.

I remember being told a story by one of my Marine buddies.  They were having problems in Iraq where they would shoot some terrorist in a breaching operation but the terrorist had enough time before dying to pull the pin on a grenade or set off explosives and get the Marines who entered the room.

So a few of them bought 1 oz. deer slugs when they got rotated back home and loaded those up in their shotguns on their next deployment.

A 1 oz. deer slug hitting center of mass at no more than 10 yards is so traumatic that it shuts a human being off like a light.

There is nothing like the power of a 12 gauge scattergun at close range for ending threats.
