Crowder sets the record straight on Tim Pool’s podcast

https://www.louderwithcrowder.com/media-library/image.jpg?id=32892647&width=980

With all that has been said about the boss, about this company, and most importantly, ABOUT YOU, Crowder is sitting down with Tim Pool. They’ll be talking about #StopBigCon, the Daily Wire, and ALL of the issues plaguing the conservative media landscape today.


Timcast IRL – Steven Crowder Joins To Discuss StopBigCon.Com Live At 8PM EST

www.youtube.com

Louder With Crowder

Why MySQL Could Be Slow With Large Tables

https://www.percona.com/blog/wp-content/uploads/2023/01/Why-MySQL-Could-Be-Slow-With-Large-Tables-300×168.jpg

16 years ago, our founder Peter Zaitsev covered this topic, and some of the points described there are still valid; we will cover more in this blog. While the technologies have evolved and matured, some people still think that MySQL is only for small projects or that it can’t perform well with large tables.

Some startups, such as Facebook, Uber, and Pinterest, adopted MySQL in their early days. They are now big, successful companies that prove MySQL can run on large databases and heavily used sites.

With disks being faster nowadays and CPU and memory resources being cheaper, we can easily say MySQL can handle TBs of data with good performance. For instance, in Percona Managed Services, we have many clients with TBs worth of data whose databases perform well.

In this blog post, we will review key topics to consider for managing large datasets more efficiently in MySQL.

Primary keys:

This is one of the most important things to consider when creating a new table in MySQL: we should always create an explicit primary key (PK). InnoDB will sort the data in primary key order, and that will serve to reference actual data pages on disk. If we don’t specify a primary key, MySQL will check for other unique indexes as candidates for the PK, and if there are none, it will create an internal clustered index to serve as the primary key, which is not optimal.

When there is no application logic or candidate to choose as a primary key, we can use an auto_increment column as the primary key. 

NOTE: As of MySQL 8.0.30, Generated Invisible Primary Keys were introduced to add an invisible primary key when no explicit PK is defined. You can refer to the documentation for further details.

Also, keep in mind that a portion of the primary key will be added at the end of each secondary index, so try to avoid choosing strings as the primary key, as it will make the secondary indexes larger and performance will not be optimal.
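
A minimal sketch of this advice (table and column names are hypothetical): an explicit, compact auto_increment primary key, with strings kept only in secondary indexes.

-- Hypothetical example: a compact explicit PK; the PK value is appended to
-- every secondary index entry, so keeping it small keeps those indexes small.
CREATE TABLE user_sessions (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  session_token VARCHAR(64) NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_session_token (session_token),
  KEY idx_created_at (created_at)
) ENGINE=InnoDB;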

Redundant indexes:

It is known that accessing rows by fetching an index is more efficient than through a table scan in most cases. However, there are cases where the same column is defined on multiple indexes in order to serve different query patterns, and sometimes some of the indexes created for the same column are redundant, leading to more overhead when inserting or deleting data (as indexes are updated) and increased disk space for storing the indexes for the table.

You can use one of our tools, pt-duplicate-key-checker, to detect duplicate keys.

Example (using the employee sample DB):

Suppose we have the following schema:

db1 employees> show create table employees\G
*************************** 1. row ***************************
       Table: employees
Create Table: CREATE TABLE `employees` (
  `emp_no` int NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`),
  KEY `idx_last_name` (`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Now, suppose we need to filter by last_name and hire_date; we would create the following index:

ALTER TABLE employees ADD INDEX idx_last_name_hire_date (last_name,hire_date);

We would end up with the following schema:

db1 employees> show create table employees\G
*************************** 1. row ***************************
       Table: employees
Create Table: CREATE TABLE `employees` (
  `emp_no` int NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`),
  KEY `idx_last_name` (`last_name`),
  KEY `idx_last_name_hire_date` (`last_name`,`hire_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Now, the index idx_last_name and idx_last_name_hire_date have the same prefix (last_name).

The new index idx_last_name_hire_date can be used to serve queries filtered by last_name only, or by last_name and hire_date, leaving the idx_last_name index redundant.

We can corroborate that by using pt-duplicate-key-checker:

[user1] percona@db1: ~ $ pt-duplicate-key-checker -d employees
# ########################################################################
# employees.employees                                                     
# ########################################################################


# idx_last_name is a left-prefix of idx_last_name_hire_date
# Key definitions:
#   KEY `idx_last_name` (`last_name`),
#   KEY `idx_last_name_hire_date` (`last_name`,`hire_date`)
# Column types:
#   `last_name` varchar(16) not null
#   `hire_date` date not null
# To remove this duplicate index, execute:
ALTER TABLE `employees`.`employees` DROP INDEX `idx_last_name`;


# ########################################################################
# Summary of indexes                                                      
# ########################################################################

# Size Duplicate Indexes   350357634
# Total Duplicate Indexes  1
# Total Indexes            17

Data types:

It’s not uncommon to find databases where data types are not chosen correctly. There are many cases of int fields whose data could fit in a smallint field, or fixed-size char fields that could be stored in a variable-size varchar field. This may not be a huge problem for small tables, but for tables with millions of records, overprovisioned data types only make the table bigger, and performance will not be optimal.

Make sure you design the data types correctly while planning for the future growth of the table.

Example:

Creating four simple tables to store strings but using different data types:

db1 test> CREATE TABLE tb1 (id int auto_increment primary key, test_text char(200)); 
Query OK, 0 rows affected (0.11 sec)

db1 test> CREATE TABLE tb2 (id int auto_increment primary key, test_text varchar(200)); 
Query OK, 0 rows affected (0.05 sec)

db1 test> CREATE TABLE tb3 (id int auto_increment primary key, test_text tinytext); 
Query OK, 0 rows affected (0.13 sec)

db1 test> CREATE TABLE tb4 (id int auto_increment primary key, test_text text); 
Query OK, 0 rows affected (0.11 sec)

Inserting 2,000 rows with text:

[user1] percona@db1: ~ $ for i in {1..2000}; do for tb in {1..4}; do mysql test -e "INSERT INTO tb$tb (test_text) VALUES ('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse euismod, nulla sit amet rhoncus venenatis, massa dolor lobortis nisi, in.');"; done; done

All four tables have 2,000 rows:

[user1] percona@db1: ~ $ mysql test -e "select count(*) from tb1; select count(*) from tb2; select count(*) from tb3; select count(*) from tb4;"
+----------+
| count(*) |
+----------+
|     2000 |
+----------+
+----------+
| count(*) |
+----------+
|     2000 |
+----------+
+----------+
| count(*) |
+----------+
|     2000 |
+----------+
+----------+
| count(*) |
+----------+
|     2000 |
+----------+

Let’s look at the disk space usage for the tables:

[user1] percona@db1: ~ $ sudo ls -lh /var/lib/mysql/test/|grep tb
-rw-r-----. 1 mysql mysql 592K Dec 30 02:48 tb1.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb2.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb3.ibd
-rw-r-----. 1 mysql mysql 464K Dec 30 02:48 tb4.ibd

We can see that tb1 is larger than the others because it stores the text in a fixed-size char(200) field, which always occupies the defined 200 characters regardless of the actual string length, while varchar, tinytext, and text are variable-size fields that store only the actual length of the string (in this example, we inserted 143 characters).

Compression:

Compression is the process of restructuring the data by changing its encoding in order to store it in fewer bytes. There are many compression tools and algorithms for data out there. 

MySQL supports native compression for InnoDB tables using the Zlib library with the LZ77 compression algorithm. It saves disk space and memory at the expense of CPU usage for compressing and decompressing the data. If CPU usage is not a bottleneck in your setup, you can leverage compression to improve performance: less data needs to be read from disk and written to memory, and indexes are compressed too. It can also help save costs on storage and backup times.

The compression ratio depends on multiple factors, but as with any other compression method, it is more efficient on text than on binaries, so tables with text fields will have a better compression ratio. 

Example (using the employee sample DB):

Created a new table employees_compressed:

mysql> CREATE TABLE employees_compressed LIKE employees;
Query OK, 0 rows affected (0.12 sec)

mysql> ALTER TABLE employees_compressed ROW_FORMAT=COMPRESSED;
Query OK, 0 rows affected (0.14 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> INSERT INTO employees_compressed SELECT * FROM employees;

Size comparison:

[user1] percona@db1: ~ $ sudo ls -lh /var/lib/mysql/employees/|grep employees
-rw-r-----. 1 mysql mysql 704M Dec 30 02:28 employees.ibd
-rw-r-----. 1 mysql mysql 392M Dec 30 17:19 employees_compressed.ibd

In this simple example, we had a compression ratio of ~45%!

There are a couple of blog posts from Yves that describe and benchmark MySQL compression:

Compression Options in MySQL (Part 1)
Compression Options in MySQL (Part 2)

Archive or purge old or unused data:

Some companies have to retain data for multiple years either for compliance or for business requirements. However, there are many cases where data is stored and needed only for a short time; for example, why keep application session information for many years?

While MySQL can handle large data sets, it is always recommended to keep only actively used data in the database, as this makes data access more efficient and also helps save costs on storage and backups. There is a good blog post from Gaurav, MySQL Data Archival With Minimal Disruption, showing how we can easily archive old data using pt-archiver.
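
pt-archiver automates this safely, but as a rough sketch of the underlying idea (table name and retention window are hypothetical), purging can be done in small batches to avoid long locks and huge transactions:

-- Hypothetical example: purge rows older than 90 days in small batches;
-- repeat the statement until it affects zero rows.
DELETE FROM app_sessions
WHERE created_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;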

Partitioning:

Partitioning is a feature that allows dividing a large table into smaller sub-tables based on a partition key. The most common use case for table partitioning is to divide the data by date.

For example, partitioning a table by year can be beneficial if you have data for many years and your query patterns filter by year. In this case, it is more efficient to read only one smaller partition rather than one large table with information from many years.

It is very important to analyze the partition key before partitioning based on query patterns because if queries do not always use the partition key as a filtering condition, they will need to scan one or multiple partitions to get the desired data, which results in a huge performance penalty. 

It is a cool feature, but as mentioned above, it is not suitable for every workload and needs to be planned carefully, as choosing a poor partition key can result in huge performance penalties.
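
As a rough sketch (table and column names are hypothetical), a table range-partitioned by year could look like the following; queries that filter on the partition key only need to read the matching partition:

-- Hypothetical example: range-partition an orders table by year.
-- In MySQL, the partitioning column must be part of every unique key.
CREATE TABLE orders (
  id BIGINT NOT NULL AUTO_INCREMENT,
  order_date DATE NOT NULL,
  amount DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (id, order_date)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(order_date)) (
  PARTITION p2021 VALUES LESS THAN (2022),
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- This query can be pruned to partition p2022 only:
SELECT SUM(amount) FROM orders
WHERE order_date >= '2022-01-01' AND order_date < '2023-01-01';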

Sharding:

Sharding is the concept of splitting data horizontally, i.e., distributing data across multiple servers (shards), meaning that different portions of a given table may be stored on different servers. This helps split large data sets into smaller ones stored across multiple servers.

The data is split in a similar way to partitioning, using a sharding key, which defines how the data is split and distributed among the shards. This needs to be handled at the application layer, with a coordinator that reads each query and routes it to the specific shard where the data is stored.

Also, it is important to carefully select the appropriate sharding key based on the table’s query patterns, so that the majority of queries can be solved by routing to a single shard; having to look up information from many shards and then filter, process, and aggregate it is an expensive operation.

For these reasons, not all applications or workloads are a good fit for sharding, and since it must be handled properly in the application, it can add complexity to the environment.

MongoDB supports sharding natively; MySQL does not, but there are some efforts in the MySQL world to implement it.

Some of them are:

  • MySQL Cluster: 

MySQL NDB Cluster is an in-memory database clustering solution developed by Oracle for MySQL. It supports native sharding that is transparent to the application. It is available under a paid subscription.

  • ProxySQL:

It is a feature-rich, open source MySQL proxy solution that allows query routing for the most common MySQL architectures (PXC/Galera, Replication, Group Replication, etc.).

It allows sharding by configuring a set of backend servers (shards) and a set of query rules to route the application queries to the specified shards.

Note that it requires some handling in the application, as it doesn’t support merging and retrieving data from multiple shards.

You can find more information in this good blog from Marco: MySQL Sharding with ProxySQL

  • Vitess:

It is an open source database clustering solution, originally developed at YouTube and now maintained as a CNCF project, that is compatible with the MySQL engine. It supports native sharding. You can find more information about Vitess in our blog post from Alkin: Introduction to Vitess on Kubernetes for MySQL – Part I of III.

MyRocks:

MyRocks is a storage engine developed by Facebook and made open source. It was designed to optimize data storage and access for big data sets. MyRocks is shipped in Percona Server for MySQL.

There is a cool blog post from Vadim covering big data sets in MyRocks:

MyRocks Use Case: Big Dataset

Query tuning:

It is common to find applications that perform very well at the beginning, but as data grows, performance starts to decrease. The most common cause is that poorly written queries or a poor schema design perform well with minimal data; however, as data grows, all those problems are uncovered. You can make use of the slow_query_log and pt-query-digest to find your problematic queries.
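
As a minimal sketch (the threshold and file path are illustrative, not recommendations), the slow query log can be enabled at runtime and the resulting file summarized with pt-query-digest:

-- Illustrative settings; tune the threshold for your workload.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;   -- log queries slower than 1 second
SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow.log';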

Administration:

Performing administrative tasks on large tables can be painful, specifically schema changes and backups.

For schema changes, Percona has a tool, pt-online-schema-change, that can help us perform schema changes with minimal disruption to the database. It works by creating a new table with the desired schema change applied, and it copies existing data in batches from the original table to the new one. Ongoing changes are copied from the original table to the new one using triggers.

This way, instead of one huge blocking ALTER on a large table, pt-OSC can run in the background, without locking and in batches, to minimize the performance impact.

For backups of large datasets, Percona XtraBackup can help reduce backup and recovery times. It is a hot physical backup solution that copies the data files of the tables while saving the ongoing changes to the database as redo logs. It supports native compression and encryption.

Remember that monitoring your databases is always important to help you find issues or bottlenecks. You can install and use Percona Monitoring and Management for free to take a deeper look at your servers’ and databases’ health. It provides QAN (Query Analyzer) to help you find problematic queries in your databases.

Conclusion

The old belief that MySQL can’t handle large datasets is nothing but a myth. With hardware being more powerful and cheaper, and the technology evolving, it is now easier than ever to manage large tables in MySQL.

Percona Database Performance Blog

Steven Crowder and the Daily Wire are publicly beefing about a $50M contract

https://media.notthebee.com/articles/63c9666c7b6d763c9666c7b6d8.jpg

After proclaiming that he was "done being quiet," Steven Crowder went on a tirade on his show yesterday against an unnamed "Big Con" conservative establishment that offered him a contract. As you’ll see below, the Big Con company is the Daily Wire, and the contract was worth $50 million over 4 years.

Not the Bee

This Hall Effect Stick Upgrade Kit Will Solve Joy-Con Drift Forever

https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/acb50d93703de9d9a8d20785286af5ba.jpg

GuliKit, makers of the truly excellent KingKong Pro 2 wireless controller for the Nintendo Switch, which we reviewed last year, has just released an upgrade/repair kit for the official Nintendo Joy-Cons that brings its drift-free Hall effect joysticks to the handheld console’s native controllers.

It’s an unfortunate fact that if you own a Nintendo Switch and play it regularly, you’ve possibly already experienced the issue known as ‘Joy-Con drift’ where the Switch detects joystick inputs even when a player’s fingers aren’t touching them at all. The most likely cause is related to components in modern controllers called potentiometers that physically wear down over time with prolonged use.

The issue can make games unplayable, and the only real solutions are to try to convince Nintendo to repair your Joy-Cons or repair them yourself. Even so, both are only a temporary fix when the same problematic joystick hardware is being used as a replacement.

The biggest selling point behind GuliKit’s KingKong Pro 2 wireless controller was that it uses upgraded joysticks that rely on Hall effect sensing, where magnets and magnetic sensors detect even the subtlest movements of the sticks. This eliminates moving parts rubbing against each other and wearing down over time, potentially eliminating joystick drift forever. On certain platforms, if the software is there to support it, you can also use Hall effect sticks to eliminate dead zones or customize how large your stick’s input radius is.

The KingKong Pro 2 was a workaround for Joy-Con drift, however, not a solution. Now, GuliKit has made that controller’s Hall effect joysticks available as a drop-in replacement/upgrade for the joystick hardware that still ships inside the Joy-Cons.

It looks like you can get a pair of them on Amazon for about $30 right now, or a four-pack for $53 if you want to help a friend and save a few bucks in the process. But while GuliKit promises these are a “100% fitable, perfect replacement, drop in with no hassles” fix, swapping out the sticks in your Joy-Cons isn’t as easy as swapping out AA batteries in a radio.

JoyCon Drift Fix! How to Replace the Nintendo Switch Left Joy-Con Joystick

iFixit has shared a video on YouTube of the process of swapping out a Joy-Con’s joystick, and while it’s relatively straightforward, it will definitely help to have the right tools on hand, including tweezers for manipulating ribbon cables, and a special tri-point screwdriver for dealing with the non-standard screws Nintendo loves to use. It goes without saying that an upgrade/repair like this will definitely void your Joy-Cons’ warranties, and probably the Switch’s too, but if you’re suffering from Joy-Con drift with no solution available, this seems like the best way to go right now.

Gizmodo

10 Things I Look for When I Audit Laravel Codebases


Oftentimes one of the first steps along my journey of working with a new company involves some kind of review of their code in the form of a code audit. How long the process takes depends on how much code they have and what I find. A longer audit should be more thorough, but a shorter audit should still be helpful. I put together this list of 10 things I would typically look for in an initial 2-3 hour audit of a Laravel project.

In this particular scenario I don’t assume that I have any kind of access to any of the company’s environments or the ability to consult with the company’s past or present developers. All I have is the code they’ve given me.

The goal of the audit is not to point fingers or tell people what they’ve done wrong, but to familiarize myself with the way they do things and to better understand the state their codebase is in. Once I have completed this process I can better advise them of potential improvements that could be made. The audit doesn’t necessitate setting up a dev environment. Although that may be helpful, the code should speak for itself. Later, as a follow up, I can set up a dev environment to test my assumptions and determine whether the code actually does what it appears to do.

One of the first things I look at when examining a company’s codebase is whether or not they are using version control. If they send me a copy of their code via email or a file sharing program, I already know that the answer is probably no. Without version control I don’t know who made what changes when and I don’t have a good way to allow multiple developers to work on the project at the same time. Ideally the client would have their project on Github because this is what I am most familiar with, but other solutions like Bitbucket and Gitlab would suffice. Version control also opens up the door to continuous integration and delivery which are both good strategies for an organization to adopt.

As a developer, the README file is where I go first when I want to know about a project on Github. It’s here that I look for instructions about installing the project in a local dev environment, what kind of requirements it may have, what the code does at a high level, and maybe some examples of how it is used. If the project has documentation, I would expect to find a link to that documentation here. However, if the default Laravel README is present, that’s not a good sign. It means the developer hasn’t taken the time to document their code in the most obvious and accessible place to do so. It may also mean that I am on my own when it comes to discovering any caveats involved in setting up my own development environment.

The composer.json and composer.lock files contain information about the project’s core dependencies. Once I’ve confirmed that these files exist, I check which version of Laravel the project is using. If I see that it’s using an older version of Laravel such as version 5.6, which was released in 2018, I might ask why they’ve avoided upgrading. If a major version upgrade is too much, why not at least upgrade to the latest minor version, which would be 5.8?

Having the latest version makes it easier to work with other software packages and take advantage of new features which save developers time and improve performance. If the company is not able or willing to make the investment to upgrade to the latest major version, they should at least keep up with the latest minor version which will occasionally eliminate newly-discovered security vulnerabilities and provide minor improvements. They should also be aware of when their version reaches end-of-life and will no longer be officially supported with security fixes.

In the fourth step I look at the routes in the routes directory. I want to see that routes are using middleware, groups, and controllers appropriately. This also gives me a feel for the scope of the project. Here I should be able to see every URL accessible by a user and every API endpoint.

I then look for instances where the code interacts with the database to perform a select, insert, update, or delete operation. Ideally, most queries and commands would be done through the appropriate model using Eloquent instead of the database facade (DB) and the query builder. The models should contain fillable fields and methods relating them to other models.

I also look for queries that are manually constructed from strings, which may introduce opportunities for SQL injection and could compromise the database. I don’t analyze every query individually, but if I see an obvious “N+1” problem (poorly optimized queries that lead to an excessive number of requests to the database), I will make a note of it.
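
For context, at the SQL level an N+1 problem looks like the pattern below (the posts/comments tables are hypothetical); eager loading, for example with Eloquent’s with(), collapses the per-row queries into a single one:

-- N+1 pattern: one query for the parent rows...
SELECT * FROM posts;
-- ...then one query per post for its related rows (N extra round trips):
SELECT * FROM comments WHERE post_id = 1;
SELECT * FROM comments WHERE post_id = 2;
-- Eager loading issues one query for all the related rows instead:
SELECT * FROM comments WHERE post_id IN (1, 2);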

For the sixth step in my audit I check to see that they are using migrations. Migrations are a valuable tool included with Laravel and allow the developer to codify schema changes and reliably apply them in production and development environments. Since each migration is a file with a timestamp in the filename it’s easy to see when the first and last migrations were created. If they are using migrations, then I check the tables in the migrations against their models. Are there any models without tables? Are there any tables without models?

I also take this opportunity to examine the structure of individual tables, including column data types. I look for a consistent naming convention for both models and tables; one that takes advantage of Laravel’s automatic model-to-table mapping feature is preferable. If you have, for example, a customer_orders table, the corresponding model would be CustomerOrder.

I like to see that the project provides a way for a developer to seed a new database for development or testing purposes. This can be a valuable resource and provides a much safer way of populating the database than importing production data. Laravel provides ways to seed the database via Database Seeders and Factories. Seeders can also be used when setting up the initial production environment database and can indicate what data is required as a minimum for the app to run.

For this step I want to see how the frontend is rendered. This can vary widely between projects. If the project is using Blade templates, I look to see if any templates include logic that they shouldn’t and whether they are using layouts. I’ll scan the package.json file for anything interesting.

I mainly want to know how the frontend is connected to the backend. Is the app rendered server-side or client-side? If they’re using Vue or Inertia, I might spend more time here.

I don’t go into depth here; I just want to see whether there are tests and what testing library they are using. I might examine a few of the tests to see what the assertions are and get a feel for the level of coverage.

The last thing I do is look for examples of changes I could make to improve readability and maintainability. I do this by scanning through PHP files where I’m most likely to find business logic and database interactions. I want to find examples of code that is difficult to understand or repetitive. Some things I look for include variable and function names that are ambiguous or misleading, methods that are never called, and snippets of code that appear multiple times as if they were copied and pasted. I may also look for inconsistencies in syntax and use of indentation. From this I can roughly guess which coding standards, if any, the developers adhere to.

After performing this type of audit I should have enough information to take back to the company to advise them on what my next steps would be depending on their goals. The next logical step may involve setting up my own dev environment and testing the core features of the app as a user.

With my environment set up I should be able to validate migrations and run tests. I could also go through my notes from the first audit and test any assumptions I had.

Laravel News Links

The Ultimate Guide to Database Corruption: Part 2 – B-Tree Index Corruption

https://www.percona.com/blog/wp-content/uploads/2023/01/B-tree-Index-Corruption.jpg

This blog is a continuation of my previous blog on the basic understanding of corruption, The Ultimate Guide to Database Corruption: Part 1 – An Overview. If you have not already gone through it, I encourage you to read it to understand data corruption.

Introduction

This blog focuses on B-tree indexes and various corruption-related issues that occur with them. Understanding how Postgres implements the B-tree index is needed before one can begin understanding index corruption. Once we have a proper understanding of the index structure and how B-tree indexes are accessed, it will be easier to understand how corruption affects data retrieval and the various methods to resolve it.

Some case studies will also be covered in this blog.

B-tree index structure

In PostgreSQL, a B-tree index structure is implemented using Lehman and Yao’s high-concurrency B-tree algorithm. Logically, it comprises four types of pages organized in a hierarchical structure.

  • The meta page
  • The root page
  • Internal pages
  • Leaf pages

Meta page:

This is the first page of an index; it contains metadata, such as the type of index, and is also called page zero. The location of the root page can be obtained from this page. If this page is corrupted or inaccessible, it is impossible to access the index.

Root pages and internal pages:

The root page is the topmost page; it contains links to the pages called internal pages and leaf pages. In terms of storage, internal pages are no different from the root page; they also store pointers to other internal pages or leaf pages. The only difference is that there is only one root page in every index, while there may be a number of internal pages.

These pages do not contain any information to access table records.

Leaf pages:

These pages lie at the last level and cannot be spawned further. They contain the actual information needed to access table data: the indexed values and the CTIDs of the table rows where each value lies.

A typical structure of a B-tree index is as below.

B-tree index

As illustrated in the diagram above, the root page and internal pages have tuples that link to internal pages or leaf pages. Each internal or leaf page holds values greater than or equal to the key it is linked from, and lower than the next higher key on the parent page. The first tuple in every root and internal page has a blank key; it points to the page that contains all the values lower than its immediate right-hand neighbor.

For example, in the image above, 

  • The root page has three tuples (blank, 40, 80), and the “internal page 1” is associated with the tuple whose value is blank, and the next higher value in the root page is 40. So here, the “internal page 1” may contain values less than 40. 
  • While the “internal page 2” contains values greater than or equal to 40 and less than 80. 

Case studies

In the case of index corruption, queries give different results depending on where the data is corrupted. Corruption may exist in any page (root, internal, or leaf). But, studying it carefully, one may understand that corruption does not always reveal its existence; on occasion, it misguides users as well. If the page header or format is corrupt, the query itself throws an error. But when the actual data inside the pages is corrupted rather than the format, PostgreSQL cannot detect it as corrupted, and it returns some results anyway.

In this blog, I have deliberately chosen an index without any internal pages, so that readers may understand that internal pages are not necessarily present in every B-tree index. The index used here as an example is a primary key index.

This section deals with cases where queries return incorrect results due to corruption. Here, we delve into data corruption with leaf pages and internal pages.

Case 1 –  Data gets corrupted and becomes unsearchable

This is a classical case of corruption where Postgres tries to find a record but cannot discover it because the value no longer belongs under its current parent node. In this case, in one of the leaf pages, two adjacent bits get exchanged with each other. Details are as below.

Here, we randomly pick a number from a table and prepare a case on it. Below is the record and details related to it. The record (id = 612) described in the snapshot below will be our target to describe corruption.

We have an index on the id column of corruption_test, and as described below, the 10th and 11th bits of the indexed value get exchanged in the index. Due to this, the value stored in the index becomes 1124 instead of 612; however, the value in the associated table is still the same.

Table name: – corruption_test

Index name: – test_tbl_pkey

Corruption in page:- Leaf page 2

Associated CTID in table:- (101,6)

Actual value in an index tuple:- 00000010 01100100 (612)

Value in an index tuple after corruption:- 00000100 01100100 (1124)

The below image delineates what actually transpired in the test_tbl_pkey index.

test_tbl_pkey index.

Now, we will see how queries respond when we request to retrieve the data for values associated with this case.

Querying a table using the pristine number:

As described above, the actual (non-corrupted) value is 612. Let’s see what result we receive by querying using the “id = 612” predicate.

Now, what if we search data using previously extracted CTID in the table?

This may take you by surprise: the record actually exists in the table, yet the query fails to retrieve it when we filter directly on the value.

This happens because 612 was replaced by 1124 after corruption in the index. Here, the search performs an index scan to obtain the record, and the index is unable to locate the desired value; hence, it is unable to show any data. The image below describes this situation.

Querying a table using a corrupted number:

As we have understood, 612 no longer exists in the index, so it cannot be gleaned from the index. But we know that 1124 exists in the index, so let’s see what happens when querying for 1124.

Let us dive a little deeper to know why we did not get any record upon querying corrupted data.

Here, per the valid structure of the index, 1124 ought to be in Leaf page 3, but it cannot be found there, as it exists in neither that page nor the table. While Leaf page 2 contains the entry pertaining to 1124, PostgreSQL will not search for 1124 in Leaf page 2 because the tree structure says the value cannot logically be on that page; hence, the value will not be found.

The image below narrates the situation.

Case 2 –  Incorrect data pointers (CTID) stored against the value

In this case, PostgreSQL tries to find a record from the index and can find it, but shows the wrong one. This happens because a false CTID value is stored in the index. Due to this, it shows a completely different record.

Here, one bit of the CTID gets changed, which introduces a different kind of corruption. We randomly pick a record (id = 245) from the table as the subject of this test case. The snapshot below shows the record details.

Table name: – corruption_test

Index name: – test_tbl_pkey

Corruption in page:- Leaf page 1

Associated CTID in table:- (40,5)

Value at the CTID:- 245

Actual CTID in an index tuple :- (40,5) – (00101000,00000101)

CTID in an index tuple after corruption:- (56,5) – (00111000,00000101)

As described above, due to the change of value in the 5th bit, a different CTID is stored. The case is as described below.

Now, we will observe how different queries behave in this kind of situation.

Querying a table using an in-doubt number:

As described above, the value in doubt here is 245. Let’s see what result we receive by querying using the “id = 245” predicate.

This is shocking and perplexing, as the data returned is completely different from what we expected. There is no need to explain what the implications could be for day-to-day business.

Here, in the index, we can observe that the CTID stored against 245 points to a different row. So, the query returns a different record here. The image below describes what actually happened internally.

Querying a table by selecting id only:

If we select only the id column in a query, it shows the correct result.

The reason we get the correct result is that PostgreSQL performs an “index-only scan” because the index exists on the id column. It is therefore not required to visit the table data to show records, as the index contains all the required information.
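
A small sketch of how this can be confirmed with EXPLAIN (the plan shown in the comments is illustrative):

EXPLAIN (COSTS OFF) SELECT id FROM corruption_test WHERE id = 245;
-- Index Only Scan using test_tbl_pkey on corruption_test
--   Index Cond: (id = 245)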

Case 3 –  Incorrect index format

Every database object has a pre-defined format. PostgreSQL (or any other database) is programmed to read and write in a specific format that is just a sequence of characters (or bytes). If one or more characters get changed, the object becomes unreadable. In such a corruption case, the query instantly returns an error, and the error texts are not the same every time.

Let’s look into various cases of errors.

Corrupted meta page:

As described above, the meta page is the core page of any index; it stores the index’s metadata. So, if any format-related corruption exists in the meta page, the index becomes unrecognizable. Due to this, the query returns no result.

The below snapshot shows the result when querying a particular record.

The below query shows the result when the whole table is queried.

Here is a catch! It is understandable not to receive data if a query performs an index scan. But it is perturbing when a query cannot deliver because of index corruption even when it is not expected to use the index. This happens due to the planner, which still reads the index’s metadata while planning the query.

Corrupted non-meta page:

If format-related corruption exists in a non-meta page (root, internal, or leaf), it returns a different error. However, it does not affect queries that perform sequential scans.

Causes

After reading the above sections, one may understand that corruption is nothing but inappropriate bytes stored in database files, which leads to anomalies in the data. But the question here is how it occurs. Although many events can leave corruption behind, it is difficult to create a reproducible test case because there is no certainty about what actually caused it. The only option left to us is to speculate. The majority of issues are caused by hardware failure, and the following are the most probable reasons.

  • Faulty RAID disks or RAID controllers
  • Defective disk or RAM
  • Disks without power failure protection
  • Overloaded CPU
  • PostgreSQL/OS bug

Additionally, there are some problems caused by users. Some of them are as below.

  • Killing a connection with signal 9
  • Abruptly stopping PostgreSQL
  • Copying files for backup without pg_start_backup
  • Executing pg_resetwal

Corruption is not a frequent affair; however, it can become routine in the case of faulty hardware or a bug.

Detection

As already mentioned in the case studies section, it is difficult (albeit with exceptions) to detect corruption when the data inside an index is corrupted. This is because the data itself cannot be validated by the database; however, this is not true of the format.

When a B-tree index’s format is corrupted, we can use functions from the amcheck and pageinspect extensions to verify it. Here is an example of verifying an index page using the bt_page_items function of the pageinspect extension.
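
A minimal sketch, using the test_tbl_pkey index from the examples above:

-- amcheck verifies the structural consistency of a B-tree index.
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('test_tbl_pkey');

-- pageinspect's bt_page_items lists the tuples stored in a given index page.
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT itemoffset, ctid, itemlen, data
FROM bt_page_items('test_tbl_pkey', 1);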

We can iterate through all the pages of an index and list which pages are corrupted. However, this hardly makes sense, as we cannot change the data inside an index.

Repairing

Removing corruption from a B-tree index does not require a thorough analysis; it is just a matter of re-creating the index. As we know, an index is a subset of table data that enables quicker data retrieval. As long as there is no issue in the table data, we can rebuild the index. This can be performed with the following options.

  1. Drop and recreate an index
  2. Use reindex command
  3. Rebuild using pg_repack utility

Out of all of these, pg_repack is the most viable option, as it builds the new index in parallel so that running operations are not affected. For more information, kindly visit the pg_repack blog.
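
For option 2, for instance, a minimal sketch using the example index above:

-- Rebuild the corrupted index in place (blocks writes to the table).
REINDEX INDEX test_tbl_pkey;

-- Since PostgreSQL 12, the rebuild can be done with minimal locking.
REINDEX INDEX CONCURRENTLY test_tbl_pkey;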

Anticipation

After reviewing various scenarios related to corruption, we can surely say that it is scary, as it may spook us by showing unwanted and unexpected results. Data is a valuable asset, so there is no need to describe the business impact. Well, how about nipping it in the bud?

Yes, it is possible. We can have corruption recorded by the database using the data checksums feature. When running initdb, we can use the -k option to enable checksums, and the read-only data_checksums parameter later reports whether they are enabled. Every checksum failure will then be recorded in the pg_stat_database view. If we already know where the corruption exists, it is possible to take the necessary action before it propagates to users.
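
A minimal sketch of checking this, assuming PostgreSQL 12 or later for the per-database counters:

-- Is the cluster running with data checksums enabled?
SHOW data_checksums;

-- Checksum failures recorded per database since PostgreSQL 12:
SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database
WHERE checksum_failures > 0;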

Summary

The robust structure of B-tree indexes is designed for faster data access. However, when certain bits change, an index becomes unreadable or starts returning false data, which has an impact on day-to-day business. Unequivocally, this is a horrible situation. However, it is possible to detect certain kinds of corruption, and it is recommended to take preemptive measures to tackle such situations.

Please feel free to post your queries or suggestions in the comments section.

Percona Database Performance Blog

It’s time to stop…

https://www.louderwithcrowder.com/media-library/image.png?id=32860182&width=980

Conservative media giants are no better than Big Tech. The people you thought were fighting for you have been putting quick profits and the appeasement of their tech overlords ahead of any real conservative values. I can’t continue if this does. It’s time we put a stop to the Big Con.


It’s time to stop…

www.youtube.com

Louder With Crowder

John Ludhi/nbshare.io: Pandas Datareader To Download Stocks Data From Google And Yahoo Finance

Pandas Datareader To Download Stocks Data From Google And Yahoo Finance

The pandas_datareader package can be used to access data from multiple sources such as Yahoo and Google Finance…

To install the package, use pip

In [ ]:
!pip install pandas_datareader

Use DataReader to access data from yahoo finance

In [ ]:
import numpy as np
import pandas as pd

from pandas_datareader import data as wb
aapl = wb.DataReader('AAPL', data_source='yahoo', start='1995-1-1')

At the time of writing this notebook, there is a bug in DataReader because of which you might run into the following error:

TypeError: string indices must be integers

How To Fix DataReader TypeError: string indices must be integers

Use the following snippet to work around the above error.

In [9]:
import numpy as np
import pandas as pd
import yfinance as yf
yf.pdr_override()
from pandas_datareader import data as wb
[*********************100%***********************]  1 of 1 completed
In [11]:
aapl = wb.DataReader('AAPL', data_source='yahoo', start='1995-1-1')
[*********************100%***********************]  1 of 1 completed
In [15]:
aapl
Out[15]:
Open High Low Close Adj Close Volume
Date
1995-01-03 0.347098 0.347098 0.338170 0.342634 0.288771 103868800
1995-01-04 0.344866 0.353795 0.344866 0.351563 0.296296 158681600
1995-01-05 0.350446 0.351563 0.345982 0.347098 0.292533 73640000
1995-01-06 0.371652 0.385045 0.367188 0.375000 0.316049 1076622400
1995-01-09 0.371652 0.373884 0.366071 0.367885 0.310052 274086400
2023-01-09 130.470001 133.410004 129.889999 130.149994 130.149994 70790800
2023-01-10 130.259995 131.259995 128.119995 130.729996 130.729996 63896200
2023-01-11 131.250000 133.509995 130.460007 133.490005 133.490005 69458900
2023-01-12 133.880005 134.259995 131.440002 133.410004 133.410004 71379600
2023-01-13 132.029999 134.919998 131.660004 134.759995 134.759995 57758000

7059 rows × 6 columns

Use DataReader to access data from google finance

In [13]:
googl = wb.DataReader('GOOGL', data_source='googl', start='1995-1-1')
[*********************100%***********************]  1 of 1 completed
In [14]:
googl
Out[14]:
Open High Low Close Adj Close Volume
Date
2004-08-19 2.502503 2.604104 2.401401 2.511011 2.511011 893181924
2004-08-20 2.527778 2.729730 2.515015 2.710460 2.710460 456686856
2004-08-23 2.771522 2.839840 2.728979 2.737738 2.737738 365122512
2004-08-24 2.783784 2.792793 2.591842 2.624374 2.624374 304946748
2004-08-25 2.626627 2.702703 2.599600 2.652653 2.652653 183772044
2023-01-09 88.360001 90.050003 87.860001 88.019997 88.019997 29003900
2023-01-10 85.980003 88.669998 85.830002 88.419998 88.419998 30467800
2023-01-11 89.180000 91.599998 89.010002 91.519997 91.519997 26862000
2023-01-12 91.480003 91.870003 89.750000 91.129997 91.129997 30258100
2023-01-13 90.849998 92.190002 90.129997 92.120003 92.120003 26309900

4634 rows × 6 columns

Note – The pandas_datareader library does not have a built-in function for retrieving options data. However, there are other libraries and APIs you can use to obtain options data, such as yfinance for Yahoo Finance, alpha_vantage for Alpha Vantage, or iex for the IEX Cloud API.

Check out following tutorial if you want to use yfinance
How To Use yfinance Package

Planet Python

How to Install a Minecraft Bedrock Server on Raspberry Pi

https://static1.makeuseofimages.com/wordpress/wp-content/uploads/2023/01/muo-diy-raspberry-pi-minecraft-bedrock-server-featured.jpg

Playing Minecraft with friends and family requires either putting up with split screen mode, or using multiple devices. For the best results, these should connect to a Minecraft server.

But paying for a Minecraft server is expensive. Why not build your own? It is now possible to run Minecraft Bedrock Server on a Raspberry Pi.

Why Use Minecraft Bedrock Server?

Over the years, Minecraft has evolved beyond the original Java game. As of 2016, Minecraft Bedrock Edition has been the main version, released on PC, consoles, and mobile.

While this brings new features, improved graphics, and better stability to the game, Minecraft Bedrock Edition is not compatible with the old desktop and mobile Java version. As such, if you had installed Minecraft server on a Raspberry Pi, you would only be able to connect from the corresponding Java version (whether on a PC or another Pi).

As there is now a (Java-based) Minecraft Bedrock-compatible server for Raspberry Pi, you can use it to host games played on any device running Bedrock. This gives you the advantage of being fully in control of the server, from setting invites and assigning access rights to installing mods and backing up the world.

Which Raspberry Pi Will Run a Minecraft Bedrock Server?

For this project you have a choice of the Raspberry Pi 3 or Raspberry Pi 4. Naturally the Pi 4 with its 2GB, 4GB, and 8GB variants is the best option. However, you should be able to run Minecraft Bedrock Edition server on a Raspberry Pi 3.

To test this project, I used a Raspberry Pi 3 B+. This device has a 1.4GHz 64-bit quad-core processor and 1GB of RAM. Initial setup was over Wi-Fi, using SSH, but a better response and lower latency can be enjoyed with an Ethernet connection to your router.

Anything lower than a Raspberry Pi 3 should be avoided.

What You Need for a Minecraft Bedrock Server

To host the server software, you will need an operating system. For optimum performance, opt for a lightweight OS; Raspberry Pi OS Lite is probably the best option here.

See our guide to installing an operating system on the Raspberry Pi before proceeding. It is recommended that you configure the installation to automatically connect to your Wi-Fi network (if you’re using one), and have SSH enabled on the Raspberry Pi. If you’re not using SSH, you’ll need a keyboard and display set up and connected.

You will also need to install:

  • Git
  • Java SDK
  • Latest Bedrock Edition-compatible Java build of Minecraft

Follow the steps below to install these and configure your Minecraft Bedrock server.

Configure Raspberry Pi OS for Minecraft Bedrock Edition Server

Before you can install the server software, you will need to configure the Raspberry Pi. These steps assume you have already installed Raspberry Pi OS.

Start by ensuring the operating system is up-to-date:

 sudo apt update && sudo apt upgrade 

Next, open the Raspberry Pi configuration tool, raspi-config:

 sudo raspi-config 

Use the arrow keys to select System Options > GPU Memory and set the GPU memory to 16. This ensures the majority of system resources are dedicated to the server. Hit Tab to select OK.

If you haven’t already enabled SSH at this point, do so by selecting Interfacing Options > SSH, press Tab to select Yes, and press Enter to confirm.

Next, hit Tab to select Finish, then Enter to reboot the Raspberry Pi.

Set up Minecraft Bedrock Server on Your Raspberry Pi

With the Raspberry Pi restarted, install Git

 sudo apt install git 

This software allows you to clone a GitHub repository to your computer, and is required for installing Minecraft Bedrock server.

You can now install Java.

 sudo apt install default-jdk 

This installs the default (current) version of Java. You can check which version by entering

 java -version 

(Note that to install a specific Java release, use a specific version name, such as sudo apt install openjdk-8-jdk.)

At the time of writing, the default-jdk version was 11.0.16.

Install Minecraft Bedrock Server on Raspberry Pi

You’re now ready to install the server. Begin by entering

 git clone https: 

Wait while this completes, then switch to the Nukkit directory

 cd Nukkit 

Here, update the submodule:

 git submodule update --init 

That will take a while to complete. When done, change permissions on mvnw

 chmod +x mvnw 

Finally:

 ./mvnw clean package 

This final command is the longest part of the process. It’s a good opportunity to boot Minecraft Bedrock Edition on your PC, mobile, or console in readiness.

Run Minecraft Bedrock Server on Raspberry Pi

When ready, change directory:

 cd target 

Here, launch the server software:

 java -jar nukkit-1.0-SNAPSHOT.jar 

You’ll initially be instructed to enter your preferred server language.

Once that is done, Nukkit starts, the server properties are imported, and the game environment is launched. This begins with the default gamemode set to Survival, but you can switch that later.

Once everything appears to be running, enter

 status 

This will display various facts such as memory use, uptime, available memory, load, and number of players.

You can also use the help command (or hit ?) to check what instructions can be used to administer the server. These can either be input directly into the Pi with a keyboard, via SSH, or from the Minecraft game’s chat console (remember to precede each command with “/” in the console).

Connect to Minecraft Bedrock Server from Another Device

With everything set up, you’re ready to connect to your server. To do this

  1. Launch Minecraft Bedrock Edition on any other device
  2. Select Servers > Add Server
  3. Input the server’s Name and IP address (you should know this from using SSH)
  4. Tap Play to enter the server right away, or Save
  5. For subsequent connections, the server will be listed under Additional Servers; simply select it, then Join Server

A moment later, you should be in the Minecraft server world. On the server side, the connection will be recorded in the server console.

Create a Minecraft Bedrock Edition Server With Your Raspberry Pi

While a few steps are required to enable the Bedrock Edition server on Raspberry Pi, the end results are good. Our test device, you will recall, was a Raspberry Pi 3B+, more than adequate for 2-5 players. A Raspberry Pi 4 will probably perform better for a greater number of players.

Using a Raspberry Pi is just one of many ways you can create a Minecraft server for free.

MUO – Feed