Dissecting an Extreme Sparse MySQL Table Using ibdNinja

I previously wrote a blog post to introduce the concept of MySQL B+ Tree splits and explained how to deliberately construct a highly sparse table by hacking the current rules. Then in the next post, I introduced ibdNinja, a tool I developed to parse the MySQL data file. In this follow-up post, I demonstrate how to use ibdNinja to analyze and verify this table’s data file.

This post combines two topics:

  1. Constructing a highly sparse InnoDB table.

    Reference: Hack MySQL: How to Create an Extremely Sparse InnoDB Table (B+Tree)

  2. Using ibdNinja to analyze the table’s data file and identify interesting findings.

    Reference:ibdNinja: A powerful tool for parsing and analyzing MySQL 8.0 (.ibd) data files

1. Constructing a Highly Sparse Table

By running the following SQL file using the source command in the MySQL client, you can create a highly sparse InnoDB table called tbl:

mysql> source /PATH_TO_THIS_FILE/load.sql

load.sql:

-- Create the database and table
CREATE DATABASE IF NOT EXISTS test;
USE test;

DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl (
    pk INT NOT NULL,
    val CHAR(240) NOT NULL DEFAULT 'a',
    PRIMARY KEY (pk)
) ENGINE=InnoDB CHARSET=LATIN1;

-- Create the stored procedure
DELIMITER //

CREATE PROCEDURE PopulateTable()
BEGIN
    DECLARE i INT;

    -- Insert initial values from 1 to 32
    SET i = 1;
    WHILE i <= 32 DO
        INSERT INTO test.tbl (pk, val) VALUES (i, 'a');
        SET i = i + 1;
    END WHILE;

    -- Insert values from 10000001 to 10000027
    SET i = 10000001;
    WHILE i <= 10000027 DO
        INSERT INTO test.tbl (pk, val) VALUES (i, 'a');
        SET i = i + 1;
    END WHILE;
    -- The first leaf page splits into 2 leaf pages:
    -- [1 ... 29],[30, 31, 32, 10000001 ... 10000027]

    -- Insert values from 10000028 to 10000054
    SET i = 10000028;
    WHILE i <= 10000054 DO
        INSERT INTO test.tbl (pk, val) VALUES (i, 'a');
        SET i = i + 1;
    END WHILE;

    -- Insert alternating values starting from 33 to 1000000
    SET i = 33;
    WHILE i < 100000 DO
        INSERT INTO test.tbl (pk, val) VALUES (i + 1, 'a');
        INSERT INTO test.tbl (pk, val) VALUES (i, 'a');
        SET i = i + 2;
    END WHILE;
END //

DELIMITER ;

-- Call the procedure
CALL PopulateTable();

After executing the script, you will have created the table tbl with 100,054 records, each having a size of 262 bytes. The resulting B+ Tree structure will be highly sparse, as illustrated in the following diagram:

image-1

Expected B+ Tree Structure

2. Using ibdNinja to Analyze and Verify the Table

1️⃣ Verifying that the Leftmost Page at Level 0 Contains 29 Records

First, run the following command to get an overview of the tbl.ibd data file:

./ibdNinja -f ../innodb-run/mysqld/data/test/tbl.ibd

image-2

From the output, you can see that the table contains a B+ Tree index with Index ID = 547.

Next, run the following command to get the number of levels in the B+ Tree and the page ID of the leftmost page at each level:

./ibdNinja -f ../innodb-run/mysqld/data/test/tbl.ibd -e 547

image-3

The output will show that the leftmost page ID at Level 0 (leaf level) is 5. Let’s analyze this page:

./ibdNinja -f ../innodb-run/mysqld/data/test/tbl.ibd -p 5

The tool will display detailed information about all the records in Page 5. Let’s take a look at the last record as an example:

image-4

Finally, the tool will print a summary of Page 5:

image-5

This confirms our expectation that the leftmost page contains 29 records.

2️⃣ Verifying that Subsequent Pages Contain Only 2 Records

We expect that, starting from the second page, each page in the leaf level will contain only 2 records, making the tree extremely sparse. Let’s verify this.

From the analysis of Page 5, we know that its next page ID is 7.

image-6

Let’s analyze Page 7:

./ibdNinja -f ../innodb-run/mysqld/data/test/tbl.ibd -p 7

image-7

The tool shows that Page 7 indeed contains only 2 records, with a space utilization of just 3.20%, confirming our expectation.

3️⃣ Verifying the Overall Sparsity of the B+ Tree

To analyze the entire B+ Tree, run the following command:

./ibdNinja -f ../innodb-run/mysqld/data/test/tbl.ibd -i 547

The tool will print a summary:

image-8

The tool then provides detailed statistics for both non-leaf and leaf levels:

image-9

This confirms that the B+ Tree is highly sparse at the leaf level.

4️⃣ Making the Situation Even Worse

The leaf level currently has a utilization rate of only 3.20%, which is already very low. But we can make it worse by dropping a column from the table.

Since the val column (a CHAR(240)) occupies 240 bytes of each record, we can drop this column using Instant Drop Column. The space previously occupied by this column remains allocated but becomes unusable, creating unutilizable gaps in the pages

To drop the column, execute the following command in the MySQL client:

ALTER TABLE tbl DROP COLUMN val;

Then, reanalyze the B+ Tree using the ibdNinja tool:

./ibdNinja -f ../innodb-run/mysqld/data/test/tbl.ibd -i 547

image-10

The analysis shows that 2.93% of the space in the index is occupied by dropped columns, which cannot be reused. The free space ratio remains unchanged.

5️⃣ Releasing Space with OPTIMIZE TABLE

To reclaim the wasted space, run the following command in the MySQL client:

OPTIMIZE TABLE tbl;

After running this command, analyze the B+ Tree again (note that the Index ID will change):

./ibdNinja -f ../innodb-run/mysqld/data/test/tbl.ibd -i 548

image-11

The leaf level’s size is reduced from 781MB to 2MB, with a space utilization of 90%. The B+ Tree is no longer sparse.

The ibdNinja tool provides a practical way to explore .ibd files and gain insights into InnoDB’s internal structures. It can be helpful for understanding MySQL’s storage engine and troubleshooting data-related issues. Whether you’re exploring InnoDB for learning or operational purposes, this tool offers assistance.