Why InnoDB Index Internals Matter
InnoDB stores data and indexes very differently from MyISAM and other engines. Understanding clustered and secondary indexes is essential for:
- Designing primary keys that scale
- Avoiding hidden performance problems
- Reducing random I/O and buffer pool pressure
- Interpreting EXPLAIN output correctly
This article walks through how InnoDB stores rows, how lookups really work, and practical rules for choosing good indexes.
InnoDB pages and B+‑trees in 2 minutes
InnoDB stores data in fixed-size pages (typically 16 KB) and organises indexes as B+‑trees. Every index is a B+‑tree; the difference is what the leaf pages contain.
Conceptual B+‑tree shape:
┌───────────────┐
│ Root page │
└──────┬────────┘
│
┌─────┴─────┐
│ Internal │
│ pages │
└─────┬─────┘
│
┌─────┴─────┐
│ Leaf │ <─ actual data or row pointers
│ pages │
└───────────┘
All lookups, inserts, updates and deletes navigate these trees.
What is the clustered index?
InnoDB stores the table’s data in the clustered index. There is exactly one clustered index per table. Its leaf pages contain the full row, not just pointers.
In most cases, the clustered index is the PRIMARY KEY. If you do not define one, InnoDB will choose or create a key:
- If there is a non-null UNIQUE index, it may be used as the clustered index.
- Otherwise, InnoDB creates a hidden 6‑byte row ID and clusters on that.
Leaf pages of the clustered index look like this:
┌───────────────────────────────────────────────┐
│ Clustered index leaf page │
├───────────────────────────────────────────────┤
│ PK: 101 | col_a | col_b | col_c | ... │
│ PK: 104 | col_a | col_b | col_c | ... │
│ PK: 110 | col_a | col_b | col_c | ... │
└───────────────────────────────────────────────┘
The rows are physically ordered by the clustered index key. This order drives the performance of range scans, inserts, and secondary index lookups.
Creating a table with a good clustered index
CREATE TABLE orders (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
customer_id BIGINT UNSIGNED NOT NULL,
status TINYINT NOT NULL,
created_at DATETIME NOT NULL,
total_cents INT NOT NULL,
PRIMARY KEY (id), -- clustered index
KEY idx_customer_created (customer_id, created_at),
KEY idx_status_created (status, created_at)
) ENGINE=InnoDB;
Here PRIMARY KEY (id) is the clustered index. All row data lives on its leaf pages.
What are secondary indexes?
Secondary indexes are all non-clustered indexes (non-PK B+‑trees). Their leaf pages do not contain full rows; they contain:
- The secondary index key columns
- The clustered index key (the primary key value) as a pointer
Example leaf page of idx_customer_created (customer_id, created_at):
┌────────────────────────────────────────────────────┐
│ Secondary index leaf page │
├────────────────────────────────────────────────────┤
│ customer_id: 42 | created_at: 2024-01-01 | PK: 101 │
│ customer_id: 42 | created_at: 2024-01-02 | PK: 104 │
│ customer_id: 77 | created_at: 2024-01-01 | PK: 110 │
└────────────────────────────────────────────────────┘
To get the full row via a secondary index, InnoDB:
- Searches the secondary index B+‑tree
- Finds the matching leaf entries and their PK values
- Uses each PK to look up the row in the clustered index
This extra step is called a “secondary lookup” or “bookmark lookup”.
Visualising a secondary index lookup
SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at;
Step 1: secondary index search
┌───────────────────────┐
│ idx_customer_created │
└───────┬───────────────┘
│ (find PKs: 101, 104, ...)
▼
┌───────────────────────┐
│ clustered index │
└───────────────────────┘
Step 2: for each PK found, fetch full row
Covering indexes and when they help
If all columns needed by a query are contained in a secondary index, InnoDB can avoid the second lookup and read only the secondary index. This is a covering index.
Example:
CREATE TABLE accounts (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
email VARCHAR(255) NOT NULL,
status TINYINT NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY idx_email (email),
KEY idx_status_created (status, created_at)
) ENGINE=InnoDB;
-- Covering index example:
SELECT status, created_at
FROM accounts
WHERE status = 1
ORDER BY created_at
LIMIT 100;
The index (status, created_at) covers this query: all selected and filtered columns are in the index, so no clustered lookup is needed.
How index choice affects performance
Primary key design impacts everything
- Every secondary index entry stores the primary key value.
- A wide primary key (e.g. long VARCHAR) makes all secondary indexes larger.
- Larger indexes mean more pages, more I/O, and less effective buffer pool usage.
Prefer short, stable primary keys, commonly:
BIGINT UNSIGNED AUTO_INCREMENT- Or a short, fixed-length business key if absolutely required
Insert patterns and page splits
Because InnoDB orders rows by the clustered index, insert pattern matters:
- Monotonically increasing PKs (e.g. AUTO_INCREMENT) append at the end of the B+‑tree, minimising page splits.
- Random PKs (e.g. random UUID) cause inserts all over the tree, leading to more page splits, fragmentation, and random I/O.
If you must use UUIDs, consider:
- Using binary(16) instead of char(36)
- Using ordered UUID variants (time-based or ULID-like) where appropriate
Range scans and physical order
Range scans on the clustered index (e.g. WHERE id BETWEEN 1000 AND 2000) are efficient because rows are physically adjacent on disk.
Range scans on secondary indexes are also efficient for reading keys, but each row still requires a clustered lookup unless the index is covering.
Step-by-step: analysing a query with EXPLAIN
1. Inspect the execution plan
EXPLAIN
SELECT id, status, total_cents
FROM orders
WHERE customer_id = 42
AND created_at >= '2024-01-01'
ORDER BY created_at
LIMIT 100;
Key things to check:
key: which index is chosen?type: is itrange,ref, or worse (ALL)?rows: how many rows are estimated?Extra: look for “Using index” (covering) vs “Using where; Using filesort”.
2. Compare columns used vs index definition
Suppose EXPLAIN shows idx_customer_created is used. The index is defined as:
KEY idx_customer_created (customer_id, created_at)
The query has:
- WHERE:
customer_id,created_at - ORDER BY:
created_at - SELECT:
id, status, total_cents
The index supports the WHERE and ORDER BY, but not all SELECT columns, so it is not covering. Expect secondary lookups.
3. Decide if a covering index is worth it
You could add:
ALTER TABLE orders
ADD KEY idx_customer_created_cover
(customer_id, created_at, status, total_cents, id);
Now the query can be fully served from the secondary index. Trade-offs:
- Faster reads for this pattern
- More disk and memory usage
- Slower writes (more index maintenance)
Only add such indexes for genuinely hot queries.
Best practices for InnoDB indexes
1. Always define an explicit PRIMARY KEY
- Avoid letting InnoDB create a hidden row ID; it is opaque and still stored in all secondary indexes.
- Choose a short, numeric PK whenever possible.
2. Keep the primary key narrow and stable
- Avoid long VARCHAR or composite business keys as PKs.
- Do not update PK values; this effectively moves the row in the clustered index.
3. Design secondary indexes around real queries
- Index columns in the order they are used in WHERE and ORDER BY.
- Put the most selective columns first when possible.
- Use covering indexes selectively for the most critical read queries.
4. Beware of redundant indexes
Indexes that are left prefixes of others are often redundant:
KEY idx_a (a)
KEY idx_a_b (a, b)
idx_a_b can typically serve queries on a alone, so idx_a may be removable. Reducing redundant indexes improves write performance and saves memory.
5. Monitor index usage
Use performance_schema and information_schema to understand index usage patterns (where supported), and periodically review:
- Unused or rarely used indexes
- Hot queries that might benefit from better indexing
- Tables with very wide primary keys
Safe index maintenance on RHEL/Rocky Linux
Index changes can be expensive. On RHEL/Rocky Linux, combine MySQL tools with shell commands for safer operations.
Check table size and index layout
SELECT
table_name,
index_name,
SUM(stat_value) AS stat_value
FROM mysql.innodb_index_stats
WHERE database_name = 'appdb'
AND table_name = 'orders'
GROUP BY table_name, index_name;
Plan online index changes
For large tables, prefer online DDL where supported:
ALTER TABLE orders
ADD KEY idx_status_created (status, created_at),
ALGORITHM=INPLACE,
LOCK=NONE;
Always verify support for ALGORITHM=INPLACE and LOCK=NONE in your MySQL version; some operations still require copying and blocking writes.
Monitor impact during changes
# On RHEL/Rocky Linux, monitor MySQL and I/O
sudo iostat -x 5
sudo vmstat 5
mysql -e "SHOW PROCESSLIST\G"
Schedule heavy index rebuilds during low-traffic windows and ensure you have tested the DDL in a staging environment first.
Conclusion
InnoDB’s clustered and secondary index design underpins how every query and write behaves. A well-chosen primary key, carefully targeted secondary indexes, and an understanding of covering indexes can dramatically improve performance and reduce resource usage. Treat indexes as first-class schema objects, review them regularly against real queries, and apply changes cautiously to keep your MySQL systems predictable and efficient.
This article offers general technical guidance. Validate all configurations in a safe environment before applying them to production.


Leave a Reply