Load Balancing Strategies for MySQL Router

Why InnoDB Index Internals Matter

InnoDB stores data and indexes very differently from MyISAM and other engines. Understanding clustered and secondary indexes is essential for:

Designing primary keys that scale
Avoiding hidden performance problems
Reducing random I/O and buffer pool pressure
Interpreting EXPLAIN output correctly

This article walks through how InnoDB stores rows, how lookups really work, and practical rules for choosing good indexes.

InnoDB pages and B+‑trees in 2 minutes

InnoDB stores data in fixed-size pages (typically 16 KB) and organises indexes as B+‑trees. Every index is a B+‑tree; the difference is what the leaf pages contain.

Conceptual B+‑tree shape:

┌───────────────┐
│   Root page   │
└──────┬────────┘
       │
 ┌─────┴─────┐
 │  Internal │
 │   pages   │
 └─────┬─────┘
       │
 ┌─────┴─────┐
 │   Leaf    │  <─ actual data or row pointers
 │   pages   │
 └───────────┘

All lookups, inserts, updates and deletes navigate these trees.

What is the clustered index?

InnoDB stores the table’s data in the clustered index. There is exactly one clustered index per table. Its leaf pages contain the full row, not just pointers.

In most cases, the clustered index is the PRIMARY KEY. If you do not define one, InnoDB will choose or create a key:

If there is a non-null UNIQUE index, it may be used as the clustered index.
Otherwise, InnoDB creates a hidden 6‑byte row ID and clusters on that.

Leaf pages of the clustered index look like this:

┌───────────────────────────────────────────────┐
│ Clustered index leaf page                     │
├───────────────────────────────────────────────┤
│ PK:  101  | col_a | col_b | col_c | ...       │
│ PK:  104  | col_a | col_b | col_c | ...       │
│ PK:  110  | col_a | col_b | col_c | ...       │
└───────────────────────────────────────────────┘

The rows are physically ordered by the clustered index key. This order drives the performance of range scans, inserts, and secondary index lookups.

Creating a table with a good clustered index

CREATE TABLE orders (
  id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id   BIGINT UNSIGNED NOT NULL,
  status        TINYINT NOT NULL,
  created_at    DATETIME NOT NULL,
  total_cents   INT NOT NULL,
  PRIMARY KEY (id),               -- clustered index
  KEY idx_customer_created (customer_id, created_at),
  KEY idx_status_created (status, created_at)
) ENGINE=InnoDB;

Here PRIMARY KEY (id) is the clustered index. All row data lives on its leaf pages.

What are secondary indexes?

Secondary indexes are all non-clustered indexes (non-PK B+‑trees). Their leaf pages do not contain full rows; they contain:

The secondary index key columns
The clustered index key (the primary key value) as a pointer

Example leaf page of idx_customer_created (customer_id, created_at):

┌────────────────────────────────────────────────────┐
│ Secondary index leaf page                          │
├────────────────────────────────────────────────────┤
│ customer_id: 42 | created_at: 2024-01-01 | PK: 101 │
│ customer_id: 42 | created_at: 2024-01-02 | PK: 104 │
│ customer_id: 77 | created_at: 2024-01-01 | PK: 110 │
└────────────────────────────────────────────────────┘

To get the full row via a secondary index, InnoDB:

Searches the secondary index B+‑tree
Finds the matching leaf entries and their PK values
Uses each PK to look up the row in the clustered index

This extra step is called a “secondary lookup” or “bookmark lookup”.

Visualising a secondary index lookup

SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at;

Step 1: secondary index search
┌───────────────────────┐
│ idx_customer_created  │
└───────┬───────────────┘
        │  (find PKs: 101, 104, ...)
        ▼
┌───────────────────────┐
│   clustered index     │
└───────────────────────┘
Step 2: for each PK found, fetch full row

Covering indexes and when they help

If all columns needed by a query are contained in a secondary index, InnoDB can avoid the second lookup and read only the secondary index. This is a covering index.

Example:

CREATE TABLE accounts (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  email       VARCHAR(255) NOT NULL,
  status      TINYINT NOT NULL,
  created_at  DATETIME NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY idx_email (email),
  KEY idx_status_created (status, created_at)
) ENGINE=InnoDB;

-- Covering index example:
SELECT status, created_at
FROM accounts
WHERE status = 1
ORDER BY created_at
LIMIT 100;

The index (status, created_at) covers this query: all selected and filtered columns are in the index, so no clustered lookup is needed.

How index choice affects performance

Primary key design impacts everything

Every secondary index entry stores the primary key value.
A wide primary key (e.g. long VARCHAR) makes all secondary indexes larger.
Larger indexes mean more pages, more I/O, and less effective buffer pool usage.

Prefer short, stable primary keys, commonly:

BIGINT UNSIGNED AUTO_INCREMENT
Or a short, fixed-length business key if absolutely required

Insert patterns and page splits

Because InnoDB orders rows by the clustered index, insert pattern matters:

Monotonically increasing PKs (e.g. AUTO_INCREMENT) append at the end of the B+‑tree, minimising page splits.
Random PKs (e.g. random UUID) cause inserts all over the tree, leading to more page splits, fragmentation, and random I/O.

If you must use UUIDs, consider:

Using binary(16) instead of char(36)
Using ordered UUID variants (time-based or ULID-like) where appropriate

Range scans and physical order

Range scans on the clustered index (e.g. WHERE id BETWEEN 1000 AND 2000) are efficient because rows are physically adjacent on disk.

Range scans on secondary indexes are also efficient for reading keys, but each row still requires a clustered lookup unless the index is covering.

Step-by-step: analysing a query with EXPLAIN

1. Inspect the execution plan

EXPLAIN
SELECT id, status, total_cents
FROM orders
WHERE customer_id = 42
  AND created_at &gt;= '2024-01-01'
ORDER BY created_at
LIMIT 100;

Key things to check:

key: which index is chosen?
type: is it range, ref, or worse (ALL)?
rows: how many rows are estimated?
Extra: look for “Using index” (covering) vs “Using where; Using filesort”.

2. Compare columns used vs index definition

Suppose EXPLAIN shows idx_customer_created is used. The index is defined as:

KEY idx_customer_created (customer_id, created_at)

The query has:

WHERE: customer_id, created_at
ORDER BY: created_at
SELECT: id, status, total_cents

The index supports the WHERE and ORDER BY, but not all SELECT columns, so it is not covering. Expect secondary lookups.

3. Decide if a covering index is worth it

You could add:

ALTER TABLE orders
  ADD KEY idx_customer_created_cover
    (customer_id, created_at, status, total_cents, id);

Now the query can be fully served from the secondary index. Trade-offs:

Faster reads for this pattern
More disk and memory usage
Slower writes (more index maintenance)

Only add such indexes for genuinely hot queries.

Best practices for InnoDB indexes

1. Always define an explicit PRIMARY KEY

Avoid letting InnoDB create a hidden row ID; it is opaque and still stored in all secondary indexes.
Choose a short, numeric PK whenever possible.

2. Keep the primary key narrow and stable

Avoid long VARCHAR or composite business keys as PKs.
Do not update PK values; this effectively moves the row in the clustered index.

3. Design secondary indexes around real queries

Index columns in the order they are used in WHERE and ORDER BY.
Put the most selective columns first when possible.
Use covering indexes selectively for the most critical read queries.

4. Beware of redundant indexes

Indexes that are left prefixes of others are often redundant:

KEY idx_a (a)
KEY idx_a_b (a, b)

idx_a_b can typically serve queries on a alone, so idx_a may be removable. Reducing redundant indexes improves write performance and saves memory.

5. Monitor index usage

Use performance_schema and information_schema to understand index usage patterns (where supported), and periodically review:

Unused or rarely used indexes
Hot queries that might benefit from better indexing
Tables with very wide primary keys

Safe index maintenance on RHEL/Rocky Linux

Index changes can be expensive. On RHEL/Rocky Linux, combine MySQL tools with shell commands for safer operations.

Check table size and index layout

SELECT
  table_name,
  index_name,
  SUM(stat_value) AS stat_value
FROM mysql.innodb_index_stats
WHERE database_name = 'appdb'
  AND table_name = 'orders'
GROUP BY table_name, index_name;

Plan online index changes

For large tables, prefer online DDL where supported:

ALTER TABLE orders
  ADD KEY idx_status_created (status, created_at),
  ALGORITHM=INPLACE,
  LOCK=NONE;

Always verify support for ALGORITHM=INPLACE and LOCK=NONE in your MySQL version; some operations still require copying and blocking writes.

Monitor impact during changes

# On RHEL/Rocky Linux, monitor MySQL and I/O
sudo iostat -x 5
sudo vmstat 5
mysql -e "SHOW PROCESSLIST\G"

Schedule heavy index rebuilds during low-traffic windows and ensure you have tested the DDL in a staging environment first.

Conclusion

InnoDB’s clustered and secondary index design underpins how every query and write behaves. A well-chosen primary key, carefully targeted secondary indexes, and an understanding of covering indexes can dramatically improve performance and reduce resource usage. Treat indexes as first-class schema objects, review them regularly against real queries, and apply changes cautiously to keep your MySQL systems predictable and efficient.

This article offers general technical guidance. Validate all configurations in a safe environment before applying them to production.