


InnoDB indexing is one of the most important performance topics in MySQL, yet many engineers only have a vague idea of how clustered and secondary indexes really work.

This article walks through the physical structure of InnoDB indexes, how lookups are executed, and practical patterns for designing efficient indexes. The goal is to help you reason about query plans instead of guessing.

1. InnoDB index fundamentals

Every InnoDB table is stored as a B+Tree index. That primary index is called the clustered index. All table data lives in the leaf pages of this tree.

Key points:

  • Table data is physically ordered by the clustered index key.
  • There is exactly one clustered index per table.
  • If you do not define a primary key, InnoDB picks a clustered key for you: the first UNIQUE index whose columns are all NOT NULL, or failing that a hidden row ID (rarely what you want).

Conceptual view of a simple clustered index on id:

┌─────────────────────────────────────────────┐
│              Clustered index (id)          │
├───────────────┬────────────────────────────┤
│ Internal page │ Pointers to child pages    │
├───────────────┴────────────────────────────┤
│ Leaf page 1   │ rows with id 1..100        │
│ Leaf page 2   │ rows with id 101..200      │
│ Leaf page 3   │ rows with id 201..300      │
└─────────────────────────────────────────────┘
Each leaf row contains: (id, col1, col2, col3, ...)

Because the table is ordered by the clustered index, range scans on that key are very efficient.

2. How InnoDB chooses the clustered index

InnoDB picks the clustered index using this precedence:

  1. Explicit PRIMARY KEY.
  2. First UNIQUE index where all key columns are NOT NULL.
  3. Hidden 6-byte row ID generated by InnoDB.
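
The third case can be observed directly. The following is a minimal sketch, assuming MySQL 8.0 (where these dictionary views are named INNODB_TABLES and INNODB_INDEXES), the PROCESS privilege, and sql_require_primary_key set to OFF. The table no_pk_demo is a made-up name used only for illustration.

```sql
-- Hypothetical table with no PRIMARY KEY and no UNIQUE NOT NULL index.
CREATE TABLE no_pk_demo (
  a INT,
  b INT,
  KEY idx_a (a)
) ENGINE=InnoDB;

-- List the indexes InnoDB actually created for this table.
-- INNODB_TABLES.NAME has the form 'schema/table'.
SELECT t.NAME AS table_name, i.NAME AS index_name
FROM information_schema.INNODB_INDEXES AS i
JOIN information_schema.INNODB_TABLES  AS t ON t.TABLE_ID = i.TABLE_ID
WHERE t.NAME = CONCAT(DATABASE(), '/no_pk_demo');

-- Clean up the illustration table.
DROP TABLE no_pk_demo;
```

Alongside idx_a, the result should list an index named GEN_CLUST_INDEX, which is the clustered index InnoDB builds on the hidden row ID.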

Example table:

CREATE TABLE orders (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id  BIGINT UNSIGNED NOT NULL,
  status       TINYINT NOT NULL,
  created_at   DATETIME NOT NULL,
  total_cents  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_customer_created (customer_id, created_at),
  KEY idx_status_created (status, created_at)
) ENGINE=InnoDB;

Here, the clustered index is on id. All row data is physically ordered by id, not by customer_id or created_at.
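
To confirm what was created, and to see the clustered index serving a range scan, something like the following can be run against the orders table above. The id bounds are arbitrary, and the plan described afterwards assumes the table holds a reasonable amount of data.

```sql
-- Lists PRIMARY plus the two secondary indexes, one row per key column.
SHOW INDEX FROM orders;

-- A range predicate on the clustered key (arbitrary bounds).
EXPLAIN
SELECT *
FROM orders
WHERE id BETWEEN 1000 AND 2000;
```

On a populated table the plan would typically show key = PRIMARY and type = range, meaning the rows are read in order straight from the clustered index leaf pages.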

3. Secondary indexes: structure and contents

Secondary indexes are also B+Trees, but their leaf pages do not store full rows. Instead, they store:

  • The secondary index key columns.
  • The clustered index key (the primary key columns).

Conceptual view of a secondary index on (customer_id, created_at):

┌─────────────────────────────────────────────┐
│ Secondary index: idx_customer_created      │
├────────────────────────────────────────────┤
│ Leaf entries contain:                      │
│ (customer_id, created_at, id)              │
│                                            │
│ To fetch full row:                         │
│ 1. Locate (customer_id, created_at, id)    │
│ 2. Use id to probe clustered index         │
└─────────────────────────────────────────────┘

This is crucial: whenever a query needs columns that the secondary index does not contain, each matching index entry requires an additional lookup in the clustered index.
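
A quick way to see that the primary key travels with every secondary index entry is to compare two plans on the orders table. This is a sketch that assumes the optimizer picks idx_customer_created for both queries, which depends on data volume and statistics.

```sql
-- id is not declared in idx_customer_created, yet it is stored in each
-- leaf entry, so this query can be answered from the secondary index alone.
EXPLAIN
SELECT id
FROM orders
WHERE customer_id = 42;

-- total_cents is not in the index, so each matching entry needs a
-- lookup in the clustered index to fetch the full row.
EXPLAIN
SELECT total_cents
FROM orders
WHERE customer_id = 42;
```

If the index is chosen, the first plan would normally show Using index in the Extra column and the second would not.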

4. Step-by-step: how a secondary index lookup works

Consider the query:

SELECT total_cents
FROM orders
WHERE customer_id = 42
  AND created_at >= '2024-01-01'
  AND created_at <  '2024-02-01';

With KEY idx_customer_created (customer_id, created_at), a typical execution path is:

  1. Search the secondary index tree for the first entry at or after the lower bound:
    (customer_id = 42, created_at = '2024-01-01', id = ?)
  2. Scan forward through the secondary index leaf pages while:
    • customer_id = 42
    • created_at < '2024-02-01'
  3. For each matching entry, use the stored id to look up the full row in the clustered index.
  4. Read total_cents from the clustered index leaf page.

This two-step process (secondary index, then clustered index) is called a bookmark lookup or row lookup.
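
The index scan in steps 1 and 2 can be observed through the session handler counters. This is a rough sketch: FLUSH STATUS needs the appropriate privilege, the exact counter values depend on the plan the optimizer chooses, and the clustered index probes in step 3 happen inside InnoDB, so they are not reported as separate handler calls.

```sql
-- Reset the session status counters.
FLUSH STATUS;

-- The query from the walkthrough above.
SELECT total_cents
FROM orders
WHERE customer_id = 42
  AND created_at >= '2024-01-01'
  AND created_at <  '2024-02-01';

-- Handler_read_key counts index seeks; Handler_read_next counts
-- "read the next entry in index order" calls during the range scan.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

With a plain range scan on idx_customer_created, Handler_read_key is typically small and Handler_read_next grows with the number of matching entries, each of which also costs one clustered index probe.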

4.1 Covering indexes

If the secondary index contains all columns needed for the query, InnoDB can avoid the second step. This is called a covering index.

For example:

KEY idx_customer_created_total (customer_id, created_at, total_cents)

Now the same query can be satisfied entirely from the secondary index leaf pages, which is usually much cheaper.
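
As a sketch, the covering index can be added to the orders table and the plan checked again. Whether the optimizer actually picks the new index depends on statistics and data volume.

```sql
-- Add the covering index named in the text above.
ALTER TABLE orders
  ADD KEY idx_customer_created_total (customer_id, created_at, total_cents);

-- Re-check the plan for the same query.
EXPLAIN
SELECT total_cents
FROM orders
WHERE customer_id = 42
  AND created_at >= '2024-01-01'
  AND created_at <  '2024-02-01';
```

When the covering index is used, Extra should include Using index. Note that idx_customer_created is then a leftmost prefix of the new index and becomes a candidate for removal.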

5. Visualising the lookup path

The following diagram shows how a lookup goes from a secondary index to the clustered index:

┌─────────────────────────────┐
│ Secondary index (cust, ts) │
└─────────────┬──────────────┘
              │ search (42, 2024-01-01, ?)
              ▼
      ┌────────────────┐
      │ Leaf entries   │
      │ (42, ts, id)   │
      └──────┬─────────┘
             │ use id
             ▼
┌─────────────────────────────┐
│   Clustered index (id)     │
│   Full row: (id, ...)      │
└─────────────────────────────┘

The cost of the second step (probing the clustered index by id) depends heavily on:

  • How many matching secondary index entries there are.
  • How well the clustered index leaf pages are cached in the buffer pool.
  • How scattered the rows are across pages.

6. Practical indexing patterns

6.1 Choosing a good primary key

Because the primary key defines the physical layout, it affects almost everything:

  • Prefer short, stable, monotonically increasing keys where possible (e.g. BIGINT AUTO_INCREMENT or a time-based key). This reduces page splits and fragmentation.
  • Avoid random primary keys (e.g. UUID v4) on very hot tables unless you understand the performance trade-offs. They cause random inserts and more page splits.
  • Do not use large composite primary keys unless necessary; every secondary index stores the primary key, so big PKs make all secondary indexes bigger.
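
The effect of primary key width on secondary index size can be measured rather than guessed. The query below is a sketch that assumes persistent statistics are enabled (the default) and that you can read the mysql.innodb_index_stats table. The figures are estimates refreshed by ANALYZE TABLE.

```sql
-- Approximate size of each index on orders.
-- For stat_name = 'size', stat_value is the number of pages in the index.
SELECT
  index_name,
  stat_value AS pages,
  ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name    = 'orders'
  AND stat_name     = 'size'
ORDER BY stat_value DESC;
```

Comparing these numbers before and after a primary key change on a copy of the table shows how much each secondary index grows or shrinks.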

6.2 Designing secondary indexes for common queries

Start from your most frequent and most expensive queries. For each query:

  1. Identify the filter and join columns in WHERE and JOIN.
  2. Identify the sort columns in ORDER BY and GROUP BY.
  3. Design index key order to support equality first, then ranges, then sort order.
  4. Consider adding selected columns as trailing index columns to make it covering.

Example query pattern:

SELECT id, created_at, total_cents
FROM orders
WHERE customer_id = ?
  AND status = ?
  AND created_at >= ?
ORDER BY created_at DESC
LIMIT 50;

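A good index might be the following (descending key parts such as created_at DESC are honoured from MySQL 8.0 onward; earlier versions parse DESC but ignore it):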

KEY idx_customer_status_created (
  customer_id,
  status,
  created_at DESC
)

Optionally, to make it covering:

KEY idx_customer_status_created_covering (
  customer_id,
  status,
  created_at DESC,
  total_cents
)

This allows:

  • Equality conditions on customer_id and status.
  • Range and ordering on created_at.
  • Potentially no clustered index lookups if all selected columns are in the index.
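
To check that such an index serves both the filter and the sort, the plan can be inspected with concrete values. The literals below (42, 1, and the date) are placeholders, and the sketch assumes MySQL 8.0 so that the DESC key part takes effect.

```sql
-- Add the non-covering variant from the text above.
ALTER TABLE orders
  ADD KEY idx_customer_status_created (customer_id, status, created_at DESC);

-- Placeholder values stand in for the ? parameters.
EXPLAIN
SELECT id, created_at, total_cents
FROM orders
WHERE customer_id = 42
  AND status = 1
  AND created_at >= '2024-01-01'
ORDER BY created_at DESC
LIMIT 50;
```

If the index is used as intended, Extra should not contain Using filesort, because the entries are already stored in the requested order.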

6.3 When you see “Using index” vs “Using where”

Run EXPLAIN to inspect plans:

  • Extra = Using index usually means the index is covering (no row lookup needed).
  • Extra = Using where means MySQL applies a further filter to the rows it reads through the index; this is normal and does not by itself indicate a problem.
  • type = ref or range with a suitable key is typically good.

7. Common anti-patterns and how to fix them

7.1 Missing primary key

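Without an explicit primary key (or a UNIQUE index on NOT NULL columns to stand in for it), InnoDB clusters the table on a hidden row ID. Problems:

  • You cannot use it in queries.
  • Secondary indexes still store it, but you cannot see it.
  • Row-based replication can be slow, because the replica may have to scan the table to find each changed row, and tools that split work by primary key ranges have nothing to work with.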

Best practice: always define a primary key explicitly.
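
One way to find existing offenders is to look for InnoDB tables with no PRIMARY KEY constraint in information_schema. This is a sketch; the list of excluded system schemas may need adjusting for your installation.

```sql
-- InnoDB base tables that have no PRIMARY KEY constraint.
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
       ON c.TABLE_SCHEMA    = t.TABLE_SCHEMA
      AND c.TABLE_NAME      = t.TABLE_NAME
      AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.ENGINE     = 'InnoDB'
  AND t.TABLE_SCHEMA NOT IN
      ('mysql', 'sys', 'information_schema', 'performance_schema')
  AND c.CONSTRAINT_NAME IS NULL;
```

Tables in the result that do have a UNIQUE NOT NULL index are clustered on that index; the rest use the hidden row ID.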

7.2 Over-indexing

Each secondary index has costs:

  • Extra disk space and buffer pool usage.
  • Extra work on INSERT, UPDATE, and DELETE.

Best practice:

  • Index for real query patterns, not “just in case”.
  • Periodically review unused indexes using the Performance Schema (for example through the sys.schema_unused_indexes view) or slow query logs.
  • Remove redundant indexes (e.g. if you have (a, b), a separate index on a is often redundant).
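
The sys schema ships two views that help with this review. The sketch below assumes the Performance Schema is enabled. Usage counters are reset when the server restarts, so an index reported as unused shortly after a restart may simply not have been needed yet.

```sql
-- Indexes with no recorded use since the server started.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = DATABASE();

-- Indexes made redundant by another index on the same table,
-- together with a ready-made DROP statement to review.
SELECT table_name,
       redundant_index_name,
       dominant_index_name,
       sql_drop_index
FROM sys.schema_redundant_indexes
WHERE table_schema = DATABASE();
```

Treat the output as a list of candidates to investigate, not as a list of indexes that are safe to drop.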

7.3 Ignoring index order

Index column order matters. An index on (a, b) can efficiently support:

  • WHERE a = ?
  • WHERE a = ? AND b = ?
  • WHERE a = ? AND b > ?

But it cannot efficiently support WHERE b = ? alone.

Best practice: put the most selective equality conditions first, then range/sort columns.
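
The leftmost-prefix behaviour can be checked on a small test table. The table ab_demo and its columns are made up for illustration, and the plans described afterwards assume the table has been filled with a reasonable amount of varied data. On an empty table the optimizer's choices are not representative.

```sql
-- Hypothetical table with a two-column secondary index.
CREATE TABLE ab_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_a_b (a, b)
) ENGINE=InnoDB;

-- Uses the leading column of idx_a_b.
EXPLAIN SELECT c FROM ab_demo WHERE a = 1;

-- Uses both columns: equality on a, range on b.
EXPLAIN SELECT c FROM ab_demo WHERE a = 1 AND b > 10;

-- Skips the leading column, so idx_a_b cannot be used for a seek.
EXPLAIN SELECT c FROM ab_demo WHERE b = 10;
```

With data loaded, the first two plans would typically show key = idx_a_b with type = ref and type = range respectively, while the third usually falls back to a full table scan (type = ALL).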

8. Observing index behaviour in practice

On a RHEL/Rocky Linux host you can quickly explore how plans change with indexes.

  1. Create a test table:
    CREATE TABLE demo (
      id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      user_id     BIGINT UNSIGNED NOT NULL,
      created_at  DATETIME NOT NULL,
      payload     VARCHAR(255) NOT NULL,
      PRIMARY KEY (id)
    ) ENGINE=InnoDB;
  2. Insert some data (generate with your favourite script or client).
  3. Run a query and explain it:
    EXPLAIN
    SELECT payload
    FROM demo
    WHERE user_id = 10
      AND created_at >= '2024-01-01';
  4. Add an index and re-run:
    ALTER TABLE demo
      ADD KEY idx_user_created (user_id, created_at);
  5. Compare the EXPLAIN output before and after. Look at:
    • key (which index is used)
    • rows (estimated rows examined)
    • Extra (Using index / Using where)

This simple exercise makes the “secondary index then clustered index” pattern visible in a safe environment.
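
For step 2, one way to generate test data without an external script is a recursive common table expression. This is a sketch that assumes MySQL 8.0; the row count, the user_id spread, and the date range are arbitrary choices made so that the example query matches some rows.

```sql
-- Allow the recursive CTE to run past the default depth of 1000.
SET SESSION cte_max_recursion_depth = 1000000;

-- Insert 100,000 rows: 100 distinct user_id values and
-- created_at values spread over roughly one year from 2023-07-01.
INSERT INTO demo (user_id, created_at, payload)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 100000
)
SELECT
  n MOD 100,
  DATE_ADD('2023-07-01 00:00:00', INTERVAL (n MOD 365) DAY),
  REPEAT('x', 100)
FROM seq;

-- Refresh index statistics so EXPLAIN estimates reflect the new data.
ANALYZE TABLE demo;
```

Running ANALYZE TABLE again after adding the index in step 4 keeps the row estimates in the before and after plans comparable.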

9. Summary and best practices

  • InnoDB stores table data in the clustered index; choose your primary key carefully.
  • Secondary indexes store their key plus the primary key, not full rows.
  • Unless the index is covering, each matching secondary index entry requires an extra clustered index probe.
  • Covering indexes can avoid that extra lookup and significantly reduce I/O.
  • Design indexes around real query patterns and keep them as small and as few as practical.

This article offers general technical guidance. Validate all configurations in a safe environment before applying them to production.

Understanding how InnoDB navigates clustered and secondary indexes turns EXPLAIN output from a mystery into a map. With that mental model you can design indexes deliberately, reduce I/O, and keep critical queries fast as your data grows.
