Diagram illustrating the structure and relationship between InnoDB clustered indexes and secondary indexes in MySQL.

Understanding InnoDB Clustered Indexes and Secondary Indexes

Most MySQL performance problems come down to how InnoDB uses indexes. Understanding the difference between clustered and secondary indexes is essential for designing schemas, tuning queries, and explaining slow behaviour.

This article walks through how InnoDB stores data, how different index types behave, and what that means for practical query design.

1. InnoDB index basics

InnoDB stores all indexes as B+Trees. Each index is ordered by its key columns and split into pages (typically 16 KB). Pages form a tree structure:

┌───────────┐      ┌───────────┐
│   Root    │─────▶│  Internal │
└────┬──────┘      └────┬──────┘
     │                 │
     ▼                 ▼
  ┌───────┐        ┌───────┐
  │ Leaf  │        │ Leaf  │
  └───────┘        └───────┘

Only leaf pages contain row data or pointers to row data. Everything else is navigation.

InnoDB has two main index types:

  • Clustered index: the primary index that stores the actual row data in its leaf pages.
  • Secondary indexes: additional indexes whose leaf pages store key values plus a pointer to the clustered index.

2. What is the clustered index?

The clustered index is how InnoDB physically organises table data. There is exactly one clustered index per table:

  • If you define a PRIMARY KEY, that becomes the clustered index.
  • If not, InnoDB picks the first unique, non-null index.
  • If none exists, InnoDB creates a hidden 6-byte row ID and clusters on that.

Example table:

CREATE TABLE orders (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id  BIGINT UNSIGNED NOT NULL,
  status       VARCHAR(16) NOT NULL,
  created_at   DATETIME NOT NULL,
  total_cents  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_customer_created (customer_id, created_at)
) ENGINE=InnoDB;

Here, PRIMARY KEY (id) is the clustered index. Its tree is ordered by id, and each leaf page holds full rows:

Clustered index (PRIMARY): ordered by id

Leaf record layout (conceptual):
[id] [customer_id] [status] [created_at] [total_cents]

Looking up a row by primary key uses just this tree: one index lookup, then the row is already in the leaf page.

3. Secondary indexes: how they point to rows

A secondary index is any non-clustered index. InnoDB stores:

  • The secondary index key columns.
  • Plus the clustered index key as a hidden suffix.

For the orders table, idx_customer_created is a secondary index. The leaf records conceptually look like:

Secondary index idx_customer_created: ordered by (customer_id, created_at, id)

Leaf record layout (conceptual):
[customer_id] [created_at] [id]

The id at the end is the pointer back to the clustered index. InnoDB uses it to find the full row.

This means a query using a secondary index often needs two lookups:

  1. Search the secondary index tree to find matching keys and their PRIMARY KEY values.
  2. For each matching row, look up the full row in the clustered index using the primary key (a "bookmark lookup").

This has big implications for performance.

4. Visualising a secondary index lookup

Consider:

SELECT status, created_at
FROM orders
WHERE customer_id = 42
ORDER BY created_at
LIMIT 10;

InnoDB can use idx_customer_created:

┌─────────────────────────────┐
│ Secondary index (customer) │
└────────────┬────────────────┘
             │ search (customer_id = 42)
             ▼
       ┌───────────┐    ┌───────────┐
       │ Leaf page │    │ Leaf page │
       └────┬──────┘    └────┬──────┘
            │               │
            ▼               ▼
      [42, t1, id=100] [42, t2, id=105] ...

Then for each id:

┌─────────────────────────────┐
│ Clustered index (PRIMARY)  │
└────────────┬────────────────┘
             │ search by id
             ▼
       [id=100, full row]
       [id=105, full row]

If the query only needs columns contained in the secondary index, InnoDB can skip the second step. This is called a covering index.

5. Covering indexes and when to use them

A covering index is a secondary index that contains all columns needed for the query (in the index itself), so InnoDB does not need to read the clustered index.

For example, if you frequently run:

SELECT created_at, total_cents
FROM orders
WHERE customer_id = ?
  AND created_at >= ?
  AND created_at < ?
ORDER BY created_at
LIMIT 50;

You could define:

CREATE INDEX idx_customer_created_total
  ON orders (customer_id, created_at, total_cents);

Now the index leaf pages contain:

[customer_id] [created_at] [total_cents] [id]

The query can be served entirely from the secondary index, avoiding extra reads from the clustered index. This improves performance, especially when many rows match the WHERE clause.

Best practices for covering indexes:

  • Use them for high-volume, latency-sensitive queries.
  • Keep them narrow; every extra column increases index size and write cost.
  • Avoid duplicating large text/blob columns; they are better left in the clustered index only.

6. Choosing a good primary key

Because the clustered index defines the physical order of rows and is appended to every secondary index, the primary key design matters a lot.

6.1 Desirable properties

  • Stable: primary key values should never change. Changing them means moving rows in the clustered index and updating all secondary indexes.
  • Short: shorter keys make all indexes smaller and faster.
  • Monotonically increasing (for write-heavy workloads): reduces page splits and fragmentation, because inserts go to the end of the index.

6.2 Common patterns

  • Surrogate integer primary keys (e.g. BIGINT AUTO_INCREMENT) are usually a good default.
  • Avoid very wide composite primary keys if you have many secondary indexes.
  • Be careful with random keys (e.g. UUIDs) as primary keys; they cause random insert patterns and larger indexes.

Example pattern:

CREATE TABLE users (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  uuid       BINARY(16) NOT NULL,
  email      VARCHAR(255) NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_users_uuid (uuid),
  UNIQUE KEY uk_users_email (email)
) ENGINE=InnoDB;

Here, the clustered index is short and insert-friendly (id), while stable external identifiers (uuid, email) are secondary indexes.

7. How indexes affect writes

Every INSERT, UPDATE, or DELETE must maintain:

  • The clustered index.
  • All secondary indexes that reference changed columns.

Implications:

  • More indexes = more work per write.
  • Wide primary keys = more data to store in every secondary index.
  • Random primary keys = more page splits and fragmentation.

When designing indexes, balance read performance against write cost. For OLTP systems, only keep indexes that are actively used by important queries.

8. Inspecting index usage

To see indexes and their column order:

SHOW INDEX FROM orders\G

To check how MySQL plans to use indexes for a query:

EXPLAIN SELECT created_at, total_cents
FROM orders
WHERE customer_id = 42
ORDER BY created_at
LIMIT 10;\G

Key fields to look at:

  • key: which index is chosen.
  • rows: estimated number of rows to examine.
  • Extra: look for "Using index" (covering index) or "Using where; Using index".

For production systems, also enable and inspect the performance schema and sys schema views (where available) to see index usage over time, not just for a single EXPLAIN.

9. Practical index design workflow

  1. Start from queries: list your most important SELECTs and UPDATEs.
  2. Design a good primary key:
    • Prefer a short, auto-incrementing integer.
    • Avoid changing primary key values.
  3. Add minimal secondary indexes:
    • Index columns used in WHERE and JOIN conditions.
    • Align index order with your most common predicates and ORDER BYs.
  4. Use covering indexes selectively:
    • For hot, latency-critical queries.
    • Keep them as narrow as possible.
  5. Review periodically:
    • Drop unused or low-value indexes.
    • Re-test with EXPLAIN after schema or workload changes.

10. Safe index changes in production

Changing indexes on large tables can be disruptive. On RHEL/Rocky Linux, plan carefully:

  1. Measure table size:
    SELECT table_schema, table_name,
           data_length, index_length
    FROM information_schema.tables
    WHERE table_schema = 'app_db'
      AND table_name = 'orders';
  2. Estimate impact window: large tables may take minutes or hours to rebuild indexes.
  3. Use online DDL where supported:
    ALTER TABLE orders
      ADD INDEX idx_customer_created_total
      (customer_id, created_at, total_cents),
      ALGORITHM=INPLACE,
      LOCK=NONE;

    Check your MySQL version's documentation for exact online DDL capabilities and limitations.

  4. Monitor during change: watch replication lag, InnoDB row lock waits, and I/O.
  5. Have a rollback plan: be ready to revert schema changes or fail over if needed.

This article offers general technical guidance. Validate all configurations in a safe environment before applying them to production.

Smart reads for curious minds

We don’t spam! Read more in our privacy policy

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *