Diagram illustrating flow control settings in MySQL Group Replication setup.

How to Enable and Tune Flow Control in Group Replication

Choosing a primary key in InnoDB is not just a modelling decision. It directly controls the clustered index layout, affects every secondary index, and can make the difference between smooth performance and constant I/O pressure.

This article explains how InnoDB clustered indexes work, how primary key design interacts with storage and queries, and gives concrete, step-by-step guidance for designing keys in real systems.

How InnoDB stores rows: the clustered index

InnoDB stores table data in a B+Tree called the clustered index. The leaf pages of this tree contain the full row. The tree is ordered by the primary key (PK).

Conceptually:

┌───────────────────────────┐
│ InnoDB clustered index    │
├───────────────┬───────────┤
│ Non-leaf page │  PK only  │
├───────────────┴───────────┤
│ Leaf pages: full rows     │
│                           │
│  PK → [all columns...]    │
└───────────────────────────┘

Key consequences:

  • The physical order of rows on disk is the PK order.
  • Range scans on the PK are very efficient (sequential I/O, good cache locality).
  • Page splits and fragmentation depend heavily on how new PK values arrive.

What happens if you do not define a primary key?

InnoDB always needs a clustered index. If you do not define a PK, InnoDB chooses one:

  • If there is a UNIQUE NOT NULL index, it becomes the clustered index.
  • Otherwise, InnoDB creates a hidden 6-byte row id and uses that as the clustered index.

Hidden primary keys cause several problems:

  • You cannot see or query the hidden key, but it takes space in every secondary index.
  • Replication and debugging become harder because there is no stable, visible identifier.
  • Application code may accidentally rely on non-deterministic row ordering.

Best practice: always define an explicit, stable primary key.

Secondary indexes and the primary key

InnoDB secondary indexes are also B+Trees, but their leaf pages do not store the full row. Instead they store:

  • Secondary index key columns.
  • The primary key value of the row.

Diagram:

┌───────────────────────────┐
│ Secondary index on email │
├───────────────────────────┤
│ Leaf:                    │
│  email, PK  → lookup row │
└───────────────────────────┘

┌───────────────────────────┐
│ Clustered index (PK)     │
├───────────────────────────┤
│ Leaf:                    │
│  PK, col1, col2, ...     │
└───────────────────────────┘

Lookup path for a query using a secondary index:

  1. Search secondary index by its key.
  2. Read the PK from the secondary index leaf.
  3. Use the PK to search the clustered index and fetch the full row.

Implications:

  • The wider your PK, the larger all secondary indexes become.
  • The more secondary indexes you have, the more often the PK is duplicated.
  • Changing the PK requires rebuilding all secondary indexes.

Step-by-step: choosing a good primary key

Step 1: Decide between natural and surrogate keys

Natural key: a column (or set) that already exists in the data model (e.g. email, country_code, isbn).

Surrogate key: an artificial identifier, typically an AUTO_INCREMENT integer or a generated UUID.

Evaluate:

  • Stability – primary keys should almost never change.
  • Width – prefer small, fixed-width types (4 or 8 bytes) for the PK.
  • Business rules – natural keys can be constrained, but may need to change when rules change.

For most OLTP tables, a surrogate integer PK is the most practical choice:

CREATE TABLE user_account (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  email       VARCHAR(255) NOT NULL,
  created_at  DATETIME NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_user_email (email)
) ENGINE=InnoDB;

This keeps the clustered index narrow and stable, while enforcing business uniqueness via a separate unique index.

Step 2: Consider insert patterns and page splits

InnoDB B+Trees work best when inserts are mostly append-only at the right-hand side of the tree (increasing PK values). This minimises page splits and fragmentation.

Compare these PK patterns:

  • Monotonically increasing integer (e.g. AUTO_INCREMENT):
    • New rows land at the end of the clustered index.
    • Good for write throughput and buffer pool locality.
  • Random UUID (e.g. UUID()):
    • New rows are inserted into random pages across the tree.
    • Frequent page splits, higher fragmentation, more I/O.
  • Business key with no order (e.g. hashed value):
    • Similar behaviour to random UUID.

If you must use UUIDs (for example, for cross-system uniqueness), consider:

  • Storing them in a compact binary format.
  • Using an ordered UUID variant (time-based) to reduce randomness.
  • Still having a separate integer PK for clustering, and storing the UUID as a unique secondary index.

Step 3: Minimise primary key width

Every secondary index entry stores the PK. A wide PK multiplies storage and memory use.

Example comparison:

  • BIGINT UNSIGNED: 8 bytes.
  • CHAR(36) UUID string: 36 bytes (plus overhead, collation, alignment).
  • Composite PK on three columns: sum of all column sizes plus overhead.

Example of an overly wide PK:

CREATE TABLE order_line (
  order_id   CHAR(36) NOT NULL,
  line_no    INT NOT NULL,
  PRIMARY KEY (order_id, line_no),
  ...
) ENGINE=InnoDB;

Better approach:

CREATE TABLE customer_order (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  order_uid CHAR(36) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_order_uid (order_uid)
) ENGINE=InnoDB;

CREATE TABLE order_line (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  order_id    BIGINT UNSIGNED NOT NULL,
  line_no     INT NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_order_line (order_id, line_no),
  CONSTRAINT fk_order_line_order
    FOREIGN KEY (order_id) REFERENCES customer_order(id)
) ENGINE=InnoDB;

This keeps clustered indexes narrow and still enforces the business rules.

Step 4: Avoid mutable primary keys

Changing a PK value is expensive in InnoDB because:

  • The row must move to a different place in the clustered index tree.
  • All secondary index entries must be updated with the new PK value.

For any column that might change (e.g. email, username, phone_number):

  • Do not use it as a PK.
  • Use a surrogate PK and a UNIQUE index instead.

Step 5: Align primary keys with common access patterns

While the PK is primarily for clustering, it also affects how queries perform. Consider your most frequent queries:

  • OLTP workloads usually access rows by a single identifier – a surrogate integer PK works well.
  • Reporting queries often scan ranges by time – consider adding a secondary index on created_at (or composite keys including it).
  • Multi-tenant schemas may benefit from composite indexes including tenant id and PK.

Example for a multi-tenant table:

CREATE TABLE account_event (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  tenant_id   BIGINT UNSIGNED NOT NULL,
  created_at  DATETIME NOT NULL,
  event_type  VARCHAR(64) NOT NULL,
  payload     JSON NOT NULL,
  PRIMARY KEY (id),
  KEY idx_tenant_created (tenant_id, created_at)
) ENGINE=InnoDB;

This keeps the clustered index simple while still supporting efficient tenant-based range scans.

Practical checks and maintenance

Check primary key definitions

To list PKs for tables in a schema:

SELECT
  TABLE_NAME,
  COLUMN_NAME,
  SEQ_IN_INDEX
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
  AND INDEX_NAME = 'PRIMARY'
ORDER BY TABLE_NAME, SEQ_IN_INDEX;

Look for:

  • Tables without a PK (no rows returned for that table).
  • Composite PKs with many or wide columns.
  • Unexpected PKs chosen by InnoDB (e.g. a natural unique key instead of a surrogate key).

Adding a primary key to an existing table

Warning: changing primary keys or adding them to large tables can be expensive and may lock the table depending on MySQL version and DDL settings. Test on a copy first and plan maintenance windows.

Example: table without PK, add surrogate PK:

ALTER TABLE legacy_data
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

On RHEL/Rocky Linux, if you need to assess impact first, you can copy the structure and test:

CREATE TABLE legacy_data_test LIKE legacy_data;
ALTER TABLE legacy_data_test
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

Measure the time and I/O impact on the test copy before scheduling on production.

Monitoring index size and impact

To see how PK width affects index sizes:

SELECT
  TABLE_NAME,
  INDEX_NAME,
  SUM(PAGES) AS pages
FROM INFORMATION_SCHEMA.INNODB_INDEX_STATS
WHERE DATABASE_NAME = 'your_database'
GROUP BY TABLE_NAME, INDEX_NAME
ORDER BY pages DESC;

Large secondary indexes dominated by PK width are good candidates for PK redesign in future schema versions.

Common anti-patterns and how to fix them

Anti-pattern 1: UUID string primary keys everywhere

Problem:

  • Random insert pattern causes fragmentation.
  • Wide PK inflates all secondary indexes.
  • String comparison is slower than integer comparison.

Mitigation:

  • Introduce a surrogate integer PK.
  • Keep UUID as UNIQUE secondary index.
  • Gradually migrate application code to use the new PK internally.

Anti-pattern 2: Composite PK mirroring a unique business key

Example:

PRIMARY KEY (tenant_id, email)

Issues:

  • PK is wider than necessary.
  • All secondary indexes carry both columns.
  • Changing email becomes very expensive.

Fix:

  • Use surrogate PK.
  • Keep UNIQUE KEY (tenant_id, email) for business rules.

Summary and practical recommendations

InnoDB’s clustered index design means primary keys are not just identifiers; they are the backbone of storage layout and index design. For most OLTP workloads, the following rules work well:

  • Always define an explicit primary key.
  • Prefer narrow, monotonically increasing surrogate integer PKs.
  • Enforce business uniqueness with separate UNIQUE indexes.
  • Avoid using mutable or wide columns as PKs.
  • Be cautious when changing PKs on large tables; always test first.

This article offers general technical guidance. Validate all configurations in a safe environment before applying them to production.

Smart reads for curious minds

We don’t spam! Read more in our privacy policy

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *