Contents

Pagination: SQL

This post contains information about pagination using native mechanisms provided by RDBMS
Chosen dialect is PostgreSQL, but following snippets can be easily adopted to other management systems

In this post we’ll rely on a record id field, but won’t use SERIAL auto-incremental integers by 2 reasons:

  • these integers cannot be used for anything other than a pagination in distributed systems
  • native implementations does increment the value on every transaction regardless of the transaction success or and actual data change, so within batch upserts the SERIAL will grow dramatically with a lots of gaps between the neighbours

There’s also some meaningful thing about the ids: the human-readable ids are better choice to use with IaC

For example:

  1. When having some ansible playbook to iterate over API entities:
records_to_update:
  - id: record_a
    name: RecordName
    # ...
    desc: This record has been created using Ansible by id

will be much easier to understand which objects we’re operating with and took O(1) to find the object, while:

records_to_update:
  - id: 019bd109-217a-7ab7-983b-e17f685bfe8c
    name: RecordName
    # ...
    desc: This record has been created using Ansible by id

will provide less context and:

records_to_update:
  - name: RecordName
    # ...
    desc: This record has been created using Ansible by name

may take up to O(n) to find the required record

  1. Or having news site links like:
https://site/news/DD-MM-YYYY-something-happend

instead of:

https://site/news/019bd109-217a-7ab7-983b-e17f685bfe8c

Offset pagination lets you keep the human-readable ids without changing the table scheme
But in our example, we’ll have an additional external_id that can be helpful in this context

Offset pagination

This one is more of a traditional kind pagination and basically requires only:

  • non-unique ordered integer field to iterate over

Model

Let’s first define a database record model:

create table if not exists service_record
(
  id          text   not null primary key,
  tenant_id   text   not null,
  external_id text   not null,
  created_at  bigint not null, -- required for pagination
  constraint service_record_unique_record unique (tenant_id, external_id)
);

Since we’ll anyway be querying using tenant_id, let’s add an index for this example:

create index if not exists service_record_pagination on service_record (tenant_id, id);

Insert some records into the table:

insert into service_record (
  id,
  tenant_id,
  external_id,
  created_at
)
values ('record-1', 'tenant_a', 'external-tenant-a-record-1', 10),
       ('record-2', 'tenant_a', 'external-tenant-a-record-2', 20),
       ('record-3', 'tenant_a', 'external-tenant-a-record-3', 30),
       ('record-4', 'tenant_a', 'external-tenant-a-record-4', 40),
       ('record-5', 'tenant_a', 'external-tenant-a-record-5', 50),
       ('record-6', 'tenant_a', 'external-tenant-a-record-6', 60),
       ('record-7', 'tenant_a', 'external-tenant-a-record-7', 70),
       ('record-8', 'tenant_a', 'external-tenant-a-record-8', 80),
       ('record-9', 'tenant_a', 'external-tenant-a-record-9', 90)
on conflict do nothing;

-- +--------+---------+--------------------------+----------+
-- |id      |tenant_id|external_id               |created_at|
-- +--------+---------+--------------------------+----------+
-- |record-1|tenant_a |external-tenant-a-record-1|10        |
-- |record-2|tenant_a |external-tenant-a-record-2|20        |
-- |record-3|tenant_a |external-tenant-a-record-3|30        |
-- |record-4|tenant_a |external-tenant-a-record-4|40        |
-- |record-5|tenant_a |external-tenant-a-record-5|50        |
-- |record-6|tenant_a |external-tenant-a-record-6|60        |
-- |record-7|tenant_a |external-tenant-a-record-7|70        |
-- |record-8|tenant_a |external-tenant-a-record-8|80        |
-- |record-9|tenant_a |external-tenant-a-record-9|90        |
-- +--------+---------+--------------------------+----------+

Request

Following request parameters:

Parameter Type Description
pageSize integer Number of objects per page
pageNum integer Next page number

are being converted into:

$$ limit = pageSize $$ $$ offset = (pageNum - 1)*pageSize $$

Response

Response field values:

Field Type Description
list list of Records Current page records
totalCount integer Total objects count
totalPages integer Total pages count

are being fetched and calculated from:

$$ list~by~query $$ $$ totalCount~by~count(1)~over()~window~function $$ $$ totalPages = (totalCount + pageSize - 1)/pageSize $$

in the numerator:

  • pageSize is added to avoid rounding totalPages down
  • -1 handling case with pageSize = 1, so totalPages will be equal to totalCount

Query

So there’s always only one query to fetch the required page records:

-- fetching records with pageSize = 2, so expected offsets are 0, 2, 4 ... 8

select id, tenant_id, created_at, count(1) over() as total_count
from service_record
where tenant_id = $1
order by created_at desc 
limit $2 offset $3;

-- $1 = 'tenant_a'
-- $2 = 2
-- $3 = 0

-- pageSize = 2, pageNum = 1 => limit = 2, offset = (1-1)*2 = 0
--
-- +--------+---------+--------------------------+----------+-----------+
-- |id      |tenant_id|external_id               |created_at|total_count|
-- +--------+---------+--------------------------+----------+-----------+
-- |record-9|tenant_a |external-tenant-a-record-9|90        |9          |
-- |record-8|tenant_a |external-tenant-a-record-8|80        |9          |
-- +--------+---------+--------------------------+----------+-----------+


-- -------------------- totalPages = (9+2-1)/2 = 5 ----------------------
-- ----- client knows how many pages exist, sets the next pageNum -------


-- pageSize = 2, pageNum = 2 => limit = 2, offset = (2-1)*2 = 2
--
-- +--------+---------+--------------------------+----------+-----------+
-- |id      |tenant_id|external_id               |created_at|total_count|
-- +--------+---------+--------------------------+----------+-----------+
-- |record-7|tenant_a |external-tenant-a-record-7|70        |9          |
-- |record-6|tenant_a |external-tenant-a-record-6|60        |9          |
-- +--------+---------+--------------------------+----------+-----------+

-- ...

-- pageSize = 2, pageNum = 5 => limit = 2, offset = (5-1)*2 = 8
--
-- +--------+---------+--------------------------+----------+-----------+
-- |id      |tenant_id|external_id               |created_at|total_count|
-- +--------+---------+--------------------------+----------+-----------+
-- |record-1|tenant_a |external-tenant-a-record-1|10        |9          |
-- +--------+---------+--------------------------+----------+-----------+

Cursor-based pagination

This kind of pagination depends on:

  • a chronologically ordered unique sortable id

That can be numeric or alphanumeric depending on the number of records you’re willing to ingest

Since the id colum type size will affect the size of the indices that depend on it, it will also affect:

  • performance of data ingestion
  • performance of data search
  • storage utilization
  • cpu utilization

Model

There is only one basic parameter will be exchanged between the client and the server:

Request

Parameter Type Description
continue base64 string Encoded parameters for current cursor

In our example for every request we’ll be encoding:

  • limit (number of records per page)
  • id (lowest id value currently returned)

For the first request, the string “continue” will be empty, indicating the start cursor position

Response

Response field values:

Field Type Description
list list of Records Current page records
continue base64 string Encoded parameters for the next cursor

Medium ingestion rate

For medium ingestion rate we can use built-in uuid type with uuidv7 that uses 16 bytes per value

Model

Creating the same record model, but now we need to use id column instead of created_at:

create table if not exists service_record
(
  id          uuid   not null primary key default uuidv7(), -- required for pagination
  tenant_id   text   not null,
  external_id text   not null,
  created_at  bigint not null,
  constraint service_record_unique_record unique (tenant_id, external_id)
);

Same index for per-tenant record querying:

create index if not exists service_record_pagination on service_record (tenant_id, id);

Inserting data:

insert into service_record (
  tenant_id,
  external_id,
  created_at
)
values ('tenant_a', 'external-tenant-a-record-1', 10),
       ('tenant_a', 'external-tenant-a-record-2', 20),
       ('tenant_a', 'external-tenant-a-record-3', 30),
       ('tenant_a', 'external-tenant-a-record-4', 40),
       ('tenant_a', 'external-tenant-a-record-5', 50),
       ('tenant_a', 'external-tenant-a-record-6', 60),
       ('tenant_a', 'external-tenant-a-record-7', 70),
       ('tenant_a', 'external-tenant-a-record-8', 80),
       ('tenant_a', 'external-tenant-a-record-9', 90)
on conflict do nothing;

-- +------------------------------------+---------+--------------------------+----------+
-- |id                                  |tenant_id|external_id               |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217a-7ab7-983b-e17f685bfe8c|tenant_a |external-tenant-a-record-1|10        |
-- |019bd109-217b-708a-9ec3-5d48297f2d59|tenant_a |external-tenant-a-record-2|20        |
-- |019bd109-217b-70e9-9773-328be7cdad04|tenant_a |external-tenant-a-record-3|30        |
-- |019bd109-217b-714e-9b01-8d23a6ce744d|tenant_a |external-tenant-a-record-4|40        |
-- |019bd109-217b-71b4-aef8-7eb6c977e7a1|tenant_a |external-tenant-a-record-5|50        |
-- |019bd109-217b-71ef-b232-fdfabbd605aa|tenant_a |external-tenant-a-record-6|60        |
-- |019bd109-217b-722d-8003-2b759b3109ab|tenant_a |external-tenant-a-record-7|70        |
-- |019bd109-217b-7264-a6f7-e7d822664f13|tenant_a |external-tenant-a-record-8|80        |
-- |019bd109-217b-72a9-b14f-26cb50e1592d|tenant_a |external-tenant-a-record-9|90        |
-- +------------------------------------+---------+--------------------------+----------+

Query

There are two cursor-based pagination queries possible:

  1. First pagination query with empty id value:
select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1
order by id desc
limit 2;

-- $1 = 'tenant_a'

-- +------------------------------------+---------+--------------------------+----------+
-- |id                                  |tenant_id|external_id               |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217b-72a9-b14f-26cb50e1592d|tenant_a |external-tenant-a-record-9|90        |
-- |019bd109-217b-7264-a6f7-e7d822664f13|tenant_a |external-tenant-a-record-8|80        |
-- +------------------------------------+---------+--------------------------+----------+

-- ---------------------- client receives next "continue" string ------------------------
-- -------- {"id": "019bd109-217b-7264-a6f7-e7d822664f13", "limit": 2} | base64 ---------
  1. After the first query done, client receives:
  • the first n = limit records
  • new continue string containing last received id
  1. Client sends continue back to the API
  2. API extracts the id field from it and queries records with the lower values
-- fetching records with limit = 2 with ids lower than 019bd109-217b-7264-a6f7-e7d822664f13 ...

select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1 and id < $2
order by id desc
limit 2;

-- $1 = 'tenant_a'
-- $2 = '019bd109-217b-7264-a6f7-e7d822664f13' | lowest id from previous response

-- +------------------------------------+---------+--------------------------+----------+
-- |id                                  |tenant_id|external_id               |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217b-722d-8003-2b759b3109ab|tenant_a |external-tenant-a-record-7|70        |
-- |019bd109-217b-71ef-b232-fdfabbd605aa|tenant_a |external-tenant-a-record-6|60        |
-- +------------------------------------+---------+--------------------------+----------+

-- limit - 2, id = 019bd109-217b-71ef-b232-fdfabbd605aa
--
-- +------------------------------------+---------+--------------------------+----------+
-- |id                                  |tenant_id|external_id               |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217b-722d-8003-2b759b3109ab|tenant_a |external-tenant-a-record-7|70        |
-- |019bd109-217b-71ef-b232-fdfabbd605aa|tenant_a |external-tenant-a-record-6|60        |
-- +------------------------------------+---------+--------------------------+----------+

-- ...

-- limit = 2, id = 019bd109-217b-708a-9ec3-5d48297f2d59
--
-- +------------------------------------+---------+--------------------------+----------+
-- |id                                  |tenant_id|external_id               |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217a-7ab7-983b-e17f685bfe8c|tenant_a |external-tenant-a-record-1|10        |
-- +------------------------------------+---------+--------------------------+----------+

High ingestion rate

For high ingestion rate we can use various implementations of integer-based Twitter’s SnowflakeID
This will allow us to use built-in bigint type that require only 8 bytes per value

Model

Creating table with bigint type for id column:

create table if not exists service_record
(
  id          bigint not null primary key, -- required for pagination
  tenant_id   text   not null,
  external_id text   not null,
  created_at  bigint not null,
  constraint service_record_unique_record unique (tenant_id, external_id)
);

Index for per-tenant queries as well:

create index if not exists service_record_pagination on service_record (tenant_id, id);

Inserting data:

insert into service_record (
  id,
  tenant_id,
  external_id,
  created_at
)
values (2013299597367054336, 'tenant_a', 'external-tenant-a-record-1', 10),
       (2013299597367054337, 'tenant_a', 'external-tenant-a-record-2', 20),
       (2013299597367054338, 'tenant_a', 'external-tenant-a-record-3', 30),
       (2013299597367054339, 'tenant_a', 'external-tenant-a-record-4', 40),
       (2013299597367054340, 'tenant_a', 'external-tenant-a-record-5', 50),
       (2013299597367054341, 'tenant_a', 'external-tenant-a-record-6', 60),
       (2013299597367054342, 'tenant_a', 'external-tenant-a-record-7', 70),
       (2013299597367054343, 'tenant_a', 'external-tenant-a-record-8', 80),
       (2013299597367054344, 'tenant_a', 'external-tenant-a-record-9', 90)
on conflict do nothing;

-- +-------------------+---------+--------------------------+----------+
-- |id                 |tenant_id|external_id               |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054336|tenant_a |external-tenant-a-record-1|10        |
-- |2013299597367054337|tenant_a |external-tenant-a-record-2|20        |
-- |2013299597367054338|tenant_a |external-tenant-a-record-3|30        |
-- |2013299597367054339|tenant_a |external-tenant-a-record-4|40        |
-- |2013299597367054340|tenant_a |external-tenant-a-record-5|50        |
-- |2013299597367054341|tenant_a |external-tenant-a-record-6|60        |
-- |2013299597367054342|tenant_a |external-tenant-a-record-7|70        |
-- |2013299597367054343|tenant_a |external-tenant-a-record-8|80        |
-- |2013299597367054344|tenant_a |external-tenant-a-record-9|90        |
-- +-------------------+---------+--------------------------+----------+

Query

Here everything’s the same as with uuidv7:

  • querying first n = limit records
  • evaluating the lowest id value
select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1
order by id desc
limit 2;

-- $1 = 'tenant_a'

-- +-------------------+---------+--------------------------+----------+
-- |id                 |tenant_id|external_id               |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054344|tenant_a |external-tenant-a-record-9|90        |
-- |2013299597367054343|tenant_a |external-tenant-a-record-8|80        |
-- +-------------------+---------+--------------------------+----------+

-- -------------- client receives next "continue" string ---------------
-- --------- {"id": 2013299597367054343, "limit": 2} | base64 ----------
  • querying next n record values with id lower than previous one
-- fetching records with limit = 2 with ids lower than 2013299597367054343 ...

select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1 and id < $2
order by id desc
limit 2;

-- $1 = 'tenant_a'
-- $2 = 2013299597367054343 | lowest id from previous response

-- +-------------------+---------+--------------------------+----------+
-- |id                 |tenant_id|external_id               |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054342|tenant_a |external-tenant-a-record-7|70        |
-- |2013299597367054341|tenant_a |external-tenant-a-record-6|60        |
-- +-------------------+---------+--------------------------+----------+

-- limit = 2, id = 2013299597367054341
--
-- +-------------------+---------+--------------------------+----------+
-- |id                 |tenant_id|external_id               |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054340|tenant_a |external-tenant-a-record-5|50        |
-- |2013299597367054339|tenant_a |external-tenant-a-record-4|40        |
-- +-------------------+---------+--------------------------+----------+

-- ...

-- limit = 2, id = 2013299597367054337
--
-- +-------------------+---------+--------------------------+----------+
-- |id                 |tenant_id|external_id               |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054336|tenant_a |external-tenant-a-record-1|10        |
-- +-------------------+---------+--------------------------+----------+