Pagination: SQL
This post contains information about pagination using native mechanisms provided by RDBMS
Chosen dialect is PostgreSQL, but following snippets can be easily adopted to other management systems
In this post we’ll rely on a record id field, but won’t use SERIAL auto-incremental integers by 2 reasons:
- these integers cannot be used for anything other than a pagination in distributed systems
- native implementations does increment the value on every transaction regardless of the transaction success or and actual data change, so within batch upserts the SERIAL will grow dramatically with a lots of gaps between the neighbours
There’s also some meaningful thing about the ids: the human-readable ids are better choice to use with IaC
For example:
- When having some ansible playbook to iterate over API entities:
records_to_update:
- id: record_a
name: RecordName
# ...
desc: This record has been created using Ansible by id
will be much easier to understand which objects we’re operating with and took O(1) to find the object, while:
records_to_update:
- id: 019bd109-217a-7ab7-983b-e17f685bfe8c
name: RecordName
# ...
desc: This record has been created using Ansible by id
will provide less context and:
records_to_update:
- name: RecordName
# ...
desc: This record has been created using Ansible by name
may take up to O(n) to find the required record
- Or having news site links like:
https://site/news/DD-MM-YYYY-something-happend
instead of:
https://site/news/019bd109-217a-7ab7-983b-e17f685bfe8c
Offset pagination lets you keep the human-readable ids without changing the table scheme
But in our example, we’ll have an additional external_id that can be helpful in this context
Offset pagination
This one is more of a traditional kind pagination and basically requires only:
- non-unique ordered integer field to iterate over
Model
Let’s first define a database record model:
create table if not exists service_record
(
id text not null primary key,
tenant_id text not null,
external_id text not null,
created_at bigint not null, -- required for pagination
constraint service_record_unique_record unique (tenant_id, external_id)
);
Since we’ll anyway be querying using tenant_id, let’s add an index for this example:
create index if not exists service_record_pagination on service_record (tenant_id, id);
Insert some records into the table:
insert into service_record (
id,
tenant_id,
external_id,
created_at
)
values ('record-1', 'tenant_a', 'external-tenant-a-record-1', 10),
('record-2', 'tenant_a', 'external-tenant-a-record-2', 20),
('record-3', 'tenant_a', 'external-tenant-a-record-3', 30),
('record-4', 'tenant_a', 'external-tenant-a-record-4', 40),
('record-5', 'tenant_a', 'external-tenant-a-record-5', 50),
('record-6', 'tenant_a', 'external-tenant-a-record-6', 60),
('record-7', 'tenant_a', 'external-tenant-a-record-7', 70),
('record-8', 'tenant_a', 'external-tenant-a-record-8', 80),
('record-9', 'tenant_a', 'external-tenant-a-record-9', 90)
on conflict do nothing;
-- +--------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +--------+---------+--------------------------+----------+
-- |record-1|tenant_a |external-tenant-a-record-1|10 |
-- |record-2|tenant_a |external-tenant-a-record-2|20 |
-- |record-3|tenant_a |external-tenant-a-record-3|30 |
-- |record-4|tenant_a |external-tenant-a-record-4|40 |
-- |record-5|tenant_a |external-tenant-a-record-5|50 |
-- |record-6|tenant_a |external-tenant-a-record-6|60 |
-- |record-7|tenant_a |external-tenant-a-record-7|70 |
-- |record-8|tenant_a |external-tenant-a-record-8|80 |
-- |record-9|tenant_a |external-tenant-a-record-9|90 |
-- +--------+---------+--------------------------+----------+
Request
Following request parameters:
| Parameter | Type | Description |
|---|---|---|
| pageSize | integer | Number of objects per page |
| pageNum | integer | Next page number |
are being converted into:
$$ limit = pageSize $$ $$ offset = (pageNum - 1)*pageSize $$
Response
Response field values:
| Field | Type | Description |
|---|---|---|
| list | list of Records | Current page records |
| totalCount | integer | Total objects count |
| totalPages | integer | Total pages count |
are being fetched and calculated from:
$$ list~by~query $$ $$ totalCount~by~count(1)~over()~window~function $$ $$ totalPages = (totalCount + pageSize - 1)/pageSize $$
in the numerator:
- pageSize is added to avoid rounding totalPages down
- -1 handling case with pageSize = 1, so totalPages will be equal to totalCount
Query
So there’s always only one query to fetch the required page records:
-- fetching records with pageSize = 2, so expected offsets are 0, 2, 4 ... 8
select id, tenant_id, created_at, count(1) over() as total_count
from service_record
where tenant_id = $1
order by created_at desc
limit $2 offset $3;
-- $1 = 'tenant_a'
-- $2 = 2
-- $3 = 0
-- pageSize = 2, pageNum = 1 => limit = 2, offset = (1-1)*2 = 0
--
-- +--------+---------+--------------------------+----------+-----------+
-- |id |tenant_id|external_id |created_at|total_count|
-- +--------+---------+--------------------------+----------+-----------+
-- |record-9|tenant_a |external-tenant-a-record-9|90 |9 |
-- |record-8|tenant_a |external-tenant-a-record-8|80 |9 |
-- +--------+---------+--------------------------+----------+-----------+
-- -------------------- totalPages = (9+2-1)/2 = 5 ----------------------
-- ----- client knows how many pages exist, sets the next pageNum -------
-- pageSize = 2, pageNum = 2 => limit = 2, offset = (2-1)*2 = 2
--
-- +--------+---------+--------------------------+----------+-----------+
-- |id |tenant_id|external_id |created_at|total_count|
-- +--------+---------+--------------------------+----------+-----------+
-- |record-7|tenant_a |external-tenant-a-record-7|70 |9 |
-- |record-6|tenant_a |external-tenant-a-record-6|60 |9 |
-- +--------+---------+--------------------------+----------+-----------+
-- ...
-- pageSize = 2, pageNum = 5 => limit = 2, offset = (5-1)*2 = 8
--
-- +--------+---------+--------------------------+----------+-----------+
-- |id |tenant_id|external_id |created_at|total_count|
-- +--------+---------+--------------------------+----------+-----------+
-- |record-1|tenant_a |external-tenant-a-record-1|10 |9 |
-- +--------+---------+--------------------------+----------+-----------+
Cursor-based pagination
This kind of pagination depends on:
- a chronologically ordered unique sortable id
That can be numeric or alphanumeric depending on the number of records you’re willing to ingest
Since the id colum type size will affect the size of the indices that depend on it, it will also affect:
- performance of data ingestion
- performance of data search
- storage utilization
- cpu utilization
Model
There is only one basic parameter will be exchanged between the client and the server:
Request
| Parameter | Type | Description |
|---|---|---|
| continue | base64 string | Encoded parameters for current cursor |
In our example for every request we’ll be encoding:
- limit (number of records per page)
- id (lowest id value currently returned)
For the first request, the string “continue” will be empty, indicating the start cursor position
Response
Response field values:
| Field | Type | Description |
|---|---|---|
| list | list of Records | Current page records |
| continue | base64 string | Encoded parameters for the next cursor |
Medium ingestion rate
For medium ingestion rate we can use built-in uuid type with uuidv7 that uses 16 bytes per value
Model
Creating the same record model, but now we need to use id column instead of created_at:
create table if not exists service_record
(
id uuid not null primary key default uuidv7(), -- required for pagination
tenant_id text not null,
external_id text not null,
created_at bigint not null,
constraint service_record_unique_record unique (tenant_id, external_id)
);
Same index for per-tenant record querying:
create index if not exists service_record_pagination on service_record (tenant_id, id);
Inserting data:
insert into service_record (
tenant_id,
external_id,
created_at
)
values ('tenant_a', 'external-tenant-a-record-1', 10),
('tenant_a', 'external-tenant-a-record-2', 20),
('tenant_a', 'external-tenant-a-record-3', 30),
('tenant_a', 'external-tenant-a-record-4', 40),
('tenant_a', 'external-tenant-a-record-5', 50),
('tenant_a', 'external-tenant-a-record-6', 60),
('tenant_a', 'external-tenant-a-record-7', 70),
('tenant_a', 'external-tenant-a-record-8', 80),
('tenant_a', 'external-tenant-a-record-9', 90)
on conflict do nothing;
-- +------------------------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217a-7ab7-983b-e17f685bfe8c|tenant_a |external-tenant-a-record-1|10 |
-- |019bd109-217b-708a-9ec3-5d48297f2d59|tenant_a |external-tenant-a-record-2|20 |
-- |019bd109-217b-70e9-9773-328be7cdad04|tenant_a |external-tenant-a-record-3|30 |
-- |019bd109-217b-714e-9b01-8d23a6ce744d|tenant_a |external-tenant-a-record-4|40 |
-- |019bd109-217b-71b4-aef8-7eb6c977e7a1|tenant_a |external-tenant-a-record-5|50 |
-- |019bd109-217b-71ef-b232-fdfabbd605aa|tenant_a |external-tenant-a-record-6|60 |
-- |019bd109-217b-722d-8003-2b759b3109ab|tenant_a |external-tenant-a-record-7|70 |
-- |019bd109-217b-7264-a6f7-e7d822664f13|tenant_a |external-tenant-a-record-8|80 |
-- |019bd109-217b-72a9-b14f-26cb50e1592d|tenant_a |external-tenant-a-record-9|90 |
-- +------------------------------------+---------+--------------------------+----------+
Query
There are two cursor-based pagination queries possible:
- First pagination query with empty
idvalue:
select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1
order by id desc
limit 2;
-- $1 = 'tenant_a'
-- +------------------------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217b-72a9-b14f-26cb50e1592d|tenant_a |external-tenant-a-record-9|90 |
-- |019bd109-217b-7264-a6f7-e7d822664f13|tenant_a |external-tenant-a-record-8|80 |
-- +------------------------------------+---------+--------------------------+----------+
-- ---------------------- client receives next "continue" string ------------------------
-- -------- {"id": "019bd109-217b-7264-a6f7-e7d822664f13", "limit": 2} | base64 ---------
- After the first query done, client receives:
- the first n = limit records
- new
continuestring containing last received id
- Client sends
continueback to the API - API extracts the
idfield from it and queries records with the lower values
-- fetching records with limit = 2 with ids lower than 019bd109-217b-7264-a6f7-e7d822664f13 ...
select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1 and id < $2
order by id desc
limit 2;
-- $1 = 'tenant_a'
-- $2 = '019bd109-217b-7264-a6f7-e7d822664f13' | lowest id from previous response
-- +------------------------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217b-722d-8003-2b759b3109ab|tenant_a |external-tenant-a-record-7|70 |
-- |019bd109-217b-71ef-b232-fdfabbd605aa|tenant_a |external-tenant-a-record-6|60 |
-- +------------------------------------+---------+--------------------------+----------+
-- limit - 2, id = 019bd109-217b-71ef-b232-fdfabbd605aa
--
-- +------------------------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217b-722d-8003-2b759b3109ab|tenant_a |external-tenant-a-record-7|70 |
-- |019bd109-217b-71ef-b232-fdfabbd605aa|tenant_a |external-tenant-a-record-6|60 |
-- +------------------------------------+---------+--------------------------+----------+
-- ...
-- limit = 2, id = 019bd109-217b-708a-9ec3-5d48297f2d59
--
-- +------------------------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +------------------------------------+---------+--------------------------+----------+
-- |019bd109-217a-7ab7-983b-e17f685bfe8c|tenant_a |external-tenant-a-record-1|10 |
-- +------------------------------------+---------+--------------------------+----------+
High ingestion rate
For high ingestion rate we can use various implementations of integer-based Twitter’s SnowflakeID
This will allow us to use built-in bigint type that require only 8 bytes per value
Model
Creating table with bigint type for id column:
create table if not exists service_record
(
id bigint not null primary key, -- required for pagination
tenant_id text not null,
external_id text not null,
created_at bigint not null,
constraint service_record_unique_record unique (tenant_id, external_id)
);
Index for per-tenant queries as well:
create index if not exists service_record_pagination on service_record (tenant_id, id);
Inserting data:
insert into service_record (
id,
tenant_id,
external_id,
created_at
)
values (2013299597367054336, 'tenant_a', 'external-tenant-a-record-1', 10),
(2013299597367054337, 'tenant_a', 'external-tenant-a-record-2', 20),
(2013299597367054338, 'tenant_a', 'external-tenant-a-record-3', 30),
(2013299597367054339, 'tenant_a', 'external-tenant-a-record-4', 40),
(2013299597367054340, 'tenant_a', 'external-tenant-a-record-5', 50),
(2013299597367054341, 'tenant_a', 'external-tenant-a-record-6', 60),
(2013299597367054342, 'tenant_a', 'external-tenant-a-record-7', 70),
(2013299597367054343, 'tenant_a', 'external-tenant-a-record-8', 80),
(2013299597367054344, 'tenant_a', 'external-tenant-a-record-9', 90)
on conflict do nothing;
-- +-------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054336|tenant_a |external-tenant-a-record-1|10 |
-- |2013299597367054337|tenant_a |external-tenant-a-record-2|20 |
-- |2013299597367054338|tenant_a |external-tenant-a-record-3|30 |
-- |2013299597367054339|tenant_a |external-tenant-a-record-4|40 |
-- |2013299597367054340|tenant_a |external-tenant-a-record-5|50 |
-- |2013299597367054341|tenant_a |external-tenant-a-record-6|60 |
-- |2013299597367054342|tenant_a |external-tenant-a-record-7|70 |
-- |2013299597367054343|tenant_a |external-tenant-a-record-8|80 |
-- |2013299597367054344|tenant_a |external-tenant-a-record-9|90 |
-- +-------------------+---------+--------------------------+----------+
Query
Here everything’s the same as with uuidv7:
- querying first n = limit records
- evaluating the lowest id value
select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1
order by id desc
limit 2;
-- $1 = 'tenant_a'
-- +-------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054344|tenant_a |external-tenant-a-record-9|90 |
-- |2013299597367054343|tenant_a |external-tenant-a-record-8|80 |
-- +-------------------+---------+--------------------------+----------+
-- -------------- client receives next "continue" string ---------------
-- --------- {"id": 2013299597367054343, "limit": 2} | base64 ----------
- querying next n record values with id lower than previous one
-- fetching records with limit = 2 with ids lower than 2013299597367054343 ...
select id, tenant_id, external_id, created_at from service_record
where tenant_id = $1 and id < $2
order by id desc
limit 2;
-- $1 = 'tenant_a'
-- $2 = 2013299597367054343 | lowest id from previous response
-- +-------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054342|tenant_a |external-tenant-a-record-7|70 |
-- |2013299597367054341|tenant_a |external-tenant-a-record-6|60 |
-- +-------------------+---------+--------------------------+----------+
-- limit = 2, id = 2013299597367054341
--
-- +-------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054340|tenant_a |external-tenant-a-record-5|50 |
-- |2013299597367054339|tenant_a |external-tenant-a-record-4|40 |
-- +-------------------+---------+--------------------------+----------+
-- ...
-- limit = 2, id = 2013299597367054337
--
-- +-------------------+---------+--------------------------+----------+
-- |id |tenant_id|external_id |created_at|
-- +-------------------+---------+--------------------------+----------+
-- |2013299597367054336|tenant_a |external-tenant-a-record-1|10 |
-- +-------------------+---------+--------------------------+----------+