Improve Paging Performance with Partitioned Cursor Paging
Overview
Cursor paging is a high-performance paging strategy optimized for large
datasets, offering a significant advantage over traditional limit/offset
pagination by reducing the impact of page depth on request performance.
Internally implemented using seek-based paging, API cursor paging ensures
efficient retrieval of resources by avoiding redundant offset computations for
each request and leveraging indexed keys for navigation.
Key benefits:
- Improved Performance - Minimizes database load when processing resources with a large number of items.
- Scalability - Enables parallel processing through partitioning, significantly speeding up data ingestion.
Default Paging Behavior
By default, the API supports traditional limit/offset
pagination. Clients
specify the number of items to retrieve using the limit
parameter and navigate
to subsequent pages using offset
.
GET /ed-fi/students?limit=500&offset=1500
Cursor Paging: The pageToken
and pageSize
Parameters
For improved performance, the API now supports cursor paging. After retrieving a
page of data, the API includes a Next-Page-Token
header in the response. This
token allows clients to request the next page of items without needing to
calculate offsets.
Usage
- Retrieve the first page using any applicable filtering criteria.
- Use the
Next-Page-Token
HTTP header value of the response as thepageToken
query parameter in subsequent requests. - Optionally control page size using the
pageSize
parameter (if omitted, a default page size is used).
GET /ed-fi/students?pageToken=LTIxNDc0NzU1ODQsLTIxNDc0Njc1MjE&pageSize=500
pageToken
and pageSize
parameters cannot be used in conjunctionwith offset
and limit
as these represent mutually exclusive paging
strategies. :::
totalCount
query string parameter is not supported with cursorpaging (i.e. when using using pageToken
). If you need to get a total count
before processing, perform this using an initial request as follows:
GET /ed-fi/students?totalCount=true&limit=0
Parallel Data Processing with /partitions
The /partitions
child endpoint of each resource is designed to enable client-side
parallel data processing by dividing the resource items into smaller,
evenly distributed partitions of data. This approach allows API clients to
process data in parallel threads, improving efficiency and reducing overall
processing time.
Endpoint Behavior
Partition Creation
- The endpoint accepts an optional
number
parameter to specify the desired number of partitions. - You may get fewer than the number of requested partitions if there isn't sufficient data available to warrant the level of parallel processing requested.
- Each entry in the
pageTokens
array of the response contains thepageToken
that represents the starting point for cursor pagination within that partition. - If no
number
parameter is provided, the API will provide a sensible number of partitions (up to a maximum of 10).
- computed partition size = CEILING(number of items available) / (number of partitions)
- minimum partition size = (maximum page size X 5)
- partition size = MAX( computed partition size, minimum partition size)
With the partition size determined, number of partitions then is calculated based on the total number of items available.
GET /ed-fi/students/partitions?number=5
{
"pageTokens": [
"LTIxNDc0ODM2NDgsLTIxNDc0NzU1ODU",
"LTIxNDc0NzU1ODQsLTIxNDc0Njc1MjE",
"LTIxNDc0Njc1MjAsLTIxNDc0NTk0NTc",
"LTIxNDc0NTk0NTYsLTIxNDc0NTEzOTM",
"LTIxNDc0NTEzOTIsMjE0NzQ4MzY0Nw"
]
}
Pagination on Partitions
Clients use the pageToken
values to request data from each partition in
parallel. Pagination within a partition proceeds until no data is returned,
indicating all items in that partition have been retrieved.
Filtering and Criteria
The /partitions
endpoint supports filtering criteria such as resource property
values and the Change Version ranges (using the minChangeVersion
and
maxChangeVersion
query string parameters).
criteria should be provided when retrieving partitions as with the page requests on the resource endpoint to ensure optimal and balanced results. For example, providing criteria to the partitions endpoint but not the requests may result in an exceedingly large number of items produced by the first partition in relation to the other partitions. :::