Serialized Data
The Serialized Data feature is a performance optimization where the API manages all of a resource's data in serialized form to significantly reduce the size and volume of queries executed against the database. While the data in the ODS tables is still managed as usual, the serialized data of each resource is stored in the "root" record. The API then treats this binary representation as the primary source of data for servicing API requests (both read and write), resulting in significantly less SQL being executed against the database.
Benefits
The primary benefit is the replacement of large batches of SQL generated by NHibernate, the Object Relational Mapper (ORM) that is used for persistence, with a highly simplified query pattern that eliminates the need for any joins (other than those required for authorization purposes). Since all of the resource's data must be loaded for both read and write scenarios, this improvement will provide benefit to all Data Management resource requests.
To support this feature, a new AggregateData
column has been added to every
aggregate root table in the ODS to store the serialized binary data, as follows:
- SQL Server -
varbinary(8000)
- PostgreSQL -
bytea
On the read side, the API is able to provide the response using only the
serialized data stored in the AggregateData
column rather than needing to
execute a potentially large batch of SQL. The following listing shows the SQL
that the API executes with the feature enabled (with the only join being
authorization-related):
SELECT r.AggregateId , r.AggregateData , r.LastModifiedDate , r.StudentUsi AS SurrogateId
FROM edfi.Student AS r
INNER JOIN authView7ba7db ON r.StudentUSI = authView7ba7db.StudentUSI
ORDER BY r.AggregateId
OFFSET @Offset ROWS FETCH NEXT @Limit ROWS ONLY
Similarly, on the write side the entities are deserialized from this binary data and provided to NHibernate as though they were just loaded from the ODS tables. This once again eliminates the query batches that NHibernate would otherwise generate to synchronize its state with the database in order to determine the changes that need to be applied.
exec sp_executesql N' SELECT r.AggregateId , r.AggregateData , r.LastModifiedDate , r.StudentUsi AS SurrogateId
FROM edfi.Student AS r WHERE r.StudentUSI = @p0
',N'@p0 int',@p0=1963
An analysis of requests for resources with multiple child entities reveals the
size of the query batches generated by NHibernate to be quite significant. For
example, a request against the StudentEducationOrganizationAssociations resource
yields about 50KB of SQL and performs 24 total left outer joins across 20
separate SELECT
statements in the batch to load the necessary data.
Tradeoffs
While improvements provided by the feature are significant, there are some trade-offs and considerations associated with this optimization.
Direct ODS Data Updates Require Additional Care
For any processing that modifies ODS data directly (i.e. not through the API), platform hosts must perform additional steps to ensure that the API reflects the latest data in the ODS tables and subsequently reincorporates it into the serialized representation. One or both of the following steps must be performed for each affected resource to prevent possible data loss:
- Update the
LastModifiedDate
column on the root table of the affected resource. - Clear the
AggregateData
column (optional, but recommended).
The following type of changes to a resource's associated tables are relevant:
- Update of values on the "root" record
- Update of values on a record derived from the "root" record (e.g.
School
,StudentSpecialEducationProgramAssociation
, which inherit fromEducationOrganization
andGeneralStudentProgramAssociation
, respectively) - Insert/Update/Delete of any records in child tables contained by the resource
info
This includes child tables of a derived type, such as
SchoolGradeLevel
. In this case, the deletion of aSchoolGradeLevel
record would necessitate an update of theLastModifiedDate
on theEducationOrganization
record associated with the containingSchool
. - Insert/Update/Delete of an "extension" record associated with the resource
- Insert/Update/Delete of any child items within an extension to the resource
Increased ODS Data Storage Requirements
Logically there is an increase in the data storage requirements due to the
persistence of the serialized data in the AggregateData
column of each "root"
table record. However, the serialization format used is efficient and performant
(see MessagePack) and the serialized data is also
compressed before it is persisted, reducing impact as much as possible for
storage considerations and the network bandwidth usage between the API and the
database server.