Skip to main content
Version: 7.3

Serialized Data

The Serialized Data feature is a performance optimization where the API manages all of a resource's data in serialized form to significantly reduce the size and volume of queries executed against the database. While the data in the ODS tables is still managed as usual, the serialized data of each resource is stored in the "root" record. The API then treats this binary representation as the primary source of data for servicing API requests (both read and write), resulting in significantly less SQL being executed against the database.

Benefits

The primary benefit is the replacement of large batches of SQL generated by NHibernate, the Object Relational Mapper (ORM) that is used for persistence, with a highly simplified query pattern that eliminates the need for any joins (other than those required for authorization purposes). Since all of the resource's data must be loaded for both read and write scenarios, this improvement will provide benefit to all Data Management resource requests.

To support this feature, a new AggregateData column has been added to every aggregate root table in the ODS to store the serialized binary data, as follows:

  • SQL Server - varbinary(8000)
  • PostgreSQL - bytea

On the read side, the API is able to provide the response using only the serialized data stored in the AggregateData column rather than needing to execute a potentially large batch of SQL. The following listing shows the SQL that the API executes with the feature enabled (with the only join being authorization-related):

SELECT  r.AggregateId , r.AggregateData , r.LastModifiedDate , r.StudentUsi AS SurrogateId
FROM edfi.Student AS r
INNER JOIN authView7ba7db ON r.StudentUSI = authView7ba7db.StudentUSI
ORDER BY r.AggregateId

OFFSET @Offset ROWS FETCH NEXT @Limit ROWS ONLY

Similarly, on the write side the entities are deserialized from this binary data and provided to NHibernate as though they were just loaded from the ODS tables. This once again eliminates the query batches that NHibernate would otherwise generate to synchronize its state with the database in order to determine the changes that need to be applied.

exec sp_executesql N' SELECT  r.AggregateId , r.AggregateData , r.LastModifiedDate , r.StudentUsi AS SurrogateId
FROM edfi.Student AS r WHERE r.StudentUSI = @p0
',N'@p0 int',@p0=1963
NHibernate Query Analysis

An analysis of requests for resources with multiple child entities reveals the size of the query batches generated by NHibernate to be quite significant. For example, a request against the StudentEducationOrganizationAssociations resource yields about 50KB of SQL and performs 24 total left outer joins across 20 separate SELECT statements in the batch to load the necessary data.

Tradeoffs

While improvements provided by the feature are significant, there are some trade-offs and considerations associated with this optimization.

Direct ODS Data Updates Require Additional Care

For any processing that modifies ODS data directly (i.e. not through the API), platform hosts must perform additional steps to ensure that the API reflects the latest data in the ODS tables and subsequently reincorporates it into the serialized representation. One or both of the following steps must be performed for each affected resource to prevent possible data loss:

  • Update the LastModifiedDate column on the root table of the affected resource.
  • Clear the AggregateData column (optional, but recommended).

The following type of changes to a resource's associated tables are relevant:

  • Update of values on the "root" record
  • Update of values on a record derived from the "root" record (e.g. School, StudentSpecialEducationProgramAssociation, which inherit from EducationOrganization and GeneralStudentProgramAssociation, respectively)
  • Insert/Update/Delete of any records in child tables contained by the resource
    info

    This includes child tables of a derived type, such as SchoolGradeLevel. In this case, the deletion of a SchoolGradeLevel record would necessitate an update of the LastModifiedDate on the EducationOrganization record associated with the containing School.

  • Insert/Update/Delete of an "extension" record associated with the resource
  • Insert/Update/Delete of any child items within an extension to the resource

Increased ODS Data Storage Requirements

Logically there is an increase in the data storage requirements due to the persistence of the serialized data in the AggregateData column of each "root" table record. However, the serialization format used is efficient and performant (see MessagePack) and the serialized data is also compressed before it is persisted, reducing impact as much as possible for storage considerations and the network bandwidth usage between the API and the database server.