Phenopacket Schema

The goal of the phenopacket-schema is to define a machine-readable phenotypic description of a patient or sample in the context of rare disease, common/complex disease, or cancer. It aims to make this information shareable outside of the EHR (Electronic Health Record), so that a clinician or clinical geneticist can capture sufficient structured data at the point of care and share it with other labs, or use it for computational analysis in clinical or research environments.

This work has been produced as part of the GA4GH Clinical Phenotype Data Capture Workstream and is designed to be compatible with GA4GH metadata-schemas.

The phenopacket schema defines a common, limited set of data types that can be composed into more specialised types, so that resources can share data using an agreed-upon common schema.

This common schema has been used to define the ‘Phenopacket’, a catch-all collection of data types focused specifically on representing disease data, both at initial data capture and during analysis. The phenopacket schema is designed to be both human- and machine-readable, and to interoperate with standards such as those being developed by the ISO TC215 committee and the HL7 Fast Healthcare Interoperability Resources Specification (aka FHIR®).


The diagram below shows an overview of Phenopackets schema elements and relationships.


The yellow circles represent an Interpretation and its sub-elements. Blue circles are the top-level elements used to describe an individual or group of individuals, and the green circles are the building block components of those. Lines between elements indicate composition.

Note: Interpretation is displayed in a different colour from the other top-level elements because it contains a Family or Phenopacket. Interpretation interprets the Phenopacket based on its component elements, whereas the Phenopacket simply states what was observed. Both are composed of the green building block elements, but the additional elements shown in the Interpretation's colour are only used to compose the Interpretation.
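The composition described above can be sketched in a few lines of Python. This is only an illustration, not the schema itself and not the generated protobuf API: the dataclasses below are hypothetical, heavily simplified stand-ins, and the field names, the example identifiers, and the resolution_status default are assumptions made for the example. It shows small building blocks being composed into a Phenopacket, and an Interpretation that contains a Phenopacket rather than duplicating it.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


# Building blocks (simplified, hypothetical stand-ins for schema elements).
@dataclass
class OntologyClass:
    id: str     # ontology term identifier, e.g. an HPO term
    label: str  # human-readable name of the term


@dataclass
class PhenotypicFeature:
    type: OntologyClass  # the observed phenotype, expressed as an ontology term


@dataclass
class Individual:
    id: str  # identifier of the patient or sample donor


# Top-level element: states what was observed for one individual.
@dataclass
class Phenopacket:
    id: str
    subject: Individual
    phenotypic_features: List[PhenotypicFeature] = field(default_factory=list)


# Top-level element that contains a Phenopacket and adds an interpretation of it.
@dataclass
class Interpretation:
    id: str
    phenopacket: Phenopacket
    resolution_status: str = "UNKNOWN"  # assumed placeholder value


packet = Phenopacket(
    id="example-packet-1",
    subject=Individual(id="patient-1"),
    phenotypic_features=[
        PhenotypicFeature(type=OntologyClass(id="HP:0001250", label="Seizure")),
    ],
)
interpretation = Interpretation(id="example-interpretation-1", phenopacket=packet)

# Nested dataclasses convert to plain dicts, so the whole structure can be
# written out as JSON that is readable by both people and programs.
as_json = json.dumps(asdict(interpretation), indent=2)
print(as_json)
```

Keeping the building blocks small and reusing them in every top-level element is what lets different resources exchange data without agreeing on anything beyond the common types.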

The schema is defined in protobuf. You can find out more in the section ‘A short introduction to protobuf’.