ehrQL: electronic health record query language

Danger

This page discusses the new OpenSAFELY Data Builder for accessing OpenSAFELY data sources.

Use OpenSAFELY cohort-extractor, unless you are specifically involved in the development or testing of Data Builder.

OpenSAFELY ehrQL and its documentation are still undergoing extensive development. We will announce on the Platform News page when ehrQL is ready for general use.

ehrQL, or electronic health record query language, is the language used to extract datasets for analysis. It was developed to help researchers write simple, unambiguous queries, and to let them write queries that ehrQL's developers did not anticipate.

The first stage when carrying out research with the OpenSAFELY platform is for the researcher to write a dataset definition in ehrQL. A dataset definition specifies the criteria for selecting, and the characteristics of, the population. It is used by ehrQL to extract a dataset from an EHR system, where a dataset is a tabular data structure with one row per patient and one column per characteristic.

The following dataset definition, which is written in ehrQL, is from the tutorial. It specifies a single criterion for selecting the population; namely, that the population should consist of all patients who were born in or after 2000. It also specifies a single characteristic of the population; namely, each patient's year of birth.

from ehrql import Dataset
from ehrql.tables.beta.smoketest import patients

year_of_birth = patients.date_of_birth.year
dataset = Dataset()
dataset.define_population(year_of_birth >= 2000)
dataset.year_of_birth = year_of_birth
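
To make the meaning of this dataset definition concrete, the following plain-Python sketch mimics the result it describes: one row per selected patient and one column per characteristic. It is not ehrQL and says nothing about how ehrQL is implemented. The `PATIENTS` records and the `extract_dataset` helper are hypothetical names invented for this illustration.

```python
from datetime import date

# Hypothetical patient records, invented purely for illustration.
PATIENTS = [
    {"patient_id": 1, "date_of_birth": date(1985, 6, 1)},
    {"patient_id": 2, "date_of_birth": date(2000, 1, 1)},
    {"patient_id": 3, "date_of_birth": date(2012, 9, 1)},
]


def extract_dataset(patients, min_year=2000):
    """Return one row per selected patient, with one column per characteristic."""
    rows = []
    for patient in patients:
        year_of_birth = patient["date_of_birth"].year
        # Population criterion: keep patients born in or after min_year.
        if year_of_birth >= min_year:
            # Characteristic: the patient's year of birth.
            rows.append(
                {"patient_id": patient["patient_id"], "year_of_birth": year_of_birth}
            )
    return rows


dataset_rows = extract_dataset(PATIENTS)
for row in dataset_rows:
    print(row)
```

In this sketch the patient born in 1985 is excluded by the population criterion, so the resulting table has two rows, each with a single `year_of_birth` column alongside the patient identifier.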

You can find out more about ehrQL in the tutorial and the examples. For in-depth technical documentation, there is also the reference.
