OpenSAFELY ehrQL

Danger

This page discusses the new OpenSAFELY Data Builder for accessing OpenSAFELY data sources.

Use OpenSAFELY cohort-extractor, unless you are specifically involved in the development or testing of Data Builder.

OpenSAFELY ehrQL and its documentation are still undergoing extensive development. We will announce when ehrQL is ready for general use on the Platform News page.

Danger

This content has not yet been reviewed by OpenSAFELY technical leads. This page is not a definitive statement about the status of ehrQL, cohort-extractor or any other part of OpenSAFELY.

It should be taken as potentially incorrect, until this notice is removed.

ehrQL constructs datasets for researchers

ehrQL is a tool for constructing the datasets you use in research studies and analyses with OpenSAFELY.

With ehrQL:

  • Researchers can specify data they want to use in their research via a dataset definition.
  • Data providers can specify data they want to offer for research via an OpenSAFELY Backend.

Features

Readable dataset definitions

ehrQL is a new query language developed for OpenSAFELY. Researchers can now use a dataset definition to specify the data to be extracted from OpenSAFELY.

ehrQL is designed to be easy to read, so that a reader can understand how the dataset being defined is constructed.

Multiple backends

ehrQL facilitates querying multiple different data backends, without researchers needing to concern themselves with the specific details of how each backend works. This means that a researcher only needs to write a dataset definition once and can then use it to query different datasets.
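
To illustrate the idea, here is a minimal, self-contained Python sketch. It is not ehrQL's API: `DatasetDefinition`, `Backend`, and all column names are hypothetical stand-ins, assumed here only to show how a single definition might be evaluated against backends that store the same information under different names.

```python
from dataclasses import dataclass

# Hypothetical illustration only: none of these names come from ehrQL.


@dataclass(frozen=True)
class DatasetDefinition:
    """Backend-independent description of the columns a researcher wants."""

    columns: tuple  # logical column names, e.g. ("patient_id", "date_of_birth")


@dataclass
class Backend:
    """A data source that maps logical column names to its own storage names."""

    name: str
    column_map: dict  # logical name -> backend-specific name
    rows: list  # backend-native rows, keyed by backend-specific names

    def extract(self, definition):
        # Translate each logical column into this backend's own column name.
        return [
            {col: row[self.column_map[col]] for col in definition.columns}
            for row in self.rows
        ]


# The researcher writes the definition once...
definition = DatasetDefinition(columns=("patient_id", "date_of_birth"))

# ...and each (invented) backend handles its own storage details.
backend_a = Backend(
    name="backend_a",
    column_map={"patient_id": "Patient_ID", "date_of_birth": "DOB"},
    rows=[{"Patient_ID": 1, "DOB": "1980-01-01"}],
)
backend_b = Backend(
    name="backend_b",
    column_map={"patient_id": "pid", "date_of_birth": "birth_date"},
    rows=[{"pid": 1, "birth_date": "1980-01-01"}],
)

result_a = backend_a.extract(definition)
result_b = backend_b.extract(definition)
```

In this sketch, both backends return rows keyed by the same logical column names, even though their underlying storage differs. That is the separation of concerns this section describes.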

Researcher-provided dummy data

ehrQL allows researchers to provide their own dummy data, against which they can develop their analytical code.
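
As a rough sketch of this workflow, the following Python example develops a small analysis function against a hand-written dummy dataset. The CSV columns (`patient_id`, `age`, `sex`), the values, and the helper names `load_dataset` and `mean_age` are all invented for illustration; they are assumptions, not part of ehrQL or of any real OpenSAFELY dataset.

```python
import csv
import io

# Hypothetical dummy data: column names and values are invented for illustration.
DUMMY_CSV = """patient_id,age,sex
1,34,F
2,71,M
3,58,F
"""


def load_dataset(source):
    """Read a dataset CSV into a list of dicts, converting age to int."""
    rows = []
    for row in csv.DictReader(source):
        row["age"] = int(row["age"])
        rows.append(row)
    return rows


def mean_age(rows):
    """Example analysis step, developed and checked against the dummy data."""
    return sum(row["age"] for row in rows) / len(rows)


# io.StringIO stands in for a dummy data file on disk.
dummy_rows = load_dataset(io.StringIO(DUMMY_CSV))
average = mean_age(dummy_rows)
```

Analysis code written this way only depends on the dataset's column names and types, so it can be exercised locally without touching real patient data.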

Note

Functionality to generate dummy data from the dataset definition is currently in development.

Why ehrQL was created

Researchers familiar with OpenSAFELY will naturally ask why we are writing software to replace cohort-extractor. ehrQL is intended to eventually replace cohort-extractor in new studies. If you are interested, we have more information about the differences between cohort-extractor and ehrQL.

In OpenSAFELY's first two years, researchers have used cohort-extractor and study definitions to successfully complete a number of research studies using multiple data sources and linked data.

ehrQL is a complete redesign and reimplementation of cohort-extractor aimed at making OpenSAFELY even easier to work with for researchers and data providers. ehrQL's design incorporates feedback from researchers' use of cohort-extractor.

ehrQL:

  • Provides more expressive ways for researchers to specify cohorts.
  • Simplifies the implementation of new features across multiple different data backends.

For more information on how ehrQL and cohort-extractor compare, see the development plan for ehrQL.

Reading the ehrQL documentation

Other documentation pages explain in more detail the concepts needed to write a dataset definition.

ehrQL is still in development

Warning

There is considerable ongoing work on ehrQL's design and development. ehrQL is subject to frequent change, as indicated by its current v0 version.

We recommend that users still favour the existing OpenSAFELY cohort-extractor for their research.
