
Quick start

To use ehrQL in a project, you need to do three things:

  1. Add an ehrQL action to the project.yaml.
  2. Add a dataset definition.
  3. Provide a dummy data file.

Adding to project.yaml

The project.yaml below is a minimal example that only runs ehrQL.

We define an action, in this case generate_dataset, with its own configuration. More information on project.yaml files, which are common to all OpenSAFELY projects, is available in the OpenSAFELY documentation.

Specifically, the run command needs these two options:

  • --dataset-definition - the path of the dataset definition file
  • --dummy-data-file - the path of the dummy data file
Minimal ehrQL project YAML example
version: '3.0'

actions:

  generate_dataset:
    run: >
      ehrql:v0 generate_dataset
        --dataset-definition analysis/dataset_definition.py
        --dummy-data-file dummy_data.csv
        --output output/dataset.csv
    outputs:
      highly_sensitive:
        dataset: output/dataset.csv
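
As a quick sanity check, the run command above can be split into its parts to confirm that each option points at the intended path. The following is only an illustrative sketch: the parse_run helper is hypothetical, not part of ehrQL or OpenSAFELY, and the command is written on one line purely for convenience.

```python
import shlex

# The run command from the example project.yaml, written on one line
# for illustration.
RUN = (
    "ehrql:v0 generate_dataset "
    "--dataset-definition analysis/dataset_definition.py "
    "--dummy-data-file dummy_data.csv "
    "--output output/dataset.csv"
)


def parse_run(command):
    """Hypothetical helper: split a run command into (image, subcommand, options)."""
    tokens = shlex.split(command)
    image, subcommand, rest = tokens[0], tokens[1], tokens[2:]
    # In this example every option is written as a "--name value" pair.
    options = dict(zip(rest[0::2], rest[1::2]))
    return image, subcommand, options


image, subcommand, options = parse_run(RUN)
for name, value in options.items():
    print(f"{name}: {value}")
```

The same check could be applied to any action that follows the "--name value" pattern; options without a value would need different handling.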

Versioning

The project.yaml above explicitly specifies the version of ehrQL to use: the v0 in ehrql:v0.

ehrQL uses semantic versioning. Releases use version numbers of the format vMAJOR.MINOR.PATCH; for example, v0.1.2 has a major version of v0.
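
To make the vMAJOR.MINOR.PATCH format concrete, here is a minimal sketch that splits a version string into its three parts and derives the major version tag. The function names parse_version and major_tag are hypothetical and are not part of ehrQL.

```python
import re

# Matches strings such as "v0.1.2": a "v" followed by three dot-separated integers.
VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Return (major, minor, patch) as integers for a vMAJOR.MINOR.PATCH string."""
    match = VERSION_RE.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def major_tag(tag):
    """Return the major version tag, e.g. "v0" for "v0.1.2"."""
    major, _, _ = parse_version(tag)
    return f"v{major}"


print(parse_version("v0.1.2"))
print(major_tag("v0.1.2"))
```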

Adding a dataset definition

The dataset definition is a Python file that defines the dataset using ehrQL. The ehrQL documentation includes further sections and a tutorial on how to specify a dataset. Below is a basic example.

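from ehrql import Dataset
from ehrql.tables.beta.smoketest import patients

# Derive each patient's year of birth from their date of birth.
year_of_birth = patients.date_of_birth.year

dataset = Dataset()
# Include only patients born in 2000 or later.
dataset.define_population(year_of_birth >= 2000)
# Add year_of_birth as a column of the dataset.
dataset.year_of_birth = year_of_birth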

Adding a dummy data file

This is a CSV file of dummy data to use with ehrQL. It must be placed at the path given by --dummy-data-file in the project.yaml (dummy_data.csv in the example above).
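
One way to produce such a file is with a short script. The sketch below rests on an assumption not stated on this page: that the dummy data file should contain a patient_id column plus one column per dataset variable (here, year_of_birth). The row values are invented purely for illustration, and the helper names are hypothetical.

```python
import csv
import io

# Assumed columns: patient_id plus the single variable defined on the dataset.
COLUMNS = ["patient_id", "year_of_birth"]


def make_dummy_rows(n):
    """Build n made-up rows, all with a year of birth of 2000 or later."""
    return [
        {"patient_id": i, "year_of_birth": 2000 + (i % 20)}
        for i in range(1, n + 1)
    ]


def to_csv_text(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(make_dummy_rows(5))
print(csv_text)
```

To use the result, the text would be written to the path named by --dummy-data-file, for example with pathlib.Path("dummy_data.csv").write_text(csv_text).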