Quick start
To use ehrQL in a project, you need to add three things:
- An ehrQL action in project.yaml
- A dataset definition
- A dummy data file
Adding to project.yaml
The project.yaml below is a minimal example that only runs ehrQL. It defines a single action, generate_dataset, with its configuration.
The OpenSAFELY documentation has more information on project.yaml files, which are common to all OpenSAFELY projects.
Specifically, the action needs these two options:
- --dataset-definition: the path to the dataset definition file
- --dummy-data-file: the path to the dummy data file
version: '3.0'

actions:
  generate_dataset:
    run: >
      ehrql:v0 generate_dataset
      --dataset-definition analysis/dataset_definition.py
      --dummy-data-file dummy_data.csv
      --output output/dataset.csv
    outputs:
      highly_sensitive:
        dataset: output/dataset.csv
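To make the run value easier to read, the sketch below writes it as the single line that YAML's folded style (>) produces, then splits it into options with Python's standard shlex module. It is only an illustration of how to read the command, not how OpenSAFELY itself processes project.yaml. The helper name parse_options is hypothetical, and it assumes every option takes exactly one value.

```python
import shlex

# The run value from project.yaml after YAML folds its lines into one.
run = (
    "ehrql:v0 generate_dataset "
    "--dataset-definition analysis/dataset_definition.py "
    "--dummy-data-file dummy_data.csv "
    "--output output/dataset.csv"
)


def parse_options(command):
    """Split a command line into (positional words, {option: value})."""
    tokens = shlex.split(command)
    positional, options = [], {}
    i = 0
    while i < len(tokens):
        if tokens[i].startswith("--"):
            # Simplifying assumption: each option is followed by one value.
            options[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            positional.append(tokens[i])
            i += 1
    return positional, options


positional, options = parse_options(run)
print(positional)
for name, value in options.items():
    print(name, "->", value)
```

The two positional words are the versioned ehrQL image and the command to run. The three options are the paths that the rest of this page refers to.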
Versioning
In the project.yaml above, the version of ehrQL is specified explicitly as ehrql:v0.
ehrQL uses semantic versioning. Releases use version numbers of the format vMAJOR.MINOR.PATCH; for example, v0.1.2 has a major version of v0.
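As a small illustration of the vMAJOR.MINOR.PATCH format, the sketch below splits a version string into its three parts and derives the major version tag. The function names parse_version and major_tag are hypothetical and are not part of ehrQL.

```python
import re

# Matches strings such as "v0.1.2": a "v" followed by three dotted integers.
VERSION_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_version(version):
    """Return (major, minor, patch) as integers for a vMAJOR.MINOR.PATCH string."""
    match = VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def major_tag(version):
    """Return the major version tag, e.g. "v0" for "v0.1.2"."""
    major, _, _ = parse_version(version)
    return f"v{major}"


print(parse_version("v0.1.2"))  # (0, 1, 2)
print(major_tag("v0.1.2"))      # v0
```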
Adding a Dataset definition
The dataset definition is a Python file that describes the dataset using ehrQL. The ehrQL sections and tutorial explain how to specify a dataset in more detail. Below is a basic example.
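from ehrql import Dataset
from ehrql.tables.beta.smoketest import patients

# Derive each patient's year of birth from their date of birth.
year_of_birth = patients.date_of_birth.year

dataset = Dataset()
# Include only patients born in 2000 or later.
dataset.define_population(year_of_birth >= 2000)
# Add year_of_birth as a column of the dataset.
dataset.year_of_birth = year_of_birth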
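To show what this definition selects, here is a conceptual analogue in plain Python with no ehrQL involved. The patient records are made up for illustration, and the loop is only a mental model of the result: one row per patient in the population, with a year_of_birth column. It is not a description of how ehrQL evaluates a dataset definition.

```python
from datetime import date

# Made-up patient records, for illustration only.
patients = [
    {"patient_id": 1, "date_of_birth": date(1985, 6, 1)},
    {"patient_id": 2, "date_of_birth": date(2000, 1, 1)},
    {"patient_id": 3, "date_of_birth": date(2012, 9, 1)},
]

# Keep patients born in 2000 or later, and record their year of birth.
dataset = [
    {"patient_id": p["patient_id"], "year_of_birth": p["date_of_birth"].year}
    for p in patients
    if p["date_of_birth"].year >= 2000
]

for row in dataset:
    print(row)
```

Patient 1 is excluded because 1985 is before 2000. Patients 2 and 3 are kept.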
Add a dummy data file
This is a CSV file of dummy data to use with ehrQL. It must be placed at the path given by --dummy-data-file in the project.yaml above.
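One way to produce such a file is with Python's standard csv module. The sketch below assumes the dummy data needs a patient_id column plus one column per variable in the dataset definition (here, year_of_birth). That column layout is an assumption to check against the ehrQL documentation. The values are made up, and the helper name write_dummy_data is hypothetical.

```python
import csv
import io

# Made-up rows; year_of_birth values match the example population (>= 2000).
rows = [
    {"patient_id": 1, "year_of_birth": 2001},
    {"patient_id": 2, "year_of_birth": 2005},
    {"patient_id": 3, "year_of_birth": 2010},
]


def write_dummy_data(rows, target):
    """Write rows as CSV, with a header line, to an open text file object."""
    # Assumed column layout: patient_id plus one column per dataset variable.
    writer = csv.DictWriter(target, fieldnames=["patient_id", "year_of_birth"])
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
write_dummy_data(rows, buffer)
print(buffer.getvalue())

# To create the file named in project.yaml instead:
# with open("dummy_data.csv", "w", newline="") as f:
#     write_dummy_data(rows, f)
```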