Installing ehrQL with Python🔗

Danger

This page discusses the new OpenSAFELY Data Builder for accessing OpenSAFELY data sources.

Use OpenSAFELY cohort-extractor, unless you are specifically involved in the development or testing of Data Builder.

OpenSAFELY ehrQL and its documentation are still undergoing extensive development. We will announce when ehrQL is ready for general use on the Platform News page.

Warning

We recommend that you use ehrQL with the OpenSAFELY CLI as instructed in the ehrQL tutorial.

Limitations🔗

This option is a fall back if:

you are a competent Python user,
and you understand how to install Python packages yourself with pip

This installation option will allow you to run ehrQL dataset definitions only. You will not be able to run a full OpenSAFELY project via a project.yaml pipeline.

If you are unable to run ehrQL via Docker, you can try installing ehrQL directly using Python.

As Python configurations vary between operating systems, and how users have Python configured, we will not give detailed instructions.

Warning

This option may not work on Windows currently: https://github.com/opensafely-core/ehrql/issues/790

Todo

Can we fix that issue?

Requirements🔗

You will need to:

have a suitable Python version installed (currently Python 3.11)
configure a suitable virtual environment to run ehrQL for example with conda or venv
install the ehrQL package into that virtual environment;

Installation🔗

Install the latest version of ehrQL into your new virtual environment with pip

pip install git+https://github.com/opensafely-core/ehrql@main#egg=opensafely-ehrql`

Todo

It's probably better to advocate installing the same version we're using to build the definitions. This will be a tagged version in ehrql/requirements.prod.in.

Todo

Are we going to ever publish ehrQL to PyPI?

Checking the installation🔗

Make sure that you can run ehrQL's "help" command:

ehrql --help

If that command succeeds, you should see some help text and ehrQL should be correctly installed.

Using ehrQL's command-line interface🔗

This section explains how to load dataset definitions into ehrQL.

Each dataset definition used in this tutorial has a filename of the form:

IDENTIFIER_DATASOURCENAME_dataset_definition.py

For example, for

1a_minimal_dataset_definition.py

the identifier is 1a and the data source name is minimal. The identifier associates the dataset definition with a specific tutorial page.

Todo

Check how compatible this is cross-platform.

To run this dataset definition with ehrQL,

In a terminal, enter the ehrql-tutorial-examples directory that you extracted from the sample data.
Run this command:

ehrql generate-dataset "1a_minimal_dataset_definition.py" --dummy-tables "example-data/minimal/" --output "outputs.csv"

3. You should see ehrQL run without error and find the outputs.csv file in the ehrql-tutorial-examples directory that you were working in.

Tip

In general, the command to run a dataset defintion looks like:

ehrql generate-dataset "IDENTIFIER_DATASOURCENAME_dataset_definition.py --dummy-tables "example-data/DATASOURCENAME/" --output "outputs.csv"

You need to substitute DATASOURCENAME with the appropriate dataset name, and IDENTIFIER_DATASOURCENAME_dataset_definition.py to match the specific dataset definition that you want to run.

Tip

The output in this example is called outputs.csv, but you can choose any valid filename.