ehrQL tutorial: Working with electronic health record codes and codelists
Danger
This page discusses the new OpenSAFELY Data Builder for accessing OpenSAFELY data sources.
Use OpenSAFELY cohort-extractor, unless you are specifically involved in the development or testing of Data Builder.
OpenSAFELY ehrQL and its documentation are still undergoing extensive development. We will announce when ehrQL is ready for general use on the Platform News page.
Electronic health record codes
Electronic health record (EHR) codes are used when recording patient events. For research, codes are often collated into codelists to classify patients. ehrQL can use codelists to help extract information about patient groups of interest.
Info
There is more information about codelists elsewhere in OpenSAFELY's documentation. We also have a codelist builder tool, OpenCodelists, that you may find useful when creating codelists.
Example dataset definition 7: Codes and codelists
Warning
This is a rough draft. It is intended as a placeholder so that the tutorial flows from start to finish.
Warning
Codelists are not yet fully implemented. See https://github.com/opensafely-core/ehrql/issues/31
Todo
Complete this tutorial page. Write an example that suitably explains enough for the user testing scenario. This is pending the use of codelists there being finalised.
Loading a codelist
Todo
Remove the placeholder example and replace with a real dataset definition.
You can load codelists in comma-separated values (CSV) format with ehrQL's codelist_from_csv() function.
The CSV must have named columns.
You need to specify, as strings:
- the CSV filename
- the name of the CSV column containing the codes
codelist_from_csv()
from ehrql.codes import codelist_from_csv
codelist = codelist_from_csv(filename="my_codelist.csv", column="code")
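To make the two arguments concrete, here is a minimal sketch that uses only Python's standard library rather than ehrQL itself. It writes a small CSV with a named header row, then reads one column back by name. The helper `read_code_column`, the column names and the code values are hypothetical and chosen for illustration; ehrQL's own loading logic may differ.

```python
import csv
import tempfile
from pathlib import Path


def read_code_column(filename, column):
    """Return the values in the named column of a CSV file, in file order."""
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)  # treats the first row as the column names
        if column not in (reader.fieldnames or []):
            raise ValueError(f"no column named {column!r} in {filename}")
        return [row[column] for row in reader]


# Hypothetical codelist contents: a header row, then one code per row.
csv_text = "code,term\na1,Example term one\na2,Example term two\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_codelist.csv"
    path.write_text(csv_text)
    codes = read_code_column(path, column="code")

print(codes)  # prints ['a1', 'a2']
```

The sketch shows why the CSV needs named columns: the column is looked up by its header, so a file without a header row, or with a different header, cannot be matched to the requested name.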
Checking if a code is in a codelist
Todo
Remove the placeholder example and replace with a real dataset definition.
Warning
This example may not be correct.
You can use is_in and is_not_in to check for specific codes.
Dataset definition
from ehrql.codes import SNOMEDCTCode
from ehrql import Dataset, codelist_from_csv
from ehrql.tables.example.tutorial import clinical_events
codelist = codelist_from_csv(filename="my_codelist.csv", column="code")
a1_code = SNOMEDCTCode("a1")
a2_code = SNOMEDCTCode("a2")
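# Events whose SNOMED CT code appears in the loaded codelist
codelist_filtered_events = clinical_events.where(clinical_events.system == "snomed").where(clinical_events.code.is_in(codelist))
# Events whose SNOMED CT code is one of the two explicitly listed codes
code_filtered_events = clinical_events.where(clinical_events.system == "snomed").where(clinical_events.code.is_in([a1_code, a2_code]))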
dataset = Dataset()
dataset.define_population(codelist_filtered_events.exists_for_patient() | code_filtered_events.exists_for_patient())
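The logic of this dataset definition can be sketched in plain Python, without ehrQL, to show what the membership check computes. The event records, patient IDs and code values below are made up for illustration, and Python's `in` operator stands in for ehrQL's `is_in`. This is not how ehrQL evaluates queries internally.

```python
# Hypothetical event records: (patient_id, coding system, code).
events = [
    (1, "snomed", "a1"),
    (1, "ctv3", "a1"),
    (2, "snomed", "b7"),
    (3, "snomed", "a2"),
]

codelist = {"a1", "a2"}  # stands in for the codes loaded from a CSV


def patients_with_matching_event(events, codes, system="snomed"):
    """Return the IDs of patients with at least one event whose code is in `codes`."""
    return {
        patient_id
        for patient_id, event_system, code in events
        if event_system == system and code in codes
    }


population = patients_with_matching_event(events, codelist)
print(sorted(population))  # prints [1, 3]
```

Patient 2 is excluded because their only code is not in the codelist. Patient 1 is included once, even though they have a second event recorded under a different coding system.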
Todo
Explain that […] creates a sequence of ordered items, termed a list in Python, if not mentioned elsewhere.
We also instantiate code objects the same way we do for Dataset().
Tutorial questions
Question
- …