Skip to content

beta.tpp schema🔗

Available on backends: TPP

This defines all the data (both primary care and externally linked) available in the TPP backend.

To use this schema in an ehrQL file:
from ehrql.tables.beta.tpp import (
    addresses,
    appointments,
    clinical_events,
    emergency_care_attendances,
    hospital_admissions,
    household_memberships_2020,
    isaric_raw,
    medications,
    occupation_on_covid_vaccine_record,
    ons_cis,
    ons_deaths,
    open_prompt,
    patients,
    practice_registrations,
    sgss_covid_all_tests,
    vaccinations,
)

many rows per patient

addresses🔗

Columns
address_id 🔗 integer
start_date 🔗 date
end_date 🔗 date
address_type 🔗 integer
rural_urban_classification 🔗 integer
imd_rounded 🔗 integer
msoa_code 🔗 string
  • Matches regular expression: E020[0-9]{5}
has_postcode 🔗 boolean
care_home_is_potential_match 🔗 boolean
care_home_requires_nursing 🔗 boolean
care_home_does_not_require_nursing 🔗 boolean
Methods
for_patient_on(date) 🔗

Return each patient's registered address as it was on the supplied date.

Where there are multiple registered addresses we prefer any which have a known postcode (though we never have access to this postcode) as this is used by TPP to cross-reference other data associated with the address, such as the MSOA or index of multiple deprevation.

Where there are multiple of these we prefer the most recently registered address and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the address ID.

View method definition
spanning_addrs = addresses.where(addresses.start_date <= date).except_where(
    addresses.end_date < date
)
ordered_addrs = spanning_addrs.sort_by(
    case(when(addresses.has_postcode).then(1), default=0),
    addresses.start_date,
    addresses.end_date,
    addresses.address_id,
)
return ordered_addrs.last_for_patient()

many rows per patient

appointments🔗

Columns
booked_date 🔗 date
start_date 🔗 date

many rows per patient

clinical_events🔗

Columns
date 🔗 date
snomedct_code 🔗 SNOMED-CT code
ctv3_code 🔗 CTV3 (Read v3) code
numeric_value 🔗 float

many rows per patient

emergency_care_attendances🔗

Columns
id 🔗 integer
arrival_date 🔗 date
discharge_destination 🔗 SNOMED-CT code
diagnosis_01 🔗 SNOMED-CT code
diagnosis_02 🔗 SNOMED-CT code
diagnosis_03 🔗 SNOMED-CT code
diagnosis_04 🔗 SNOMED-CT code
diagnosis_05 🔗 SNOMED-CT code
diagnosis_06 🔗 SNOMED-CT code
diagnosis_07 🔗 SNOMED-CT code
diagnosis_08 🔗 SNOMED-CT code
diagnosis_09 🔗 SNOMED-CT code
diagnosis_10 🔗 SNOMED-CT code
diagnosis_11 🔗 SNOMED-CT code
diagnosis_12 🔗 SNOMED-CT code
diagnosis_13 🔗 SNOMED-CT code
diagnosis_14 🔗 SNOMED-CT code
diagnosis_15 🔗 SNOMED-CT code
diagnosis_16 🔗 SNOMED-CT code
diagnosis_17 🔗 SNOMED-CT code
diagnosis_18 🔗 SNOMED-CT code
diagnosis_19 🔗 SNOMED-CT code
diagnosis_20 🔗 SNOMED-CT code
diagnosis_21 🔗 SNOMED-CT code
diagnosis_22 🔗 SNOMED-CT code
diagnosis_23 🔗 SNOMED-CT code
diagnosis_24 🔗 SNOMED-CT code

many rows per patient

hospital_admissions🔗

Columns
id 🔗 integer
admission_date 🔗 date
discharge_date 🔗 date
admission_method 🔗 string
all_diagnoses 🔗 string
patient_classification 🔗 string
days_in_critical_care 🔗 integer
primary_diagnoses 🔗 string

one row per patient

household_memberships_2020🔗

Inferred household membership as of 2020-02-01, as determined by TPP using an as yet undocumented algorithm.

Columns
household_pseudo_id 🔗 integer
household_size 🔗 integer

many rows per patient

isaric_raw🔗

A subset of the ISARIC data.

These columns are deliberately all taken as strings while in a preliminary phase. They will later change to more appropriate data types.

Descriptions taken from: CCP_REDCap_ISARIC_data_dictionary_codebook.pdf

Columns
age 🔗 string

Age

age_factor 🔗 string

TODO

calc_age 🔗 string

Calculated age (comparing date of birth with date of enrolment). May be inaccurate if a date of February 29 is used.

sex 🔗 string

Sex at birth.

ethnic___1 🔗 string

Ethnic group: Arab.

ethnic___2 🔗 string

Ethnic group: Black.

ethnic___3 🔗 string

Ethnic group: East Asian.

ethnic___4 🔗 string

Ethnic group: South Asian.

ethnic___5 🔗 string

Ethnic group: West Asian.

ethnic___6 🔗 string

Ethnic group: Latin American.

ethnic___7 🔗 string

Ethnic group: White.

ethnic___8 🔗 string

Ethnic group: Aboriginal/First Nations.

ethnic___9 🔗 string

Ethnic group: Other.

ethnic___10 🔗 string

Ethnic group: N/A.

covid19_vaccine 🔗 string

Has the patient received a Covid-19 vaccine (open label licenced product)?

covid19_vaccined 🔗 date

Date first vaccine given (Covid-19) if known.

covid19_vaccine2d 🔗 date

Date second vaccine given (Covid-19) if known.

covid19_vaccined_nk 🔗 string

First vaccine given (Covid-19) but date not known.

corona_ieorres 🔗 string

Suspected or proven infection with pathogen of public health interest.

coriona_ieorres2 🔗 string

Proven or high likelihood of infection with pathogen of public health interest.

coriona_ieorres3 🔗 string

Proven infection with pathogen of public health interest.

inflammatory_mss 🔗 string

Adult or child who meets case definition for inflammatory multi-system syndrome (MIS-C/MIS-A).

cestdat 🔗 date

Onset date of first/earliest symptom.

chrincard 🔗 string

Chronic cardiac disease, including congenital heart disease (not hypertension).

  • Possible values: YES, NO, Unknown
hypertension_mhyn 🔗 string

Hypertension (physician diagnosed).

  • Possible values: YES, NO, Unknown
chronicpul_mhyn 🔗 string

Chronic pulmonary disease (not asthma).

  • Possible values: YES, NO, Unknown
asthma_mhyn 🔗 string

Asthma (physician diagnosed).

  • Possible values: YES, NO, Unknown
renal_mhyn 🔗 string

Chronic kidney disease.

  • Possible values: YES, NO, Unknown
mildliver 🔗 string

Mild liver disease.

  • Possible values: YES, NO, Unknown
modliv 🔗 string

Moderate or severe liver disease

  • Possible values: YES, NO, Unknown
chronicneu_mhyn 🔗 string

Chronic neurological disorder.

  • Possible values: YES, NO, Unknown
malignantneo_mhyn 🔗 string

Malignant neoplasm.

  • Possible values: YES, NO, Unknown
chronichaemo_mhyn 🔗 string

Chronic haematologic disease.

  • Possible values: YES, NO, Unknown
aidshiv_mhyn 🔗 string

AIDS/HIV.

  • Possible values: YES, NO, Unknown
obesity_mhyn 🔗 string

Obesity (as defined by clinical staff).

  • Possible values: YES, NO, Unknown
diabetes_type_mhyn 🔗 string

Diabetes and type.

  • Possible values: NO, 1, 2, N/K
diabetescom_mhyn 🔗 string

Diabetes with complications.

  • Possible values: YES, NO, Unknown
diabetes_mhyn 🔗 string

Diabetes without complications.

  • Possible values: YES, NO, Unknown
rheumatologic_mhyn 🔗 string

Rheumatologic disorder.

  • Possible values: YES, NO, Unknown
dementia_mhyn 🔗 string

Dementia.

  • Possible values: YES, NO, Unknown
malnutrition_mhyn 🔗 string

Malnutrition.

  • Possible values: YES, NO, Unknown
smoking_mhyn 🔗 string

Smoking.

  • Possible values: Yes, Never Smoked, Former Smoker, N/K
hostdat 🔗 date

Admission date at this facility.

hooccur 🔗 string

Transfer from other facility?

hostdat_transfer 🔗 date

Admission date at previous facility.

hostdat_transfernk 🔗 string

Admission date at previous facility not known.

readm_cov19 🔗 string

Is the patient being readmitted with Covid-19?

dsstdat 🔗 date

Date of enrolment.

dsstdtc 🔗 date

Outcome date.

many rows per patient

medications🔗

Columns
date 🔗 date
dmd_code 🔗 dm+d code

many rows per patient

occupation_on_covid_vaccine_record🔗

Columns
is_healthcare_worker 🔗 boolean

many rows per patient

ons_cis🔗

Data from the ONS Covid Infection Survey.

Columns
visit_date 🔗 date
visit_num 🔗 integer
is_opted_out_of_nhs_data_share 🔗 boolean
last_linkage_dt 🔗 date
imd_decile_e 🔗 integer
imd_quartile_e 🔗 integer
rural_urban 🔗 integer

many rows per patient

ons_deaths🔗

Columns
date 🔗 date
place 🔗 string
  • Possible values: Care Home, Elsewhere, Home, Hospice, Hospital, Other communal establishment
cause_of_death_01 🔗 ICD-10 code
cause_of_death_02 🔗 ICD-10 code
cause_of_death_03 🔗 ICD-10 code
cause_of_death_04 🔗 ICD-10 code
cause_of_death_05 🔗 ICD-10 code
cause_of_death_06 🔗 ICD-10 code
cause_of_death_07 🔗 ICD-10 code
cause_of_death_08 🔗 ICD-10 code
cause_of_death_09 🔗 ICD-10 code
cause_of_death_10 🔗 ICD-10 code
cause_of_death_11 🔗 ICD-10 code
cause_of_death_12 🔗 ICD-10 code
cause_of_death_13 🔗 ICD-10 code
cause_of_death_14 🔗 ICD-10 code
cause_of_death_15 🔗 ICD-10 code

many rows per patient

open_prompt🔗

Columns
ctv3_code 🔗 CTV3 (Read v3) code

The question, as a CTV3 code

  • Never NULL
snomedct_code 🔗 SNOMED-CT code

The question, as a SNOMED CT code or None

consultation_date 🔗 date

The date the survey was administered

  • Never NULL
consultation_id 🔗 integer

The ID of the survey

  • Never NULL
numeric_value 🔗 float

The response to the question

  • Never NULL

one row per patient

patients🔗

Columns
date_of_birth 🔗 date

Patient's date of birth, rounded to first of month.

  • Always the first day of a month
  • Never NULL
sex 🔗 string

Patient's sex.

  • Possible values: female, male, intersex, unknown
  • Never NULL
date_of_death 🔗 date

Patient's date of death.

Methods
age_on(date) 🔗

Patient's age as an integer, in whole elapsed calendar years, as it would be on the supplied date.

Note that this takes no account of whether the patient is alive at the given date. In particular, it may return negative values if the date is before the patient's date of birth.

View method definition
return (date - patients.date_of_birth).years

many rows per patient

practice_registrations🔗

Columns
start_date 🔗 date
end_date 🔗 date
practice_pseudo_id 🔗 integer
practice_stp 🔗 string
  • Matches regular expression: E540000[0-9]{2}
practice_nuts1_region_name 🔗 string

Name of the NUTS level 1 region of England to which the practice belongs. For more information see: https://www.ons.gov.uk/methodology/geography/ukgeographies/eurostat

  • Possible values: North East, North West, Yorkshire and The Humber, East Midlands, West Midlands, East, London, South East, South West
Methods
for_patient_on(date) 🔗

Return each patient's practice registration as it was on the supplied date.

Where a patient is registered with multiple practices we prefer the most recent registration and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the practice ID.

View method definition
spanning_regs = practice_registrations.where(practice_registrations.start_date <= date).except_where(
    practice_registrations.end_date < date
)
ordered_regs = spanning_regs.sort_by(
    practice_registrations.start_date,
    practice_registrations.end_date,
    practice_registrations.practice_pseudo_id,
)
return ordered_regs.last_for_patient()

many rows per patient

sgss_covid_all_tests🔗

Columns
specimen_taken_date 🔗 date
is_positive 🔗 boolean

many rows per patient

vaccinations🔗

Columns
vaccination_id 🔗 integer
date 🔗 date
target_disease 🔗 string
product_name 🔗 string