library(codemapper)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Introduction

The CALIBER team have manually curated clinical code lists for 308 common health conditions, providing a rich resource for researchers working with electronic health records (Kuan et al. 2019). All code lists are publicly available in CSV format on GitHub.[1] They are divided into primary care (Read 2 codes and Medcodes) and secondary care (ICD10 and OPCS4) code lists.

UK Biobank contains linked primary and secondary care diagnostic records. Primary care records use both Read 2 and Read 3 codes, while secondary care records use ICD10 and ICD9 (the large majority being ICD10) as well as OPCS4. Data analysts may therefore wish to extend the CALIBER resource by mapping from Read 2 to Read 3, and from ICD10 to ICD9. The raw Read 2, ICD10 and OPCS4 CALIBER codes also need reformatting to match the format used in UK Biobank data.

The CALIBER repository may be imported into R, reformatted for use with UK Biobank data, and mapped to Read 3 and ICD9 equivalents as follows:

# download CALIBER repository, returning file path
caliber_dir_path <- download_caliber_repo()

# read all CALIBER codes into R
caliber_raw <- read_caliber_raw(caliber_dir_path)

# build all_lkps_maps resource - contains clinical code lookup and mapping tables
all_lkps_maps <- build_all_lkps_maps()

# reformat CALIBER codes for UK Biobank, and map from ICD10 and Read 2 to ICD9 and Read 3 respectively. Expect various warnings to be raised at this stage
caliber_ukb <- reformat_caliber_for_ukb(
  caliber_raw,
  all_lkps_maps = all_lkps_maps,
  overlapping_disease_categories_csv = default_overlapping_disease_categories_csv()
)

The same workflow, using dummy data:

# read all codes from dummy CALIBER repo into R
caliber_raw_dummy <- read_caliber_raw(dummy_caliber_dir_path())
#> Reading CALIBER clinical codes lists into R
#> Primary care Read 2 (1 of 3)
#> Secondary care ICD10 (2 of 3)
#> Secondary care OPCS4 (3 of 3)

# build dummy all_lkps_maps resource
all_lkps_maps_dummy <- build_all_lkps_maps_dummy()

# reformat CALIBER codes for UK Biobank, and map from ICD10 and Read 2 to ICD9 and Read 3 respectively (warnings suppressed)
caliber_ukb_dummy <- suppressWarnings(reformat_caliber_for_ukb(
  caliber_raw_dummy,
  all_lkps_maps = all_lkps_maps_dummy,
  overlapping_disease_categories_csv = default_overlapping_disease_categories_csv()
))
#> Reformatting Read 2 codes
#> Reformatting ICD10 codes
#> The following 1 input ICD10 codes do not have a 1-to-1 ICD10_CODE-to-ALT_CODE mapping: 'M90.0'. There will therefore be *more* output than input codes
#> Reformatting OPCS4 codes
#> Mapping read2 codes to read3
#> Mapping icd10 to icd9 codes

# view first few rows
caliber_ukb_dummy %>% 
  head() %>% 
  knitr::kable()
| disease | description | category | code_type | code | author |
|---|---|---|---|---|---|
| Diabetes | Type 1 diabetes mellitus | Type I diabetes mellitus (3) | read2 | C108. | caliber |
| Diabetes | Type I diabetes mellitus with renal complications | Type I diabetes mellitus (3) | read2 | C1080 | caliber |
| Diabetes | Type I diabetes mellitus with neurological complications | Type I diabetes mellitus (3) | read2 | C1082 | caliber |
| Diabetes | Unstable type I diabetes mellitus | Type I diabetes mellitus (3) | read2 | C1084 | caliber |
| Diabetes | Type I diabetes mellitus with ulcer | Type I diabetes mellitus (3) | read2 | C1085 | caliber |
| Diabetes | Type I diabetes mellitus with retinopathy | Type I diabetes mellitus (3) | read2 | C1087 | caliber |

This vignette outlines the steps performed by read_caliber_raw() and reformat_caliber_for_ukb() to achieve this.

Reformatting Read 2 and ICD10 codes

Read 2

  • Remove the last two characters from all Read 2 codes (these indicate whether the associated description is ‘preferred’ or ‘alternative’), e.g. ‘C106.00’ becomes ‘C106.’.
  • Keep only one description per code. In some cases, one or more alternative descriptions are included because they map to distinct Medcodes (e.g. see the preferred and alternative descriptions for ‘7K1D0’ under ‘Fracture of hip’).
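The two Read 2 steps can be sketched in base R as follows. This is a minimal illustration, not codemapper's implementation: `reformat_read2_sketch()` is a hypothetical helper, the input is assumed to look like `caliber_raw_dummy$read2` (with `code_type` values ‘Readcode’ and ‘Medcode’), and both dropping Medcode rows and keeping the description whose term code sorts first (‘00’, the preferred term) are assumptions made for the example.

```r
# Hypothetical helper (not part of codemapper): a base R sketch of the Read 2 steps.
# Assumes columns `description`, `code_type` and `code`, as in caliber_raw_dummy$read2.
reformat_read2_sketch <- function(read2) {
  # keep Read code rows only; Medcode rows are simply dropped in this sketch
  read2 <- read2[read2$code_type == "Readcode", , drop = FALSE]
  # sort on the full 7-character code so the '00' (preferred) term comes first
  read2 <- read2[order(read2$code), , drop = FALSE]
  # drop the 2-character term code, e.g. 'C106.00' -> 'C106.'
  read2$code <- substr(read2$code, 1, 5)
  # keep a single description per code (the first one after sorting)
  read2 <- read2[!duplicated(read2$code), , drop = FALSE]
  read2$code_type <- rep("read2", nrow(read2))
  rownames(read2) <- NULL
  read2
}

read2_example <- data.frame(
  description = c("Diabetic amyotrophy",
                  "Diabetes mellitus with neurological manifestation",
                  "Diabetes mellitus with neuropathy",
                  "Diabetes mellitus with neurological manifestation"),
  code_type = c("Readcode", "Readcode", "Readcode", "Medcode"),
  code = c("C106.11", "C106.00", "C106.12", "16230"),
  stringsAsFactors = FALSE
)

reformat_read2_sketch(read2_example)
```

With this toy input the three Read code rows collapse to a single ‘C106.’ row carrying the preferred description, mirroring the first row of the "After" table below.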

Before:

caliber_raw_dummy$read2 %>% 
  arrange(category) %>% 
  head() %>% 
  knitr::kable()
| disease | description | category | code_type | code | author |
|---|---|---|---|---|---|
| Diabetic neurological complications | Diabetes mellitus with neurological manifestation | Diagnosis of diabetic neurological complications | Readcode | C106.00 | caliber |
| Diabetic neurological complications | Diabetes mellitus with neurological manifestation | Diagnosis of diabetic neurological complications | Medcode | 16230 | caliber |
| Diabetic neurological complications | Diabetic amyotrophy | Diagnosis of diabetic neurological complications | Readcode | C106.11 | caliber |
| Diabetic neurological complications | Diabetic amyotrophy | Diagnosis of diabetic neurological complications | Medcode | 59903 | caliber |
| Diabetic neurological complications | Diabetes mellitus with neuropathy | Diagnosis of diabetic neurological complications | Readcode | C106.12 | caliber |
| Diabetic neurological complications | Diabetes mellitus with neuropathy | Diagnosis of diabetic neurological complications | Medcode | 7795 | caliber |

After:

caliber_ukb_dummy %>% 
  arrange(category) %>% 
  filter(code_type == "read2") %>% 
  head() %>% 
  knitr::kable()
| disease | description | category | code_type | code | author |
|---|---|---|---|---|---|
| Diabetic neurological complications | Diabetes mellitus with neurological manifestation | Diagnosis of diabetic neurological complications | read2 | C106. | caliber |
| End stage renal disease | End stage renal failure | Diagnosis of End stage renal disease | read2 | K05.. | caliber |
| End stage renal disease | End stage renal failure | Diagnosis of End stage renal disease | read2 | K050. | caliber |
| Diabetic neurological complications | Myasthenic syndrome due to diabetic amyotrophy | History of diabetic neurological complications | read2 | F3813 | caliber |
| Hypertension | Essential hypertension | Hypertension (3) | read2 | G20.. | caliber |
| End stage renal disease | End-stage renal disease | Procedure for End stage renal disease | read2 | K0D.. | caliber |

The following Read 2 codes are not present in the Read 2 lookup table provided by UK Biobank resource 592:

| CALIBER disease | Unrecognised Read 2 codes |
|---|---|
| Alcohol Problems | Z191.; Z1911; Z1912; Z4B1. |
| Anxiety disorders | Z481.; Z4L1. |
| Coeliac disease | ZC2C2 |
| Crohn’s disease | ZR3S. |
| Dementia | ZS7C5 |
| End stage renal disease | Z1A..; Z1A1.; Z1A2.; Z919.; Z9191; Z9192; Z9193; Z91A. |
| Erectile dysfunction | Z9E9.; ZG436 |
| Hearing loss | Z8B5.; Z8B51; Z8B53; Z8B55; Z911.; Z9111; Z9113; Z9114; Z9115; Z9117; Z9118; Z9119; Z911A; Z911B; Z911E; Z911G; Z9E81; ZE87.; ZL716; ZN569; ZN56A |
| Heart failure | ZRad. |
| Intrauterine hypoxia | Z2648; Z2649; Z264A; Z264B |
| Lupus erythematosus (local and systemic) | ZRq8.; ZRq9. |
| Obesity | ZC2CM |
| Other psychoactive substance misuse | 9G24.; 9K4..; Z1Q62; Z416. |
| Tinnitus | Z9112; ZEB.. |
| Transient ischaemic attack | Z7CE7 |
| Urinary Incontinence | Z9EA. |
| Visual impairment and blindness | Z96..; Z961.; Z962.; ZK74.; ZN568; ZN56A; ZRhO.; ZRr6. |
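A table like the one above amounts to a per-disease set difference between a code list and the lookup codes. The base R sketch below is illustrative only: `find_unrecognised_codes()` is a hypothetical helper, and `read2_lookup_toy` is a tiny made-up stand-in for the code column of the resource 592 Read 2 lookup, not its real contents.

```r
# Hypothetical helper (not part of codemapper): list, per disease, the codes that
# are absent from a lookup table, collapsed into one "; "-separated string.
find_unrecognised_codes <- function(code_list, lookup_codes) {
  missing <- code_list[!code_list$code %in% lookup_codes, , drop = FALSE]
  # group the unrecognised codes by disease
  by_disease <- split(missing$code, as.character(missing$disease))
  data.frame(
    disease = as.character(names(by_disease)),
    codes = unname(vapply(by_disease,
                          function(x) paste(unique(x), collapse = "; "),
                          character(1))),
    stringsAsFactors = FALSE
  )
}

# toy stand-in for the Read 2 lookup codes (illustrative values only)
read2_lookup_toy <- c("G20..", "K05..", "K050.")

code_list <- data.frame(
  disease = c("Hypertension", "Tinnitus", "Tinnitus", "End stage renal disease"),
  code = c("G20..", "Z9112", "ZEB..", "K05.."),
  stringsAsFactors = FALSE
)

find_unrecognised_codes(code_list, read2_lookup_toy)
```

Here only the two Tinnitus codes are missing from the toy lookup, so the result has a single row.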

ICD10

  • Convert all ICD10 codes to the ALT_CODE format used in UK Biobank data (e.g. ‘M90.0’ becomes ‘M900’). Note that while undivided three-character ICD10 codes carry an ‘X’ suffix in UK Biobank resource 592 (e.g. ‘A38X’, Scarlet fever), the suffix does not appear in the UK Biobank dataset itself (‘A38X’ appears there as ‘A38’).
  • Expand all three-character ICD10 codes to include their children. For example, ‘D25’ (Leiomyoma of uterus) is expanded to ‘D25’, ‘D250’, ‘D251’, ‘D252’ and ‘D259’.[2]
  • Expand any four-character ICD10 codes with a ‘modifier 5’ to include their five-character children. For example, ‘M90.0’ (Tuberculosis of bone) is expanded to ‘M900’, ‘M9000’, ‘M9001’, ‘M9002’, ‘M9003’, ‘M9004’, ‘M9005’, ‘M9006’, ‘M9007’, ‘M9008’ and ‘M9009’.
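The ICD10 steps can be approximated with plain string operations plus a prefix match against the lookup. In this hypothetical base R sketch, `to_alt_code()` and `expand_icd10()` are illustrative names and `lookup_toy` is a small made-up stand-in for the ALT_CODE column of the resource 592 ICD10 lookup. Treating both expansion rules as a single "starts with" match is a simplification: it does not read the ‘modifier 5’ flag, and instead assumes five-character children only appear in the lookup for codes that have such a modifier.

```r
# Hypothetical helpers (not part of codemapper).
# Convert CALIBER-style ICD10 codes to UK Biobank style: remove '.', then drop
# the trailing 'X' from undivided three-character codes.
to_alt_code <- function(codes) {
  codes <- gsub(".", "", codes, fixed = TRUE)   # 'M90.0' -> 'M900'
  sub("^([A-Z][0-9]{2})X$", "\\1", codes)       # 'A38X'  -> 'A38'
}

# Expand each code to itself plus every lookup code that starts with it.
# For a three-character code this adds its children; for a four-character code
# it adds any five-character children present in the lookup.
expand_icd10 <- function(codes, lookup_alt_codes) {
  codes <- to_alt_code(codes)
  lookup_alt_codes <- to_alt_code(lookup_alt_codes)
  expanded <- lapply(codes, function(code) {
    c(code, lookup_alt_codes[startsWith(lookup_alt_codes, code)])
  })
  unique(unlist(expanded))
}

# toy stand-in for the ALT_CODE column of the resource 592 ICD10 lookup
lookup_toy <- c("A38X", "D25", "D250", "D251", "D252", "D259",
                "M900", "M9000", "M9001", "M901")

expand_icd10(c("A38", "D25", "M90.0"), lookup_toy)
```

Note that ‘M901’ is not picked up for ‘M90.0’, because the match is on the converted four-character prefix ‘M900’.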

Before:

caliber_raw_dummy$icd10 %>% 
  arrange(category) %>% 
  head() %>% 
  knitr::kable()
| disease | description | category | code_type | code | author |
|---|---|---|---|---|---|
| Asthma | Asthma | Diagnosis of Asthma | icd10 | J45 | caliber |
| Bacterial Diseases (excl TB) | Scarlet fever | Diagnosis of Bacterial Diseases (excl TB) | icd10 | A38 | caliber |
| Bacterial Diseases (excl TB) | Osteomyelitis | Diagnosis of Bacterial Diseases (excl TB) | icd10 | M86 | caliber |
| Postviral fatigue syndrome, neurasthenia and fibromyalgia | Neurasthenia | Diagnosis of Postviral fatigue syndrome, neurasthenia and fibromyalgia | icd10 | F48.0 | caliber |
| Tuberculosis | Tuberculosis of bone | Diagnosis of Tuberculosis | icd10 | M90.0 | caliber |
| Diabetes | Insulin-dependent diabetes mellitus | Insulin dependent diabetes (3) | icd10 | E10 | caliber |

After:

caliber_ukb_dummy %>% 
  filter(code_type == "icd10") %>% 
  arrange(category) %>% 
  head() %>% 
  knitr::kable()
| disease | description | category | code_type | code | author |
|---|---|---|---|---|---|
| Asthma | Asthma | Diagnosis of Asthma | icd10 | J45 | caliber |
| Asthma | Predominantly allergic asthma | Diagnosis of Asthma | icd10 | J450 | caliber |
| Asthma | Nonallergic asthma | Diagnosis of Asthma | icd10 | J451 | caliber |
| Asthma | Mixed asthma | Diagnosis of Asthma | icd10 | J458 | caliber |
| Asthma | Asthma, unspecified | Diagnosis of Asthma | icd10 | J459 | caliber |
| Bacterial Diseases (excl TB) | Scarlet fever | Diagnosis of Bacterial Diseases (excl TB) | icd10 | A38 | caliber |

The following ICD10 codes are not present in the ICD10 lookup table provided by UK Biobank resource 592[3]:

| CALIBER disease | Unrecognised ICD10 codes |
|---|---|
| Infections of Other or unspecified organs | A90; A91 |
| Viral diseases (excl chronic hepatitis/HIV) | A90; A91 |

OPCS4

  • Remove ‘.’ from all codes (e.g. ‘H01.1’ becomes ‘H011’).
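In base R this is a single substitution. The sketch below uses the codes from the tables that follow; `fixed = TRUE` matters because in a regular expression ‘.’ matches any character, so omitting it would delete every character of every code.

```r
# Remove the '.' separator from OPCS4 codes.
# fixed = TRUE makes gsub() treat '.' literally rather than as a regex wildcard.
opcs4_codes <- c("H01", "H01.1", "H01.2", "H01.8", "H01.9")
opcs4_ukb <- gsub(".", "", opcs4_codes, fixed = TRUE)
opcs4_ukb  # "H01" "H011" "H012" "H018" "H019"
```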

Before:

caliber_raw_dummy$opcs4 %>% 
  arrange(category) %>% 
  head() %>% 
  knitr::kable()
| disease | description | category | code_type | code | author |
|---|---|---|---|---|---|
| Appendicitis | Emergency excision of appendix | Procedure for Appendicitis | opcs4 | H01 | caliber |
| Appendicitis | Emergency excision of abnormal appendix and drainage HFQ | Procedure for Appendicitis | opcs4 | H01.1 | caliber |
| Appendicitis | Emergency excision of abnormal appendix NEC | Procedure for Appendicitis | opcs4 | H01.2 | caliber |
| Appendicitis | Other specified emergency excision of appendix | Procedure for Appendicitis | opcs4 | H01.8 | caliber |
| Appendicitis | Unspecified emergency excision of appendix | Procedure for Appendicitis | opcs4 | H01.9 | caliber |

After:

caliber_ukb_dummy %>% 
  filter(code_type == "opcs4") %>% 
  arrange(category) %>% 
  head() %>% 
  knitr::kable()
| disease | description | category | code_type | code | author |
|---|---|---|---|---|---|
| Appendicitis | Emergency excision of appendix | Procedure for Appendicitis | opcs4 | H01 | caliber |
| Appendicitis | Emergency excision of abnormal appendix and drainage HFQ | Procedure for Appendicitis | opcs4 | H011 | caliber |
| Appendicitis | Emergency excision of abnormal appendix NEC | Procedure for Appendicitis | opcs4 | H012 | caliber |
| Appendicitis | Other specified emergency excision of appendix | Procedure for Appendicitis | opcs4 | H018 | caliber |
| Appendicitis | Unspecified emergency excision of appendix | Procedure for Appendicitis | opcs4 | H019 | caliber |

Mapping to Read 3 and ICD9

Read 2 to Read 3

Mapping from Read 2 to Read 3 is performed using the read_v2_read_ctv3 mapping sheet from UK Biobank resource 592. Points to be aware of:

  • A minority of mappings in read_v2_read_ctv3 are flagged as ‘not assured’ (IS_ASSURED ‘0’). These mappings are excluded by default; this behaviour can be changed with the col_filters argument to reformat_caliber_for_ukb().
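The mapping itself is essentially a lookup on the Read 2 code, optionally restricted to assured rows. This base R sketch is hypothetical: `map_read2_to_read3()` and its `assured_only` switch are illustrative and are not how codemapper exposes the option (that is the `col_filters` argument; see `?reformat_caliber_for_ukb`). The column names READV2_CODE and READV3_CODE are assumed alongside IS_ASSURED, and the Read 3 values are placeholders, not real mappings.

```r
# Hypothetical helper (not part of codemapper). `mapping` is assumed to have the
# columns READV2_CODE, READV3_CODE and IS_ASSURED, as in read_v2_read_ctv3.
map_read2_to_read3 <- function(read2_codes, mapping, assured_only = TRUE) {
  if (assured_only) {
    # exclude mappings flagged as not assured (IS_ASSURED '0');
    # %in% also matches a numeric 0, and leaves NA flags in place
    mapping <- mapping[!mapping$IS_ASSURED %in% "0", , drop = FALSE]
  }
  hits <- mapping[mapping$READV2_CODE %in% read2_codes,
                  c("READV2_CODE", "READV3_CODE"), drop = FALSE]
  rownames(hits) <- NULL
  hits
}

# toy mapping rows; the Read 3 codes are placeholders, not real mappings
read_v2_read_ctv3_toy <- data.frame(
  READV2_CODE = c("C106.", "C106.", "K05.."),
  READV3_CODE = c("R3_01", "R3_02", "R3_03"),
  IS_ASSURED = c("1", "0", "1"),
  stringsAsFactors = FALSE
)

map_read2_to_read3(c("C106.", "K05.."), read_v2_read_ctv3_toy)
map_read2_to_read3(c("C106.", "K05.."), read_v2_read_ctv3_toy, assured_only = FALSE)
```

The first call drops the not-assured ‘C106.’ to ‘R3_02’ row; the second keeps all three rows.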

ICD10 to ICD9

Mapping from ICD10 to ICD9[4] is performed using the icd9_icd10 mapping sheet from UK Biobank resource 592. Points to be aware of:

  • Some rows have missing values for either DESCRIPTION_ICD9 or DESCRIPTION_ICD10, indicating that these codes have no ICD9 or ICD10 equivalent.[5]

  • One-to-many mappings occur in either direction (i.e. ICD9 to ICD10, and ICD10 to ICD9).
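Both points can be handled with a filter followed by an inner join. The base R sketch below is hypothetical: `map_icd10_to_icd9()` is an illustrative name, the ICD9 and ICD10 code column names are assumed alongside the DESCRIPTION_ICD9 and DESCRIPTION_ICD10 columns named above, missing descriptions are assumed to be stored as NA, and the codes are placeholders.

```r
# Hypothetical helper (not part of codemapper). `mapping` is assumed to have the
# columns ICD9, DESCRIPTION_ICD9, ICD10 and DESCRIPTION_ICD10, as in icd9_icd10,
# with missing descriptions stored as NA.
map_icd10_to_icd9 <- function(icd10_codes, mapping) {
  # a missing description means that code has no equivalent in the other system
  has_both <- !is.na(mapping$DESCRIPTION_ICD9) & !is.na(mapping$DESCRIPTION_ICD10)
  mapping <- mapping[has_both, c("ICD10", "ICD9"), drop = FALSE]
  # an inner join keeps every matching row, so one ICD10 code can yield several ICD9 codes
  out <- merge(data.frame(ICD10 = icd10_codes, stringsAsFactors = FALSE),
               mapping, by = "ICD10")
  out <- out[order(out$ICD10, out$ICD9), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# toy rows with placeholder codes
icd9_icd10_toy <- data.frame(
  ICD9 = c("icd9_a", "icd9_b", "icd9_c", "icd9_d"),
  DESCRIPTION_ICD9 = c("toy a", "toy b", "toy c", NA),
  ICD10 = c("icd10_x", "icd10_x", "icd10_y", "icd10_z"),
  DESCRIPTION_ICD10 = c("toy x", "toy x", NA, "toy z"),
  stringsAsFactors = FALSE
)

map_icd10_to_icd9(c("icd10_x", "icd10_y", "icd10_z"), icd9_icd10_toy)
```

In this toy example ‘icd10_x’ maps to two ICD9 codes, while ‘icd10_y’ and ‘icd10_z’ drop out because one side of their mapping row has no description.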

Overlapping disease categories

The mapping process results in some codes appearing under more than one category within a single disease. As a general rule, subcategories within a clinical code list should be mutually exclusive (e.g. a clinical code list for diabetes may be subcategorised into type 1 and type 2 diabetes; a clinical code for type 1 diabetes should not also be used for type 2 diabetes).[6]

By default, these cases are resolved by passing default_overlapping_disease_categories_csv() to the overlapping_disease_categories_csv argument of reformat_caliber_for_ukb(). This uses the following CSV file, which has been manually annotated (‘Y’ in column ‘keep’) to indicate which category each code should belong to:

| disease | description | category | code_type | code | author | keep |
|---|---|---|---|---|---|---|
| Diabetes | DIABETES MELLITUS | Diabetes not otherwise specified (6) | icd9 | 6480 | caliber | Y |
| Diabetes | DIABETES MELLITUS | Secondary diabetes (5) | icd9 | 6480 | caliber | |
| Diabetic neurological complications | Diabetic (femoral mononeuropathy) & (Diabetic amyotrophy) | Diagnosis of diabetic neurological complications | read3 | Xa0lK | caliber | Y |
| Diabetic neurological complications | Diabetic (femoral mononeuropathy) & (Diabetic amyotrophy) | History of diabetic neurological complications | read3 | Xa0lK | caliber | |
| Diabetic neurological complications | Diabetic amyotrophy | Diagnosis of diabetic neurological complications | read3 | XaPmX | caliber | Y |
| Diabetic neurological complications | Diabetic amyotrophy | History of diabetic neurological complications | read3 | XaPmX | caliber | |
| End stage renal disease | End stage renal failure | Diagnosis of End stage renal disease | read3 | X30J0 | caliber | Y |
| End stage renal disease | End stage renal failure | Procedure for End stage renal disease | read3 | X30J0 | caliber | |
| Erectile dysfunction | Erectile dysfunction | Diagnosis of erectile dysfunction | read3 | E2273 | caliber | Y |
| Erectile dysfunction | Erectile dysfunction | Possible diagnosis of erectile dysfunction | read3 | E2273 | caliber | |
| Primary Malignancy_Other Organs | HEAD, FACE AND NECK | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1710 | caliber | Y |
| Primary Malignancy_Other Organs | HEAD, FACE AND NECK | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1710 | caliber | |
| Primary Malignancy_Other Organs | UPPER LIMB, INCLUDING SHOULDER | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1712 | caliber | Y |
| Primary Malignancy_Other Organs | UPPER LIMB, INCLUDING SHOULDER | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1712 | caliber | |
| Primary Malignancy_Other Organs | LOWER LIMB, INCLUDING HIP | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1713 | caliber | Y |
| Primary Malignancy_Other Organs | LOWER LIMB, INCLUDING HIP | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1713 | caliber | |
| Primary Malignancy_Other Organs | THORAX | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1714 | caliber | Y |
| Primary Malignancy_Other Organs | THORAX | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1714 | caliber | |
| Primary Malignancy_Other Organs | ABDOMEN | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1715 | caliber | Y |
| Primary Malignancy_Other Organs | ABDOMEN | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1715 | caliber | |
| Primary Malignancy_Other Organs | PELVIS | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1716 | caliber | Y |
| Primary Malignancy_Other Organs | PELVIS | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1716 | caliber | |
| Primary Malignancy_Other Organs | TRUNK, UNSPECIFIED | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1717 | caliber | Y |
| Primary Malignancy_Other Organs | TRUNK, UNSPECIFIED | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1717 | caliber | |
| Primary Malignancy_Other Organs | OTHER | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1718 | caliber | Y |
| Primary Malignancy_Other Organs | OTHER | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1718 | caliber | |
| Primary Malignancy_Other Organs | OTHER | Diagnosis of Primary Malignancy_Other Organs | icd9 | 1878 | caliber | Y |
| Primary Malignancy_Other Organs | OTHER | Possible Diagnosis of Primary Malignancy_Other Organs | icd9 | 1878 | caliber | |
| Tuberculosis | Late effects of tuberculosis of bones and joints | Diagnosis of tuberculosis | read3 | AE03. | caliber | Y |
| Tuberculosis | Late effects of tuberculosis of bones and joints | History of tuberculosis | read3 | AE03. | caliber | |
| Unstable Angina | Worsening angina | Unstable angina (3) | read3 | XE0Ui | caliber | |
| Unstable Angina | Worsening angina | Worsening angina (2) | read3 | XE0Ui | caliber | Y |
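Detecting overlaps and applying the ‘keep’ annotations can be sketched in base R as below. This is a hypothetical illustration, not codemapper's implementation: `code_key()`, `find_overlaps()` and `resolve_overlaps()` are made-up helpers, an overlap is assumed to mean the same disease, code_type and code appearing under more than one category, and the toy rows are taken from the tables in this vignette.

```r
# Hypothetical helpers (not part of codemapper).
# Identify a code within a disease by its disease, code_type and code.
code_key <- function(df) paste(df$disease, df$code_type, df$code, sep = "\t")

# Rows whose code appears under more than one category within the same disease.
find_overlaps <- function(code_list) {
  key <- code_key(code_list)
  distinct <- unique(data.frame(key = key, category = code_list$category,
                                stringsAsFactors = FALSE))
  overlap_keys <- unique(distinct$key[duplicated(distinct$key)])
  code_list[key %in% overlap_keys, , drop = FALSE]
}

# Drop rows for annotated codes whose category was not marked 'Y' in `keep`.
resolve_overlaps <- function(code_list, annotations) {
  key <- code_key(code_list)
  kept <- annotations[annotations$keep %in% "Y", , drop = FALSE]
  kept_key_category <- paste(code_key(kept), kept$category, sep = "\t")
  discard <- key %in% code_key(annotations) &
    !(paste(key, code_list$category, sep = "\t") %in% kept_key_category)
  code_list[!discard, , drop = FALSE]
}

code_list <- data.frame(
  disease = c("Unstable Angina", "Unstable Angina",
              "Erectile dysfunction", "Erectile dysfunction", "Hypertension"),
  category = c("Unstable angina (3)", "Worsening angina (2)",
               "Diagnosis of erectile dysfunction",
               "Possible diagnosis of erectile dysfunction", "Hypertension (3)"),
  code_type = c("read3", "read3", "read3", "read3", "read2"),
  code = c("XE0Ui", "XE0Ui", "E2273", "E2273", "G20.."),
  stringsAsFactors = FALSE
)

# toy annotations in the shape of the table above
annotations <- code_list[1:4, ]
annotations$keep <- c("", "Y", "Y", "")

find_overlaps(code_list)
resolve_overlaps(code_list, annotations)
```

Codes that do not appear in the annotation file (here the Hypertension row) pass through unchanged.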

References

Kuan, Valerie, Spiros Denaxas, Arturo Gonzalez-Izquierdo, Kenan Direk, Osman Bhatti, Shanaz Husain, Shailen Sutaria, et al. 2019. “A Chronological Map of 308 Physical and Mental Health Conditions from 4 Million Individuals in the English National Health Service.” The Lancet. Digital Health 1 (2): e63–e77. https://doi.org/10.1016/S2589-7500(19)30012-3.


  1. See also the HDR UK Phenotype Library for further clinical code lists.

  2. See this warning note (under the ‘Coding lists’ tab).

  3. These codes were retired after the 4th edition of ICD10, whereas the lookup table in UK Biobank resource 592 is based on the 5th edition.

  4. Note that there are relatively few ICD9 diagnostic records in UK Biobank.

  5. Some of these codes nonetheless look as though they should map to each other (e.g. ICD9 ‘0030’ SALMONELLA GASTROENTERITIS and ICD10 ‘A020’ Salmonella enteritis).

  6. Clinical codes may, however, legitimately appear under more than one disease (e.g. ‘E103’, Type 1 diabetes mellitus with ophthalmic complications, is listed under both ‘Diabetes’ and ‘Diabetic ophthalmic complications’ by CALIBER).