library(codemapper)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

all_lkps_maps_dummy <- build_all_lkps_maps_dummy()

Introduction

A phenome-wide association study (PheWAS) analyses associations between a single genetic (or non-genetic) exposure and a number of phenotypes. One way to do this is to use phecodes (Denny, Bastarache, and Roden 2016; Wu et al. 2019).

This vignette outlines one approach for mapping UK Biobank clinical codes to phecodes.

Map UK Biobank clinical events to phecodes

A UK Biobank clinical events table created by ukbwranglr::tidy_clinical_events() may be mapped to phecodes using map_clinical_events_to_phecodes():

# dummy clinical events data frame
dummy_clinical_events <- dummy_clinical_events_tidy()

# map to phecodes
dummy_clinical_events_phecodes <- map_clinical_events_to_phecodes(
  clinical_events = dummy_clinical_events,
  all_lkps_maps = all_lkps_maps_dummy,
  min_date_only = FALSE
)
#> Identified the following 5 data sources to map to phecodes: [1] **Death register** - Underlying (primary) cause of death, [2] **Death register** - Contributory (secondary) cause of death, [3] **Primary care** - `read_2` column, data provider England (Vision), [4] **Primary care** - `read_3` column, data provider England (TPP), [5] **Summary Diagnoses - Hospital inpatient - Health-related outcomes** - Diagnoses - ICD9
#> 
#> ***MAPPING clinical_events TO PHECODES***
#> [1] **Death register** - Underlying (primary) cause of death
#> [2] **Death register** - Contributory (secondary) cause of death
#> [3] **Primary care** - `read_2` column, data provider England (Vision)
#> [4] **Primary care** - `read_3` column, data provider England (TPP)
#> [5] **Summary Diagnoses - Hospital inpatient - Health-related outcomes** - Diagnoses - ICD9
#> Time taken: 0 minutes, 0 seconds.

dummy_clinical_events_phecodes
#> # A tibble: 6 × 7
#>     eid source  index code  date       icd10 phecode
#>   <dbl> <chr>   <chr> <chr> <chr>      <chr> <chr>  
#> 1     1 f40001  0_0   NA    1917-10-08 I10   401.1  
#> 2     1 f40002  0_0   NA    1955-02-11 E109  250.1  
#> 3     1 gpc3_r3 3     XaIP9 1917-10-08 L721  706.2  
#> 4     1 gpc3_r3 3     XaIP9 1917-10-08 L721  704    
#> 5     1 gpc3_r3 3     XE0Uc 1917-10-08 I10   401.1  
#> 6     1 f41271  0_0   4019  1910-02-19 I10   401.1

ICD10 codes are mapped directly to phecodes, whereas codes from non-ICD10 sources (Read 2, Read 3 and ICD9) are first mapped to ICD10 and then to phecodes. Use the col_filters argument to control which code mappings are included (see vignette("map_codes") for further details).
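The two-step route for non-ICD10 codes can be pictured as two successive joins. The sketch below is only an illustration on toy lookup tables: the names `read3_to_icd10`, `icd10_to_phecode` and `toy_events` are hypothetical, the rows are copied from the dummy output above, and this is not necessarily how map_clinical_events_to_phecodes() is implemented internally.

```r
library(dplyr)

# Hypothetical toy lookup: Read 3 code -> ICD10 code
# (rows copied from the dummy output in this vignette)
read3_to_icd10 <- tibble(
  code  = c("XaIP9", "XE0Uc"),
  icd10 = c("L721", "I10")
)

# Hypothetical toy lookup: ICD10 code -> phecode
# (one ICD10 code may map to more than one phecode)
icd10_to_phecode <- tibble(
  icd10   = c("L721", "L721", "I10"),
  phecode = c("706.2", "704", "401.1")
)

# Toy clinical events recorded in Read 3
toy_events <- tibble(
  eid  = c(1, 1),
  code = c("XaIP9", "XE0Uc")
)

# Step 1: Read 3 -> ICD10; step 2: ICD10 -> phecode
toy_mapped <- toy_events %>%
  inner_join(read3_to_icd10, by = "code") %>%
  inner_join(icd10_to_phecode, by = "icd10")

toy_mapped
```

Because L721 maps to two phecodes, the single XaIP9 event yields two rows, which matches the duplicated gpc3_r3 rows in the output above.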

The following data sources (if present in the clinical events data frame) are included in the mapping:

| source | data_coding | description | category | file |
|:-------|:------------|:------------|:---------|:-----|
| f40001 | icd10 | Underlying (primary) cause of death | Death register | ukb_main |
| f40002 | icd10 | Contributory (secondary) cause of death | Death register | ukb_main |
| f20002_icd10 | icd10 | Non-cancer illness code, self-reported | Medical conditions | ukb_main |
| f40013 | icd9 | Type of cancer: ICD9 | Cancer register | ukb_main |
| f40006 | icd10 | Type of cancer: ICD10 | Cancer register | ukb_main |
| f41270 | icd10 | Diagnoses - ICD10 | Summary Diagnoses - Hospital inpatient - Health-related outcomes | ukb_main |
| f41271 | icd9 | Diagnoses - ICD9 | Summary Diagnoses - Hospital inpatient - Health-related outcomes | ukb_main |
| gpc1_r2 | read2 | read_2 column, data provider England (Vision) | Primary care | gp_clinical |
| gpc2_r2 | read2 | read_2 column, data provider Scotland | Primary care | gp_clinical |
| gpc3_r2 | read2 | read_2 column, data provider England (TPP) | Primary care | gp_clinical |
| gpc4_r2 | read2 | read_2 column, data provider Wales | Primary care | gp_clinical |
| gpc1_r3 | read3 | read_3 column, data provider England (Vision) | Primary care | gp_clinical |
| gpc2_r3 | read3 | read_3 column, data provider Scotland | Primary care | gp_clinical |
| gpc3_r3 | read3 | read_3 column, data provider England (TPP) | Primary care | gp_clinical |
| gpc4_r3 | read3 | read_3 column, data provider Wales | Primary care | gp_clinical |

The output from map_clinical_events_to_phecodes() may be reformatted for running a phenome-wide association study with the PheWAS package as follows:

dummy_clinical_events_phecodes %>%
  select(eid, phecode, date) %>%
  # flag each recorded phecode as present
  mutate(date = TRUE) %>%
  # keep one row per eid-phecode pair
  distinct() %>%
  tidyr::pivot_wider(names_from = phecode,
                     values_from = date,
                     values_fill = FALSE)
#> # A tibble: 1 × 5
#>     eid `401.1` `250.1` `706.2` `704`
#>   <dbl> <lgl>   <lgl>   <lgl>   <lgl>
#> 1     1 TRUE    TRUE    TRUE    TRUE
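With a single participant every column is TRUE, so the role of values_fill = FALSE is easy to miss. The following is a minimal sketch on made-up data: the tibble `toy_phecodes`, the column name `has_phecode`, and the second participant with its date are all hypothetical and not part of the dummy data set. It shows how a phecode that was never recorded for a participant ends up as FALSE.

```r
library(dplyr)
library(tidyr)

# Made-up long-format phecode table; eid 2 and all dates are hypothetical
toy_phecodes <- tibble(
  eid     = c(1, 1, 1, 2),
  phecode = c("401.1", "401.1", "250.1", "401.1"),
  date    = c("1917-10-08", "1910-02-19", "1955-02-11", "1920-01-01")
)

toy_wide <- toy_phecodes %>%
  # drop dates, then keep one row per eid-phecode pair
  select(eid, phecode) %>%
  distinct() %>%
  # flag each recorded phecode as present
  mutate(has_phecode = TRUE) %>%
  # one column per phecode; phecodes never recorded become FALSE
  pivot_wider(names_from = phecode,
              values_from = has_phecode,
              values_fill = FALSE)

toy_wide
```

Here participant 2 has no record for phecode 250.1, so that cell is filled with FALSE rather than NA.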

Reverse mapping

Ideally, all Read and ICD9 code-to-phecode mappings should be checked manually before use. At the very least, any findings of interest should be reviewed manually to ascertain exactly which clinical codes contributed to them. Use make_phecode_reverse_map() to do this:

make_phecode_reverse_map(clinical_events_phecodes = dummy_clinical_events_phecodes,
                         all_lkps_maps = all_lkps_maps_dummy) %>%
  knitr::kable()
#> Time taken: 0 minutes, 0 seconds.
| phecode | phecode_description | data_coding | code | description | icd10_equivalent | icd10_description |
|:--------|:--------------------|:------------|:-----|:------------|:-----------------|:------------------|
| 401.1 | Essential hypertension | icd10 | I10 | Essential (primary) hypertension | I10 | Essential (primary) hypertension |
| 250.1 | Type 1 diabetes | icd10 | E109 | Type 1 diabetes mellitus Without complications | E109 | Type 1 diabetes mellitus Without complications |
| 401.1 | Essential hypertension | icd9 | 4019 | ESSENTIAL HYPERTENSION NOT SPECIFIED | I10 | Essential (primary) hypertension |
| 706.2 | Sebaceous cyst | read3 | XaIP9 | Sebaceous cyst | L721 | Trichilemmal cyst |
| 704 | Diseases of hair and hair follicles | read3 | XaIP9 | Sebaceous cyst | L721 | Trichilemmal cyst |
| 401.1 | Essential hypertension | read3 | XE0Uc | Essential hypertension | I10 | Essential (primary) hypertension |
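Reviewing a finding for one phecode then amounts to filtering the reverse map. The sketch below is illustrative only: `toy_reverse_map` is a hypothetical, hand-copied subset of the Read 3 rows in the table above, standing in for the real output of make_phecode_reverse_map().

```r
library(dplyr)

# Hypothetical hand-copied subset of the reverse map (Read 3 rows only)
toy_reverse_map <- tibble(
  phecode             = c("706.2", "704", "401.1"),
  phecode_description = c("Sebaceous cyst",
                          "Diseases of hair and hair follicles",
                          "Essential hypertension"),
  data_coding         = c("read3", "read3", "read3"),
  code                = c("XaIP9", "XaIP9", "XE0Uc"),
  description         = c("Sebaceous cyst", "Sebaceous cyst",
                          "Essential hypertension"),
  icd10_equivalent    = c("L721", "L721", "I10"),
  icd10_description   = c("Trichilemmal cyst", "Trichilemmal cyst",
                          "Essential (primary) hypertension")
)

# Which clinical codes were mapped to phecode 704?
codes_704 <- toy_reverse_map %>%
  filter(phecode == "704") %>%
  select(data_coding, code, description, icd10_equivalent, icd10_description)

codes_704
```

In this toy subset, phecode 704 is reached only through the Read 3 code XaIP9 (Sebaceous cyst) via ICD10 code L721, which is the kind of indirect mapping worth checking by hand.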

References

Denny, Joshua C., Lisa Bastarache, and Dan M. Roden. 2016. “Phenome-Wide Association Studies as a Tool to Advance Precision Medicine.” Annual Review of Genomics and Human Genetics 17 (August): 353–73. https://doi.org/10.1146/annurev-genom-090314-024956.

Wu, Patrick, Aliya Gifford, Xiangrui Meng, Xue Li, Harry Campbell, Tim Varley, Juan Zhao, et al. 2019. “Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation.” JMIR Medical Informatics 7 (4): e14325. https://doi.org/10.2196/14325.