library(codemapper)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
all_lkps_maps_dummy <- build_all_lkps_maps_dummy()
A phenome-wide association study (PheWAS) analyses associations between a single genetic (or non-genetic) exposure and a number of phenotypes. One way to perform this is by using phecodes (Denny, Bastarache, and Roden 2016; Wu et al. 2019).
This vignette outlines one approach for mapping UK Biobank clinical codes to phecodes.
A UK Biobank clinical events table created by ukbwranglr::tidy_clinical_events() may be mapped to phecodes using map_clinical_events_to_phecodes():
# dummy clinical events data frame
dummy_clinical_events <- dummy_clinical_events_tidy()
# map to Phecodes
dummy_clinical_events_phecodes <- map_clinical_events_to_phecodes(
  clinical_events = dummy_clinical_events,
  all_lkps_maps = all_lkps_maps_dummy,
  min_date_only = FALSE
)
#> Identified the following 5 data sources to map to phecodes: [1] **Death register** - Underlying (primary) cause of death, [2] **Death register** - Contributory (secondary) cause of death, [3] **Primary care** - `read_2` column, data provider England (Vision), [4] **Primary care** - `read_3` column, data provider England (TPP), [5] **Summary Diagnoses - Hospital inpatient - Health-related outcomes** - Diagnoses - ICD9
#>
#> ***MAPPING clinical_events TO PHECODES***
#> [1] **Death register** - Underlying (primary) cause of death
#> [2] **Death register** - Contributory (secondary) cause of death
#> [3] **Primary care** - `read_2` column, data provider England (Vision)
#> [4] **Primary care** - `read_3` column, data provider England (TPP)
#> [5] **Summary Diagnoses - Hospital inpatient - Health-related outcomes** - Diagnoses - ICD9
#> Time taken: 0 minutes, 0 seconds.
dummy_clinical_events_phecodes
#> # A tibble: 6 × 7
#> eid source index code date icd10 phecode
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 f40001 0_0 NA 1917-10-08 I10 401.1
#> 2 1 f40002 0_0 NA 1955-02-11 E109 250.1
#> 3 1 gpc3_r3 3 XaIP9 1917-10-08 L721 706.2
#> 4 1 gpc3_r3 3 XaIP9 1917-10-08 L721 704
#> 5 1 gpc3_r3 3 XE0Uc 1917-10-08 I10 401.1
#> 6 1 f41271 0_0 4019 1910-02-19 I10 401.1
ICD10 codes are mapped directly to phecodes, while non-ICD10 sources (ICD9, Read 2 and Read 3) are first mapped to ICD10 and then from ICD10 to phecodes. Use the col_filters argument to determine which code mappings to include (see vignette("map_codes") for further details).
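The two-step mapping described above can be sketched with plain dplyr joins. This is only an illustration of the idea, not how codemapper implements it: the lookup tables `read3_to_icd10` and `icd10_to_phecode` are hypothetical toy tables, populated with the few codes that appear in the dummy output in this vignette.

```r
library(dplyr)

# Hypothetical toy lookup tables (NOT the real codemapper lookups);
# the codes are copied from the dummy output shown in this vignette
read3_to_icd10 <- tibble::tribble(
  ~read3,  ~icd10,
  "XaIP9", "L721",
  "XE0Uc", "I10"
)

icd10_to_phecode <- tibble::tribble(
  ~icd10, ~phecode,
  "L721", "706.2",
  "L721", "704",
  "I10",  "401.1",
  "E109", "250.1"
)

# Toy clinical events recorded in Read 3
events <- tibble::tibble(eid = c(1, 1), code = c("XaIP9", "XE0Uc"))

# Step 1: Read 3 -> ICD10; step 2: ICD10 -> phecode
events_phecodes <- events %>%
  inner_join(read3_to_icd10, by = c("code" = "read3")) %>%
  inner_join(icd10_to_phecode, by = "icd10")

events_phecodes
```

Note that one Read code can yield several phecodes (here `XaIP9` maps to both `706.2` and `704`), which is why the mapped table can contain more rows than the input.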
The following data sources (if present in the clinical events data frame) are included in the mapping:
source | data_coding | description | category | file |
---|---|---|---|---|
f40001 | icd10 | Underlying (primary) cause of death | Death register | ukb_main |
f40002 | icd10 | Contributory (secondary) cause of death | Death register | ukb_main |
f20002_icd10 | icd10 | Non-cancer illness code, self-reported | Medical conditions | ukb_main |
f40013 | icd9 | Type of cancer: ICD9 | Cancer register | ukb_main |
f40006 | icd10 | Type of cancer: ICD10 | Cancer register | ukb_main |
f41270 | icd10 | Diagnoses - ICD10 | Summary Diagnoses - Hospital inpatient - Health-related outcomes | ukb_main |
f41271 | icd9 | Diagnoses - ICD9 | Summary Diagnoses - Hospital inpatient - Health-related outcomes | ukb_main |
gpc1_r2 | read2 | read_2 column, data provider England (Vision) | Primary care | gp_clinical |
gpc2_r2 | read2 | read_2 column, data provider Scotland | Primary care | gp_clinical |
gpc3_r2 | read2 | read_2 column, data provider England (TPP) | Primary care | gp_clinical |
gpc4_r2 | read2 | read_2 column, data provider Wales | Primary care | gp_clinical |
gpc1_r3 | read3 | read_3 column, data provider England (Vision) | Primary care | gp_clinical |
gpc2_r3 | read3 | read_3 column, data provider Scotland | Primary care | gp_clinical |
gpc3_r3 | read3 | read_3 column, data provider England (TPP) | Primary care | gp_clinical |
gpc4_r3 | read3 | read_3 column, data provider Wales | Primary care | gp_clinical |
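Before mapping, it can be useful to check which rows of a clinical events table come from one of the sources listed above. The following is a minimal sketch using a hypothetical toy table (`toy_events`); the `source` values in `phecode_sources` are copied from the table above, and `"unlisted_source"` is a made-up value standing in for any source not in that table.

```r
library(dplyr)

# Source identifiers copied from the table above
phecode_sources <- c(
  "f40001", "f40002", "f20002_icd10", "f40013", "f40006", "f41270", "f41271",
  paste0("gpc", 1:4, "_r2"),
  paste0("gpc", 1:4, "_r3")
)

# Hypothetical toy events table; "unlisted_source" is a made-up value
toy_events <- tibble::tibble(
  eid = c(1, 1, 2, 2),
  source = c("f40001", "gpc3_r3", "unlisted_source", "f41271")
)

# Count rows per source, flagging whether the source is in the list above
source_summary <- toy_events %>%
  mutate(included = source %in% phecode_sources) %>%
  count(source, included)

source_summary
```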
The output from map_clinical_events_to_phecodes() may be reformatted for running a phenome-wide association study using the PheWAS package as follows:
dummy_clinical_events_phecodes %>%
  select(eid, phecode, date) %>%
  # replace dates with a logical indicator that the phecode is present
  mutate(date = TRUE) %>%
  # keep one row per eid-phecode combination
  distinct() %>%
  tidyr::pivot_wider(names_from = phecode,
                     values_from = date,
                     values_fill = FALSE)
#> # A tibble: 1 × 5
#> eid `401.1` `250.1` `706.2` `704`
#> <dbl> <lgl> <lgl> <lgl> <lgl>
#> 1 1 TRUE TRUE TRUE TRUE
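Because the dummy data contain a single participant, the effect of `values_fill = FALSE` is not visible in the output above. The sketch below repeats the same reshaping on a hypothetical toy table with two participants (the second participant and their date are invented for illustration), so that a phecode absent for one participant shows up as `FALSE`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input; participant 2 and their date are made up
toy_phecodes <- tibble::tribble(
  ~eid, ~phecode, ~date,
  1,    "401.1",  "1917-10-08",
  1,    "401.1",  "1910-02-19",
  1,    "250.1",  "1955-02-11",
  2,    "401.1",  "1960-01-01"
)

toy_wide <- toy_phecodes %>%
  select(eid, phecode) %>%
  # collapse repeated events to one row per eid-phecode combination
  distinct() %>%
  mutate(has_phecode = TRUE) %>%
  # one column per phecode; absent phecodes are filled with FALSE
  pivot_wider(names_from = phecode,
              values_from = has_phecode,
              values_fill = FALSE)

toy_wide
```

Dropping the `date` column before calling `distinct()` is a small design choice in this sketch: it makes explicit that repeated events for the same phecode collapse to a single indicator per participant.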
Ideally, all Read and ICD9 code to phecode mappings should be manually checked prior to use. At the very least, any findings of interest should be manually reviewed to ascertain exactly which clinical codes were used. Use make_phecode_reverse_map() to do this:
make_phecode_reverse_map(clinical_events_phecodes = dummy_clinical_events_phecodes,
                         all_lkps_maps = all_lkps_maps_dummy) %>%
  knitr::kable()
#> Time taken: 0 minutes, 0 seconds.
phecode | phecode_description | data_coding | code | description | icd10_equivalent | icd10_description |
---|---|---|---|---|---|---|
401.1 | Essential hypertension | icd10 | I10 | Essential (primary) hypertension | I10 | Essential (primary) hypertension |
250.1 | Type 1 diabetes | icd10 | E109 | Type 1 diabetes mellitus Without complications | E109 | Type 1 diabetes mellitus Without complications |
401.1 | Essential hypertension | icd9 | 4019 | ESSENTIAL HYPERTENSION NOT SPECIFIED | I10 | Essential (primary) hypertension |
706.2 | Sebaceous cyst | read3 | XaIP9 | Sebaceous cyst | L721 | Trichilemmal cyst |
704 | Diseases of hair and hair follicles | read3 | XaIP9 | Sebaceous cyst | L721 | Trichilemmal cyst |
401.1 | Essential hypertension | read3 | XE0Uc | Essential hypertension | I10 | Essential (primary) hypertension |
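One thing worth checking in a reverse map is whether any clinical code contributes to more than one phecode, as `XaIP9` does in the table above. The following sketch flags such codes; `reverse_map` here is a hand-typed copy of a subset of the columns in the table above, not the direct output of make_phecode_reverse_map().

```r
library(dplyr)

# Hand-typed copy of selected columns from the reverse map table above
reverse_map <- tibble::tribble(
  ~phecode, ~data_coding, ~code,   ~icd10_equivalent,
  "401.1",  "icd10",      "I10",   "I10",
  "250.1",  "icd10",      "E109",  "E109",
  "401.1",  "icd9",       "4019",  "I10",
  "706.2",  "read3",      "XaIP9", "L721",
  "704",    "read3",      "XaIP9", "L721",
  "401.1",  "read3",      "XE0Uc", "I10"
)

# Keep only clinical codes that map to more than one distinct phecode
multi_mapped <- reverse_map %>%
  group_by(data_coding, code) %>%
  filter(n_distinct(phecode) > 1) %>%
  ungroup()

multi_mapped
```

Codes flagged this way are natural candidates for the manual review recommended above.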
Denny, Joshua C., Lisa Bastarache, and Dan M. Roden. 2016. “Phenome-Wide Association Studies as a Tool to Advance Precision Medicine.” Annual Review of Genomics and Human Genetics 17 (August): 353–73. https://doi.org/10.1146/annurev-genom-090314-024956.
Wu, Patrick, Aliya Gifford, Xiangrui Meng, Xue Li, Harry Campbell, Tim Varley, Juan Zhao, et al. 2019. “Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation.” JMIR Medical Informatics 7 (4): e14325. https://doi.org/10.2196/14325.