R/clinical_events_to_phecodes.R
map_clinical_events_to_phecodes.Rd
UK Biobank clinical events sources that are recorded in ICD10 are mapped directly to phecodes, while non-ICD10 sources are mapped to phecodes via ICD10.
map_clinical_events_to_phecodes(
  clinical_events,
  all_lkps_maps = NULL,
  min_date_only = FALSE,
  col_filters = default_col_filters()
)
clinical_events: A long format data frame created by tidy_clinical_events, tidy_gp_clinical, tidy_gp_scripts or make_clinical_events_db. This can also be a tbl_dbi object.
all_lkps_maps: Either a named list of lookup and mapping tables (either data frames or tbl_dbi objects), or the path to a SQLite database containing these tables (see also build_all_lkps_maps() and all_lkps_maps_to_db()). If NULL, will attempt to connect to a SQLite database named 'all_lkps_maps.db' in the current working directory, or to a SQLite database specified by an environment variable named 'ALL_LKPS_MAPS_DB' (environment variables can be set using a .Renviron file). The latter method will be used in preference.
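The lookup order described for all_lkps_maps = NULL can be sketched as follows. This is only an illustration of the documented precedence (environment variable first, then the working directory), not the package's actual code. The helper name resolve_all_lkps_maps_path() and its wd argument are hypothetical, and whether the package also checks that the path given in the environment variable exists is an assumption left out here.

```r
# Hypothetical helper (not part of the package): returns the path that would
# be used when all_lkps_maps = NULL, following the documented precedence.
resolve_all_lkps_maps_path <- function(wd = getwd()) {
  # 1. An environment variable named ALL_LKPS_MAPS_DB takes preference.
  env_path <- Sys.getenv("ALL_LKPS_MAPS_DB", unset = "")
  if (nzchar(env_path)) {
    return(env_path)
  }

  # 2. Otherwise look for 'all_lkps_maps.db' in the working directory.
  default_path <- file.path(wd, "all_lkps_maps.db")
  if (file.exists(default_path)) {
    return(default_path)
  }

  stop("No 'all_lkps_maps.db' found and 'ALL_LKPS_MAPS_DB' is not set.")
}
```

For a single session the variable can be set with Sys.setenv(ALL_LKPS_MAPS_DB = "<path to database>"). A line of the form ALL_LKPS_MAPS_DB=<path to database> in a .Renviron file makes it persistent.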
min_date_only: If TRUE, the result will be filtered for only the earliest date per eid-phecode pair (date will be recorded as NA for cases where there are no dates).
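To illustrate what min_date_only = TRUE is described as doing, here is a minimal base R sketch that keeps the earliest date per eid-phecode pair and returns NA when a pair has no dates at all. The function name earliest_per_pair() is hypothetical and the package's own implementation may differ (for example, it may also need to work on tbl_dbi objects).

```r
# Hypothetical helper (not part of the package): one row per eid-phecode
# pair, holding the earliest non-missing date, or NA if no dates exist.
earliest_per_pair <- function(df) {
  df$date <- as.Date(df$date)

  # Split into one group per observed eid-phecode combination.
  groups <- split(df, list(df$eid, df$phecode), drop = TRUE)

  out <- lapply(groups, function(g) {
    dates <- g$date[!is.na(g$date)]
    data.frame(
      eid = g$eid[1],
      phecode = g$phecode[1],
      date = if (length(dates) > 0) min(dates) else as.Date(NA),
      stringsAsFactors = FALSE
    )
  })

  out <- do.call(rbind, out)
  rownames(out) <- NULL
  out
}

# Toy input: eid 1 has two dates for phecode 401.1; eid 2 has no date.
events <- data.frame(
  eid = c(1, 1, 1, 2),
  phecode = c("401.1", "401.1", "250.1", "401.1"),
  date = c("1917-10-08", "1910-02-19", "1955-02-11", NA),
  stringsAsFactors = FALSE
)
earliest <- earliest_per_pair(events)
```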
col_filters: A named list where each name in the list refers to the name of a lookup or mapping table. Each item is also a named list, where the names refer to column names in the corresponding table, and the items are vectors of values to filter for. For example, list(my_lookup_table = list(colA = c("A", "B"))) will result in my_lookup_table being filtered for rows where colA is either 'A' or 'B'. Uses default_col_filters() by default. Set to NULL to remove all filters.
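The nested-list structure of col_filters may be easier to see in action. The sketch below applies such a list to a named list of plain data frames. The helper apply_col_filters() and the table my_lookup_table are hypothetical and only mirror the behaviour described above. The package's internal filtering is not shown in this page.

```r
# Hypothetical helper (not part of the package): filter each named table
# so that every listed column only contains the permitted values.
apply_col_filters <- function(tables, col_filters) {
  # NULL means "no filters": return the tables unchanged.
  if (is.null(col_filters)) {
    return(tables)
  }

  for (tbl_name in names(col_filters)) {
    tbl <- tables[[tbl_name]]
    if (is.null(tbl)) next  # skip filters for tables that are not present

    for (col in names(col_filters[[tbl_name]])) {
      keep <- tbl[[col]] %in% col_filters[[tbl_name]][[col]]
      tbl <- tbl[keep, , drop = FALSE]
    }
    tables[[tbl_name]] <- tbl
  }
  tables
}

# Toy lookup table and the filter from the argument description.
tables <- list(
  my_lookup_table = data.frame(
    colA = c("A", "B", "C"),
    value = 1:3,
    stringsAsFactors = FALSE
  )
)
filtered <- apply_col_filters(
  tables,
  list(my_lookup_table = list(colA = c("A", "B")))
)
```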
A data frame with column names 'eid', 'source', 'index', 'code', 'icd10', 'phecode' and 'date'.
Maps the following UK Biobank clinical events sources to phecodes: f40001, f40002, f20002_icd10, f40006, f41270, f40013, f41271, gpc1_r3, gpc2_r3, gpc3_r3, gpc4_r3, gpc1_r2, gpc2_r2, gpc3_r2, gpc4_r2.
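The two mapping routes (ICD10 sources mapped directly, other sources mapped via ICD10) can be sketched with toy tables in base R. Everything below is illustrative: the table and column names (read3_icd10, icd10_phecode) and the function map_toy_events() are hypothetical, and the three code pairs are simply those visible in the example output further down. The real lookup tables come from all_lkps_maps.

```r
# Toy mapping tables (hypothetical names and layout).
read3_icd10 <- data.frame(
  read3 = "XE0Uc", icd10 = "I10", stringsAsFactors = FALSE
)
icd10_phecode <- data.frame(
  icd10 = c("I10", "E109"), phecode = c("401.1", "250.1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map toy events to phecodes by either route.
map_toy_events <- function(events, icd10_sources = c("f40001", "f40002")) {
  cols <- c("eid", "source", "code", "icd10")
  is_icd10 <- events$source %in% icd10_sources

  # Route 1: the recorded code already is an ICD10 code.
  direct <- events[is_icd10, , drop = FALSE]
  direct$icd10 <- direct$code

  # Route 2: translate the recorded code to ICD10 first.
  indirect <- merge(
    events[!is_icd10, , drop = FALSE], read3_icd10,
    by.x = "code", by.y = "read3"
  )

  # Both routes end with the same ICD10 -> phecode join.
  merge(rbind(direct[, cols], indirect[, cols]), icd10_phecode, by = "icd10")
}

events <- data.frame(
  eid = c(1, 1, 1),
  source = c("f40001", "f40002", "gpc3_r3"),
  code = c("I10", "E109", "XE0Uc"),
  stringsAsFactors = FALSE
)
mapped <- map_toy_events(events)
```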
# build dummy all_lkps_maps
all_lkps_maps_dummy <- build_all_lkps_maps_dummy()
# dummy clinical events data frame
dummy_clinical_events_tidy()
#> # A tibble: 7 × 5
#>     eid source  index code  date
#>   <dbl> <chr>   <chr> <chr> <chr>
#> 1     1 f40001  0_0   I10   1917-10-08
#> 2     1 f40002  0_0   E109  1955-02-11
#> 3     1 f41271  0_0   4019  1910-02-19
#> 4     1 gpc1_r2 1     C10.. 1965-08-08
#> 5     1 gpc1_r2 2     C10.. 1917-10-08
#> 6     1 gpc3_r3 3     XaIP9 1917-10-08
#> 7     1 gpc3_r3 3     XE0Uc 1917-10-08
# map to phecodes
map_clinical_events_to_phecodes(
  clinical_events = dummy_clinical_events_tidy(),
  all_lkps_maps = all_lkps_maps_dummy,
  min_date_only = FALSE
)
#> Identified the following 5 data sources to map to phecodes: [1] **Death register** - Underlying (primary) cause of death, [2] **Death register** - Contributory (secondary) cause of death, [3] **Primary care** - `read_2` column, data provider England (Vision), [4] **Primary care** - `read_3` column, data provider England (TPP), [5] **Summary Diagnoses - Hospital inpatient - Health-related outcomes** - Diagnoses - ICD9
#>
#> ***MAPPING clinical_events TO PHECODES***
#> [1] **Death register** - Underlying (primary) cause of death
#> [2] **Death register** - Contributory (secondary) cause of death
#> [3] **Primary care** - `read_2` column, data provider England (Vision)
#> [4] **Primary care** - `read_3` column, data provider England (TPP)
#> [5] **Summary Diagnoses - Hospital inpatient - Health-related outcomes** - Diagnoses - ICD9
#> Time taken: 0 minutes, 0 seconds.
#> # A tibble: 6 × 7
#>     eid source  index code  date       icd10 phecode
#>   <dbl> <chr>   <chr> <chr> <chr>      <chr> <chr>
#> 1     1 f40001  0_0   NA    1917-10-08 I10   401.1
#> 2     1 f40002  0_0   NA    1955-02-11 E109  250.1
#> 3     1 gpc3_r3 3     XaIP9 1917-10-08 L721  706.2
#> 4     1 gpc3_r3 3     XaIP9 1917-10-08 L721  704
#> 5     1 gpc3_r3 3     XE0Uc 1917-10-08 I10   401.1
#> 6     1 f41271  0_0   4019  1910-02-19 I10   401.1