Tidy clinical events data from a UK Biobank main dataset
Source: R/clinical_events.R
Data in a UK Biobank main dataset are stored in wide format, i.e. a single row per UK Biobank participant (identified by 'eid'). Clinical events may be ascertained from numerous sources (e.g. self-reported medical conditions, linked hospital records), with coded events and their associated dates recorded across multiple columns. This function tidies these data into a standardised long-format table.
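The wide-to-long reshaping described above can be sketched in base R. This is a minimal illustration, not the package's implementation. It assumes main-dataset columns named f<FieldID>_<instance>_<array> (an assumption made for this example), uses made-up toy values, and tidy_code_date_pairs() is a hypothetical helper that pairs each code column with the date column sharing the same instance and array.

```r
# Toy wide-format data: one row per participant (eid), codes and dates spread across columns.
# ASSUMPTION: columns are named f<FieldID>_<instance>_<array>; values are made up.
wide <- data.frame(
  eid = c(1L, 2L),
  f40006_0_0 = c("C50", "C61"),
  f40006_1_0 = c("C18", NA),
  f40005_0_0 = c("2001-05-01", "1998-11-20"),
  f40005_1_0 = c("2010-03-15", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack each code column together with its matching date column
tidy_code_date_pairs <- function(df, code_fid, date_fid) {
  prefix <- paste0("f", code_fid, "_")
  code_cols <- names(df)[startsWith(names(df), prefix)]
  pieces <- lapply(code_cols, function(code_col) {
    inst_array <- substring(code_col, nchar(prefix) + 1)  # e.g. "1_0"
    date_col <- paste0("f", date_fid, "_", inst_array)
    if (!date_col %in% names(df)) stop("Missing date column: ", date_col)
    data.frame(
      eid = df$eid,
      source = paste0("f", code_fid),
      index = sub("_", "-", inst_array, fixed = TRUE),  # "1_0" -> "1-0"
      code = df[[code_col]],
      date = df[[date_col]],
      stringsAsFactors = FALSE
    )
  })
  long <- do.call(rbind, pieces)
  long <- long[!is.na(long$code), ]  # drop empty instance/array slots
  rownames(long) <- NULL
  long
}

long <- tidy_code_date_pairs(wide, code_fid = "40006", date_fid = "40005")
long
```

Each code column contributes one block of rows, so participants with more recorded events simply get more rows in the long table.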
Usage
tidy_clinical_events(
  ukb_main,
  ukb_data_dict = get_ukb_data_dict(),
  ukb_codings = get_ukb_codings(),
  clinical_events_sources = c("primary_death_icd10", "secondary_death_icd10",
    "self_report_medication", "self_report_non_cancer", "self_report_non_cancer_icd10",
    "self_report_cancer", "self_report_operation", "cancer_register_icd9",
    "cancer_register_icd10", "summary_hes_icd9", "summary_hes_icd10",
    "summary_hes_opcs3", "summary_hes_opcs4"),
  strict = TRUE,
  .details_only = FALSE
)
Arguments
- ukb_main
A UK Biobank main dataset.
- ukb_data_dict
The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- ukb_codings
The UKB codings file (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- clinical_events_sources
A character vector of clinical events sources to tidy. By default, all available options are included.
- strict
If TRUE, raise an error if the required columns for any clinical events source listed in clinical_events_sources are not present in ukb_main. If FALSE, a warning message is displayed instead. Default value is TRUE.
- .details_only
If TRUE, return only a list detailing the Field IDs required for each clinical events source. Default value is FALSE.
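The strict argument switches between failing fast and continuing with a warning when required columns are missing. The base R sketch below shows that pattern. check_required_fields() is a hypothetical helper, not part of the package, and it assumes main-dataset columns are prefixed f<FieldID>_ (an assumption for illustration). The Field IDs used are those listed for summary_hes_icd10 in the .details_only example output.

```r
# Hypothetical helper illustrating strict = TRUE (error) vs strict = FALSE (warning).
# ASSUMPTION: main-dataset columns are named with an "f<FieldID>_" prefix.
check_required_fields <- function(col_names, required_fids, strict = TRUE) {
  present <- vapply(
    required_fids,
    function(fid) any(startsWith(col_names, paste0("f", fid, "_"))),
    logical(1)
  )
  absent <- required_fids[!present]
  if (length(absent) > 0) {
    msg <- paste("Required Field IDs not found:", paste(absent, collapse = ", "))
    if (strict) stop(msg, call. = FALSE) else warning(msg, call. = FALSE)
  }
  invisible(absent)  # Field IDs that were not found (empty if all present)
}

# summary_hes_icd10 needs code Field ID 41270 and date Field ID 41280
cols <- c("eid", "f41270_0_0", "f41280_0_0")
check_required_fields(cols, c("41270", "41280"))                         # passes silently
try(check_required_fields("eid", c("41270", "41280")))                    # strict: error
suppressWarnings(check_required_fields("eid", "41270", strict = FALSE))  # warning only
```

A non-strict check is handy when tidying many sources at once and some are expected to be absent from a given extract.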
Details
A named list of data frames is returned, with names corresponding to the clinical events sources specified by clinical_events_sources. Each data frame has the following columns:
- eid: participant identifier.
- source: the Field ID (prefixed by 'f') from which clinical codes were extracted. See clinical_events_sources() for further details.
- index: the corresponding instance and array (e.g. '0-1' means instance 0 and array 1).
- code: clinical code. The clinical coding system used depends on source.
- date: associated date. Where participants self-reported a medical condition but recorded the date as either 'Date uncertain or unknown' or 'Preferred not to answer' (see data coding 13), the date is set to NA.
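Two of the column conventions above can be illustrated with small base R helpers. Both are hypothetical and not exported by the package: split_index() splits an index such as '0-1' into instance and array, and set_coding13_na() blanks out the two special self-report responses before dates are used. The values -1 ('Date uncertain or unknown') and -3 ('Preferred not to answer') are assumed here to be how data coding 13 represents these responses; check the UK Biobank showcase before relying on them.

```r
# Hypothetical helper: split "instance-array" strings into integer columns
split_index <- function(index) {
  parts <- strsplit(index, "-", fixed = TRUE)
  data.frame(
    instance = as.integer(vapply(parts, `[`, character(1), 1)),
    array = as.integer(vapply(parts, `[`, character(1), 2))
  )
}

# Hypothetical helper: replace special coding 13 values with NA.
# ASSUMPTION: -1 = date uncertain or unknown, -3 = preferred not to answer.
set_coding13_na <- function(x) {
  x[x %in% c(-1, -3)] <- NA
  x
}

split_index(c("0-1", "2-0"))
set_coding13_na(c(1990.5, -1, -3, 2001))
```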
Other notes
Results may be combined into a single data frame using bind_rows.
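If you prefer to stay in base R, a similar result can be sketched with do.call(rbind, ...). The list below is toy data shaped like the function's output (the same five columns, with made-up values). The clinical_events_source column is an addition for this sketch that records which list element each row came from, similar in spirit to the .id argument of dplyr::bind_rows().

```r
# Toy list shaped like the output of tidy_clinical_events() (toy values)
clinical_events_list <- list(
  summary_hes_icd10 = data.frame(
    eid = 1L, source = "f41270", index = NA_character_,
    code = "E10", date = "1955-11-12", stringsAsFactors = FALSE
  ),
  cancer_register_icd10 = data.frame(
    eid = 2L, source = "f40006", index = "0-0",
    code = "C50", date = "2001-05-01", stringsAsFactors = FALSE
  )
)

# Tag each data frame with its list name, then stack them into one data frame
combined <- do.call(rbind, unname(Map(
  function(df, nm) data.frame(clinical_events_source = nm, df, stringsAsFactors = FALSE),
  clinical_events_list, names(clinical_events_list)
)))
rownames(combined) <- NULL
combined
```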
See also
Other clinical events: clinical_events_sources(), example_clinical_codes(), extract_phenotypes(), make_clinical_events_db()
Examples
# dummy UKB main dataset and metadata
dummy_ukb_main <- get_ukb_dummy("dummy_ukb_main.tsv")
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")
# tidy clinical events in a UK Biobank main dataset
clinical_events <- tidy_clinical_events(
  ukb_main = dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)
#> Tidying clinical events for primary_death_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for secondary_death_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_medication
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_non_cancer
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_non_cancer_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_cancer
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_operation
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for cancer_register_icd9
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for cancer_register_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_icd9
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_opcs3
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_opcs4
#> Time taken: 0 minutes, 0 seconds.
# returns a named list of data frames, one for each `clinical_events_source`
names(clinical_events)
#> [1] "primary_death_icd10" "secondary_death_icd10"
#> [3] "self_report_medication" "self_report_non_cancer"
#> [5] "self_report_non_cancer_icd10" "self_report_cancer"
#> [7] "self_report_operation" "cancer_register_icd9"
#> [9] "cancer_register_icd10" "summary_hes_icd9"
#> [11] "summary_hes_icd10" "summary_hes_opcs3"
#> [13] "summary_hes_opcs4"
clinical_events$summary_hes_icd10
#> eid source index code date
#> <int> <char> <lgcl> <char> <char>
#> 1: 1 f41270 NA X715 1955-11-12
#> 2: 2 f41270 NA E11 1939-02-16
#> 3: 1 f41270 NA X715 1910-02-19
#> 4: 2 f41270 NA E11 1965-08-08
#> 5: 1 f41270 NA E10 1955-11-12
#> 6: 2 f41270 NA M0087 1939-02-16
#> 7: 1 f41270 NA E10 1910-02-19
#> 8: 2 f41270 NA M0087 1965-08-08
# use .details_only = TRUE to return details of required Field IDs for
# specific clinical_events sources
tidy_clinical_events(.details_only = TRUE)
#> $required_field_ids
#> $required_field_ids$primary_death_icd10
#> code_fid date_fid
#> "40001" "40000"
#>
#> $required_field_ids$secondary_death_icd10
#> code_fid date_fid
#> "40002" "40000"
#>
#> $required_field_ids$self_report_medication
#> code_fid date_fid
#> "20003" "53"
#>
#> $required_field_ids$self_report_non_cancer
#> code_fid date_fid
#> "20002" "20008"
#>
#> $required_field_ids$self_report_non_cancer_icd10
#> code_fid date_fid
#> "20002" "20008"
#>
#> $required_field_ids$self_report_cancer
#> code_fid date_fid
#> "20001" "20006"
#>
#> $required_field_ids$self_report_operation
#> code_fid date_fid
#> "20004" "20010"
#>
#> $required_field_ids$cancer_register_icd9
#> code_fid date_fid
#> "40013" "40005"
#>
#> $required_field_ids$cancer_register_icd10
#> code_fid date_fid
#> "40006" "40005"
#>
#> $required_field_ids$summary_hes_icd9
#> code_fid date_fid
#> "41271" "41281"
#>
#> $required_field_ids$summary_hes_icd10
#> code_fid date_fid
#> "41270" "41280"
#>
#> $required_field_ids$summary_hes_opcs3
#> code_fid date_fid
#> "41273" "41283"
#>
#> $required_field_ids$summary_hes_opcs4
#> code_fid date_fid
#> "41272" "41282"
#>
#>
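The nested list returned with .details_only = TRUE can be flattened into a lookup table, for example to see at a glance which Field IDs to include when preparing ukb_main. A base R sketch, using two entries copied from the output above; in practice the list would come from tidy_clinical_events(.details_only = TRUE)$required_field_ids, and the remaining sources follow the same code_fid/date_fid pattern.

```r
# Two entries copied from the .details_only output; the full list has one entry per source
required_field_ids <- list(
  primary_death_icd10 = c(code_fid = "40001", date_fid = "40000"),
  summary_hes_icd10 = c(code_fid = "41270", date_fid = "41280")
)

# One row per clinical events source
field_id_table <- data.frame(
  clinical_events_source = names(required_field_ids),
  code_fid = vapply(required_field_ids, `[[`, character(1), "code_fid", USE.NAMES = FALSE),
  date_fid = vapply(required_field_ids, `[[`, character(1), "date_fid", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
field_id_table

# Unique Field IDs needed across the selected sources
unique(c(field_id_table$code_fid, field_id_table$date_fid))
```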