Tidy clinical events data from a UK Biobank main dataset
Source: R/clinical_events.R
Data in a UK Biobank main dataset is stored in wide format, i.e. a single row of data per UK Biobank participant (identified by 'eid'). Clinical events may be ascertained from numerous sources (e.g. self-reported medical conditions, linked hospital records), with coded events and their associated dates recorded across multiple columns. This function tidies this data into a standardised long-format table.
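The wide-to-long idea can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame and its column names (`code_0`, `date_0`, and so on) are hypothetical and only stand in for the many code and date columns of a real main dataset.

```r
# Toy wide-format data: one row per participant, with codes and dates
# spread across array columns (column names are hypothetical).
wide <- data.frame(
  eid = c(1L, 2L),
  code_0 = c("E10", "E11"),
  code_1 = c("X715", NA),
  date_0 = c("1955-11-12", "1939-02-16"),
  date_1 = c("1910-02-19", NA),
  stringsAsFactors = FALSE
)

# Stack each code/date column pair into rows of a long table.
arrays <- c("0", "1")
long <- do.call(rbind, lapply(arrays, function(a) {
  data.frame(
    eid = wide$eid,
    index = a,
    code = wide[[paste0("code_", a)]],
    date = wide[[paste0("date_", a)]],
    stringsAsFactors = FALSE
  )
}))

# Drop empty slots so that each remaining row is one recorded event.
long <- long[!is.na(long$code), ]
rownames(long) <- NULL
long
```

Each row of `long` now describes a single coded event for a single participant, which is the shape that tidy_clinical_events() aims for.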
Usage
tidy_clinical_events(
  ukb_main,
  ukb_data_dict = get_ukb_data_dict(),
  ukb_codings = get_ukb_codings(),
  clinical_events_sources = c("primary_death_icd10", "secondary_death_icd10",
    "self_report_medication", "self_report_non_cancer", "self_report_non_cancer_icd10",
    "self_report_cancer", "self_report_operation", "cancer_register_icd9",
    "cancer_register_icd10", "summary_hes_icd9", "summary_hes_icd10",
    "summary_hes_opcs3", "summary_hes_opcs4"),
  strict = TRUE,
  .details_only = FALSE
)
Arguments
- ukb_main
A UK Biobank main dataset.
- ukb_data_dict
The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- ukb_codings
The UKB codings file (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- clinical_events_sources
A character vector of clinical events sources to tidy. By default, all available options are included.
- strict
If TRUE, raise an error if the required columns for any of the clinical events sources listed in clinical_events_sources are not present in ukb_main. If FALSE, a warning message is displayed instead. Default value is TRUE.
- .details_only
If TRUE, return a list detailing the required Field IDs.
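The difference between strict = TRUE and strict = FALSE can be sketched in base R as follows. The helper `check_required_cols()` is hypothetical and is not part of the package; it only illustrates the "error versus warning" behaviour described above, under the assumption that the check amounts to comparing required column names against those present.

```r
# Hypothetical helper, NOT part of the package: raise an error or a
# warning when required columns are missing, depending on `strict`.
check_required_cols <- function(df, required_cols, strict = TRUE) {
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) == 0) {
    return(invisible(TRUE))
  }
  msg <- paste0(
    "Required columns not found: ",
    paste(missing_cols, collapse = ", ")
  )
  if (strict) {
    stop(msg, call. = FALSE)
  }
  warning(msg, call. = FALSE)
  invisible(FALSE)
}

toy <- data.frame(eid = 1:2, code = c("E10", "E11"))

# strict = FALSE: a warning is raised (muffled here) and execution continues.
lenient_result <- withCallingHandlers(
  check_required_cols(toy, c("code", "date"), strict = FALSE),
  warning = function(w) invokeRestart("muffleWarning")
)

# strict = TRUE: an error is raised; capture its message for inspection.
strict_result <- tryCatch(
  check_required_cols(toy, c("code", "date"), strict = TRUE),
  error = function(e) conditionMessage(e)
)
```

A lenient setting can be convenient when a dataset contains only some of the fields, while the strict default surfaces missing fields early.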
Details
A named list of data frames is returned, with the names corresponding to the data sources specified by clinical_events_sources. Each data frame has the following columns:
- eid: participant identifier.
- source: the FieldID (prefixed by 'f') from which the clinical codes were extracted. See clinical_events_sources() for further details.
- index: the corresponding instance and array (e.g. '0-1' means instance 0 and array 1).
- code: clinical code. The clinical coding system used depends on source.
- date: associated date. Note that where participants self-reported a medical condition but recorded the date as either 'Date uncertain or unknown' or 'Preferred not to answer' (see data coding 13), the date is set to NA.
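As a rough illustration of two of these columns, the base R sketch below splits an 'instance-array' index into its parts and sets the two special self-report responses to NA. The toy values (including the date "2005") are made up for illustration, and the package may handle both steps differently internally.

```r
# Toy events table; values are illustrative only.
events <- data.frame(
  eid = c(1L, 2L, 3L),
  index = c("0-0", "0-1", "1-0"),
  date = c("2005", "Date uncertain or unknown", "Preferred not to answer"),
  stringsAsFactors = FALSE
)

# Split 'instance-array' into two integer columns.
parts <- strsplit(events$index, "-", fixed = TRUE)
events$instance <- as.integer(vapply(parts, `[`, character(1), 1))
events$array <- as.integer(vapply(parts, `[`, character(1), 2))

# Treat the two special responses as missing dates.
special <- c("Date uncertain or unknown", "Preferred not to answer")
events$date[events$date %in% special] <- NA_character_

events
```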
Other notes
Results may be combined into a single data frame using dplyr::bind_rows().
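A minimal sketch of this step, assuming the dplyr package is installed. The list `clinical_events_toy` is a hand-made stand-in for the function's return value, with made-up rows; only the column names and the source labels follow the output documented on this page.

```r
library(dplyr)

# Hand-made stand-in for the named list returned by tidy_clinical_events().
clinical_events_toy <- list(
  summary_hes_icd10 = data.frame(
    eid = c(1L, 2L),
    source = "f41270",
    index = c("0-0", "0-0"),
    code = c("E10", "E11"),
    date = c("1955-11-12", "1939-02-16"),
    stringsAsFactors = FALSE
  ),
  self_report_non_cancer = data.frame(
    eid = 1L,
    source = "f20002",
    index = "0-1",
    code = "1222",
    date = NA_character_,
    stringsAsFactors = FALSE
  )
)

# Stack the per-source tables; the `source` column keeps track of origin.
all_events <- bind_rows(clinical_events_toy)
all_events
```

Because every table shares the same columns, the combined result keeps one row per event, and `source` can still be used to filter by origin.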
See also
Other clinical events:
clinical_events_sources(),
example_clinical_codes(),
extract_phenotypes(),
make_clinical_events_db()
Examples
# dummy UKB main dataset and metadata
dummy_ukb_main <- get_ukb_dummy("dummy_ukb_main.tsv")
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")
# tidy clinical events in a UK Biobank main dataset
clinical_events <- tidy_clinical_events(
  ukb_main = dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)
#> Tidying clinical events for primary_death_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for secondary_death_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_medication
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_non_cancer
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_non_cancer_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_cancer
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_operation
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for cancer_register_icd9
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for cancer_register_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_icd9
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_opcs3
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_opcs4
#> Time taken: 0 minutes, 0 seconds.
# returns a named list of data frames, one for each `clinical_events_source`
names(clinical_events)
#> [1] "primary_death_icd10" "secondary_death_icd10"
#> [3] "self_report_medication" "self_report_non_cancer"
#> [5] "self_report_non_cancer_icd10" "self_report_cancer"
#> [7] "self_report_operation" "cancer_register_icd9"
#> [9] "cancer_register_icd10" "summary_hes_icd9"
#> [11] "summary_hes_icd10" "summary_hes_opcs3"
#> [13] "summary_hes_opcs4"
clinical_events$summary_hes_icd10
#>      eid source  index   code       date
#>    <int> <char> <lgcl> <char>     <char>
#> 1:     1 f41270     NA   X715 1955-11-12
#> 2:     2 f41270     NA    E11 1939-02-16
#> 3:     1 f41270     NA   X715 1910-02-19
#> 4:     2 f41270     NA    E11 1965-08-08
#> 5:     1 f41270     NA    E10 1955-11-12
#> 6:     2 f41270     NA  M0087 1939-02-16
#> 7:     1 f41270     NA    E10 1910-02-19
#> 8:     2 f41270     NA  M0087 1965-08-08
# use .details_only = TRUE to return details of required Field IDs for
# specific clinical_events sources
tidy_clinical_events(.details_only = TRUE)
#> $required_field_ids
#> $required_field_ids$primary_death_icd10
#> code_fid date_fid
#> "40001" "40000"
#>
#> $required_field_ids$secondary_death_icd10
#> code_fid date_fid
#> "40002" "40000"
#>
#> $required_field_ids$self_report_medication
#> code_fid date_fid
#> "20003" "53"
#>
#> $required_field_ids$self_report_non_cancer
#> code_fid date_fid
#> "20002" "20008"
#>
#> $required_field_ids$self_report_non_cancer_icd10
#> code_fid date_fid
#> "20002" "20008"
#>
#> $required_field_ids$self_report_cancer
#> code_fid date_fid
#> "20001" "20006"
#>
#> $required_field_ids$self_report_operation
#> code_fid date_fid
#> "20004" "20010"
#>
#> $required_field_ids$cancer_register_icd9
#> code_fid date_fid
#> "40013" "40005"
#>
#> $required_field_ids$cancer_register_icd10
#> code_fid date_fid
#> "40006" "40005"
#>
#> $required_field_ids$summary_hes_icd9
#> code_fid date_fid
#> "41271" "41281"
#>
#> $required_field_ids$summary_hes_icd10
#> code_fid date_fid
#> "41270" "41280"
#>
#> $required_field_ids$summary_hes_opcs3
#> code_fid date_fid
#> "41273" "41283"
#>
#> $required_field_ids$summary_hes_opcs4
#> code_fid date_fid
#> "41272" "41282"
#>
#>
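The nested list printed above can be awkward to scan. One possible way to flatten it into a lookup table is sketched below in base R. The list is re-typed by hand from two of the entries shown above so that the snippet runs on its own; `fid_table` is a hypothetical name.

```r
# Two entries copied by hand from the output above.
required_field_ids <- list(
  summary_hes_icd9 = c(code_fid = "41271", date_fid = "41281"),
  summary_hes_icd10 = c(code_fid = "41270", date_fid = "41280")
)

# One row per clinical events source, one column per Field ID type.
fid_table <- data.frame(
  source = names(required_field_ids),
  code_fid = unname(vapply(required_field_ids, `[[`, character(1), "code_fid")),
  date_fid = unname(vapply(required_field_ids, `[[`, character(1), "date_fid")),
  stringsAsFactors = FALSE
)

# Look up the date Field ID for a given source.
fid_table$date_fid[fid_table$source == "summary_hes_icd10"]
```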