Data in a UK Biobank main dataset is stored in wide format, i.e. a single row of data per UK Biobank participant (identified by 'eid'). Clinical events may be ascertained from numerous sources (e.g. self-reported medical conditions, linked hospital records), with coded events and their associated dates recorded across multiple columns. This function tidies these data into a standardised long format table.

Usage

tidy_clinical_events(
  ukb_main,
  ukb_data_dict = get_ukb_data_dict(),
  ukb_codings = get_ukb_codings(),
  clinical_events_sources = c("primary_death_icd10", "secondary_death_icd10",
    "self_report_medication", "self_report_non_cancer", "self_report_non_cancer_icd10",
    "self_report_cancer", "self_report_operation", "cancer_register_icd9",
    "cancer_register_icd10", "summary_hes_icd9", "summary_hes_icd10",
    "summary_hes_opcs3", "summary_hes_opcs4"),
  strict = TRUE,
  .details_only = FALSE
)

Arguments

ukb_main

A UK Biobank main dataset.

ukb_data_dict

The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.

ukb_codings

The UKB codings file (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.

clinical_events_sources

A character vector of clinical events sources to tidy. By default, all available options are included.

strict

If TRUE, raise an error if the required columns for any of the clinical events sources listed in clinical_events_sources are not present in ukb_main. If FALSE, a warning is displayed instead. Default value is TRUE.

.details_only

If TRUE, return a list detailing the required Field IDs for each clinical events source instead of tidying any data. Default value is FALSE.

Value

A named list of clinical events data frames.

Details

A named list of data frames is returned, with the names corresponding to the data sources specified by clinical_events_sources. Each data frame has the following columns:

  • eid - participant identifier

  • source - the Field ID (prefixed by 'f') from which clinical codes were extracted. See clinical_events_sources for further details.

  • index - the corresponding instance and array (e.g. '0-1' means instance 0 and array 1).

  • code - clinical code. The clinical coding system used depends on source.

  • date - associated date. Note that where participants self-reported a medical condition but recorded the date as either 'Date uncertain or unknown' or 'Preferred not to answer' (see data coding 13), the date is set to NA.
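
As a rough illustration of the index format described above, the sketch below splits 'instance-array' strings into separate integer columns using base R only. The data frame is a hypothetical stand-in with made-up values, not output from tidy_clinical_events(), and the new column names instance and array are assumptions chosen for this example.

```r
# Hypothetical rows shaped like one tidied clinical events table
# (values are made up for illustration only)
events <- data.frame(
  eid = c(1L, 1L, 2L, 3L),
  source = "f20002",
  index = c("0-0", "0-1", "2-0", NA),
  code = c("1065", "1220", "1065", "1111"),
  stringsAsFactors = FALSE
)

# Split each "instance-array" string on the hyphen.
# as.character() guards against an all-NA logical index column.
index_parts <- strsplit(as.character(events$index), "-", fixed = TRUE)

# First piece is the instance, second piece is the array; NA stays NA
events$instance <- as.integer(vapply(index_parts, `[`, character(1), 1))
events$array <- as.integer(vapply(index_parts, `[`, character(1), 2))

print(events)
```

Rows with a missing index simply end up with NA in both new columns, so no rows are dropped by the split.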

Other notes

Results may be combined into a single data frame using dplyr::bind_rows().
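
A minimal sketch of combining the results, assuming the dplyr package is installed. The list below is a small hand-made stand-in for the value returned by tidy_clinical_events(), and the .id column name clinical_events_source is a hypothetical choice for this example (any name other than the existing source column would do).

```r
# Hypothetical stand-in for the named list returned by tidy_clinical_events()
clinical_events <- list(
  summary_hes_icd10 = data.frame(
    eid = c(1L, 2L),
    source = "f41270",
    index = c("0-0", "0-0"),
    code = c("E10", "E11"),
    date = c("1955-11-12", "1939-02-16"),
    stringsAsFactors = FALSE
  ),
  primary_death_icd10 = data.frame(
    eid = 2L,
    source = "f40001",
    index = "0-0",
    code = "E11",
    date = "1965-08-08",
    stringsAsFactors = FALSE
  )
)

# Stack all tables into one data frame; `.id` records the list name
# (i.e. the clinical events source) that each row came from
all_events <- dplyr::bind_rows(clinical_events, .id = "clinical_events_source")

print(all_events)
```

Keeping the list name in its own column means rows can still be traced back to their clinical events source after stacking.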

Examples

# dummy UKB main dataset and metadata
dummy_ukb_main <- get_ukb_dummy("dummy_ukb_main.tsv")
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")

# tidy clinical events in a UK Biobank main dataset
clinical_events <- tidy_clinical_events(
  ukb_main = dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)
#> Tidying clinical events for primary_death_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for secondary_death_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_medication
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_non_cancer
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_non_cancer_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_cancer
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for self_report_operation
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for cancer_register_icd9
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for cancer_register_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_icd9
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_icd10
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_opcs3
#> Time taken: 0 minutes, 0 seconds.
#> Tidying clinical events for summary_hes_opcs4
#> Time taken: 0 minutes, 0 seconds.

# returns a named list of data frames, one for each `clinical_events_source`
names(clinical_events)
#>  [1] "primary_death_icd10"          "secondary_death_icd10"       
#>  [3] "self_report_medication"       "self_report_non_cancer"      
#>  [5] "self_report_non_cancer_icd10" "self_report_cancer"          
#>  [7] "self_report_operation"        "cancer_register_icd9"        
#>  [9] "cancer_register_icd10"        "summary_hes_icd9"            
#> [11] "summary_hes_icd10"            "summary_hes_opcs3"           
#> [13] "summary_hes_opcs4"           

clinical_events$summary_hes_icd10
#>      eid source  index   code       date
#>    <int> <char> <lgcl> <char>     <char>
#> 1:     1 f41270     NA   X715 1955-11-12
#> 2:     2 f41270     NA    E11 1939-02-16
#> 3:     1 f41270     NA   X715 1910-02-19
#> 4:     2 f41270     NA    E11 1965-08-08
#> 5:     1 f41270     NA    E10 1955-11-12
#> 6:     2 f41270     NA  M0087 1939-02-16
#> 7:     1 f41270     NA    E10 1910-02-19
#> 8:     2 f41270     NA  M0087 1965-08-08

# use .details_only = TRUE to return details of required Field IDs for
# specific clinical_events sources
tidy_clinical_events(.details_only = TRUE)
#> $required_field_ids
#> $required_field_ids$primary_death_icd10
#> code_fid date_fid 
#>  "40001"  "40000" 
#> 
#> $required_field_ids$secondary_death_icd10
#> code_fid date_fid 
#>  "40002"  "40000" 
#> 
#> $required_field_ids$self_report_medication
#> code_fid date_fid 
#>  "20003"     "53" 
#> 
#> $required_field_ids$self_report_non_cancer
#> code_fid date_fid 
#>  "20002"  "20008" 
#> 
#> $required_field_ids$self_report_non_cancer_icd10
#> code_fid date_fid 
#>  "20002"  "20008" 
#> 
#> $required_field_ids$self_report_cancer
#> code_fid date_fid 
#>  "20001"  "20006" 
#> 
#> $required_field_ids$self_report_operation
#> code_fid date_fid 
#>  "20004"  "20010" 
#> 
#> $required_field_ids$cancer_register_icd9
#> code_fid date_fid 
#>  "40013"  "40005" 
#> 
#> $required_field_ids$cancer_register_icd10
#> code_fid date_fid 
#>  "40006"  "40005" 
#> 
#> $required_field_ids$summary_hes_icd9
#> code_fid date_fid 
#>  "41271"  "41281" 
#> 
#> $required_field_ids$summary_hes_icd10
#> code_fid date_fid 
#>  "41270"  "41280" 
#> 
#> $required_field_ids$summary_hes_opcs3
#> code_fid date_fid 
#>  "41273"  "41283" 
#> 
#> $required_field_ids$summary_hes_opcs4
#> code_fid date_fid 
#>  "41272"  "41282" 
#> 
#>