Generate a UK Biobank data dictionary — make_data_dict

Creates a data dictionary for a raw UK Biobank main dataset, either from a file path or from a data frame if the dataset has already been loaded into R.

Usage

make_data_dict(ukb_main, delim = "auto", ukb_data_dict = get_ukb_data_dict())

Arguments

ukb_main: Either the path to a UK Biobank main dataset file (character string) or a data frame.
delim: Delimiter for the UKB main dataset file. Default is "auto" (see data.table::fread()). Ignored if the file name ends with .dta (i.e. is a Stata file) or if ukb_main is a data frame.
ukb_data_dict: The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
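
The all-character requirement on ukb_data_dict is easy to miss when reading a downloaded copy yourself, because numeric-looking columns such as FieldID are otherwise parsed as integers. The following is a minimal base R sketch, not the package's own loader: the temporary file and its three rows are hypothetical stand-ins for a real dictionary file.

```r
# Write a tiny stand-in for a downloaded data dictionary TSV (hypothetical content).
tmp <- tempfile(fileext = ".tsv")
writeLines(
  c("FieldID\tField\tValueType",
    "31\tSex\tCategorical single",
    "34\tYear of birth\tInteger"),
  tmp
)

# colClasses = "character" is recycled across all columns, so FieldID
# stays a character vector instead of being parsed as integer.
dict <- read.delim(tmp, colClasses = "character", check.names = FALSE)

# Check that every column is character before passing it as ukb_data_dict.
all_character <- all(vapply(dict, is.character, logical(1)))
all_character
```

If the check returns FALSE for a dictionary read some other way, converting each column with as.character() before calling make_data_dict() is one option.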

Value

A data dictionary (data frame) specific to ukb_main. This includes columns with descriptive column names ("descriptive_colnames") and the current column names ("colheaders_raw").
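
To make the relationship between the raw headers and the FieldID, instance and array columns concrete, here is a rough base R sketch that splits headers of the form field-instance.array. It is an illustration only and is not taken from the package source; the regular expression and the handling of "eid" are assumptions.

```r
# A few raw column headers in the field-instance.array format.
raw <- c("eid", "31-0.0", "21000-1.0", "21001-2.0")

# Assumed pattern: digits, "-", digits, ".", digits (e.g. "21000-1.0").
pattern <- "^([0-9]+)-([0-9]+)\\.([0-9]+)$"
is_field <- grepl(pattern, raw)

# Headers that do not match (here only "eid") keep their name as FieldID
# and get NA for instance and array.
parsed <- data.frame(
  colheaders_raw = raw,
  FieldID  = ifelse(is_field, sub(pattern, "\\1", raw), raw),
  instance = ifelse(is_field, sub(pattern, "\\2", raw), NA_character_),
  array    = ifelse(is_field, sub(pattern, "\\3", raw), NA_character_),
  stringsAsFactors = FALSE
)
parsed
```

All columns are kept as character, in line with the character-only convention used for the data dictionary.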

Examples

# dummy UKB data dictionary
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")

# create data dictionary specific to UKB main dataset, either from a file path
make_data_dict(
  ukb_main = get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE),
  ukb_data_dict = dummy_ukb_data_dict
)
#> # A tibble: 71 × 22
#>    descriptive_colnames     colheaders_raw colheaders_processed FieldID instance
#>    <chr>                    <chr>          <chr>                <chr>   <chr>   
#>  1 eid                      eid            feid                 eid     NA      
#>  2 sex_f31_0_0              31-0.0         f31_0_0              31      0       
#>  3 year_of_birth_f34_0_0    34-0.0         f34_0_0              34      0       
#>  4 month_of_birth_f52_0_0   52-0.0         f52_0_0              52      0       
#>  5 ethnic_background_f2100… 21000-0.0      f21000_0_0           21000   0       
#>  6 ethnic_background_f2100… 21000-1.0      f21000_1_0           21000   1       
#>  7 ethnic_background_f2100… 21000-2.0      f21000_2_0           21000   2       
#>  8 body_mass_index_bmi_f21… 21001-0.0      f21001_0_0           21001   0       
#>  9 body_mass_index_bmi_f21… 21001-1.0      f21001_1_0           21001   1       
#> 10 body_mass_index_bmi_f21… 21001-2.0      f21001_2_0           21001   2       
#> # ℹ 61 more rows
#> # ℹ 17 more variables: array <chr>, Path <chr>, Category <chr>, Field <chr>,
#> #   Participants <chr>, Items <chr>, Stability <chr>, ValueType <chr>,
#> #   Units <chr>, ItemType <chr>, Strata <chr>, Sexed <chr>, Instances <chr>,
#> #   Array <chr>, Coding <chr>, Notes <chr>, Link <chr>

# ...or from data frame
make_data_dict(
  ukb_main = get_ukb_dummy("dummy_ukb_main.tsv"),
  ukb_data_dict = dummy_ukb_data_dict
)
#> # A tibble: 71 × 22
#>    descriptive_colnames     colheaders_raw colheaders_processed FieldID instance
#>    <chr>                    <chr>          <chr>                <chr>   <chr>   
#>  1 eid                      eid            feid                 eid     NA      
#>  2 sex_f31_0_0              31-0.0         f31_0_0              31      0       
#>  3 year_of_birth_f34_0_0    34-0.0         f34_0_0              34      0       
#>  4 month_of_birth_f52_0_0   52-0.0         f52_0_0              52      0       
#>  5 ethnic_background_f2100… 21000-0.0      f21000_0_0           21000   0       
#>  6 ethnic_background_f2100… 21000-1.0      f21000_1_0           21000   1       
#>  7 ethnic_background_f2100… 21000-2.0      f21000_2_0           21000   2       
#>  8 body_mass_index_bmi_f21… 21001-0.0      f21001_0_0           21001   0       
#>  9 body_mass_index_bmi_f21… 21001-1.0      f21001_1_0           21001   1       
#> 10 body_mass_index_bmi_f21… 21001-2.0      f21001_2_0           21001   2       
#> # ℹ 61 more rows
#> # ℹ 17 more variables: array <chr>, Path <chr>, Category <chr>, Field <chr>,
#> #   Participants <chr>, Items <chr>, Stability <chr>, ValueType <chr>,
#> #   Units <chr>, ItemType <chr>, Strata <chr>, Sexed <chr>, Instances <chr>,
#> #   Array <chr>, Coding <chr>, Notes <chr>, Link <chr>
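
One possible use of the returned dictionary is as a lookup table from colheaders_raw to descriptive_colnames. The sketch below renames the columns of a small data frame in base R. It is only an illustration under stated assumptions: the two-row ukb_main data frame is hypothetical, and the data_dict stand-in contains just three rows copied from the output above.

```r
# Stand-in for two columns of the returned data dictionary
# (values copied from the example output).
data_dict <- data.frame(
  descriptive_colnames = c("eid", "sex_f31_0_0", "year_of_birth_f34_0_0"),
  colheaders_raw = c("eid", "31-0.0", "34-0.0"),
  stringsAsFactors = FALSE
)

# Hypothetical main dataset that still has its raw column headers.
ukb_main <- data.frame(
  "eid" = c("1", "2"),
  "31-0.0" = c("0", "1"),
  "34-0.0" = c("1950", "1962"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Look up each raw header in the dictionary and fail early if any is missing.
idx <- match(names(ukb_main), data_dict$colheaders_raw)
stopifnot(!anyNA(idx))

names(ukb_main) <- data_dict$descriptive_colnames[idx]
names(ukb_main)
```

Using match() keeps the renaming independent of row order in the dictionary, and the stopifnot() guard avoids silently assigning NA as a column name.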