Creates a data dictionary for a raw UK Biobank main dataset, either from a file path or from a data frame if the dataset has already been loaded into R.
Usage
make_data_dict(ukb_main, delim = "auto", ukb_data_dict = get_ukb_data_dict())
Arguments
- ukb_main
Either the path to a UK Biobank main dataset file (character string) or a data frame.
- delim
Delimiter for the UKB main dataset file. Default is "auto" (see data.table::fread()). Ignored if the file name ends with .dta (i.e. is a Stata file) or if ukb_main is a data frame.
- ukb_data_dict
The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
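The ukb_data_dict argument must contain only character columns. One way to satisfy this with base R is to read the downloaded showcase dictionary with colClasses = "character". The sketch below assumes the dictionary is a tab-separated file; to keep it self-contained it writes a tiny, hypothetical two-row excerpt (with an illustrative subset of columns) to a temporary file first.

```r
# Write a tiny, hypothetical excerpt of the UKB showcase dictionary to a temp file
# (the real file has many more columns and rows)
dict_path <- tempfile(fileext = ".tsv")
writeLines(
  c(
    "FieldID\tField\tValueType\tCoding",
    "31\tSex\tCategorical single\t9",
    "21001\tBody mass index (BMI)\tContinuous\t"
  ),
  dict_path
)

# colClasses = "character" is recycled to every column, so numeric-looking
# columns such as FieldID are kept as character rather than converted to integer
ukb_dict <- read.delim(dict_path, colClasses = "character")

# Confirm every column is of type character
all_character <- all(vapply(ukb_dict, is.character, logical(1)))
print(all_character)
```

data.table::fread() also accepts a colClasses argument if you prefer to read the dictionary with data.table.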
Value
A data dictionary (data frame) specific to ukb_main. This includes columns with descriptive column names ("descriptive_colnames") and the current column names ("colheaders_raw").
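Because the returned dictionary pairs each current column name (colheaders_raw) with a descriptive one (descriptive_colnames), it can act as a lookup table for renaming the columns of ukb_main. The sketch below uses base R's match() on a small, hand-made stand-in for the output of make_data_dict(); both the two-row dictionary and the toy data frame are hypothetical, and this is not necessarily how the package itself renames columns.

```r
# Hypothetical stand-in for make_data_dict() output (only the two relevant columns)
data_dict <- data.frame(
  descriptive_colnames = c("eid", "sex_f31_0_0"),
  colheaders_raw = c("eid", "31-0.0"),
  stringsAsFactors = FALSE
)

# Toy main dataset with raw UKB column headers (check.names = FALSE keeps "31-0.0")
ukb_main <- data.frame(
  eid = c(1, 2),
  `31-0.0` = c(0, 1),
  check.names = FALSE
)

# Look up each current column name in the dictionary
idx <- match(names(ukb_main), data_dict$colheaders_raw)

# Every column should be present in the dictionary; otherwise the new name would be NA
stopifnot(!anyNA(idx))

# Replace raw headers with descriptive names
names(ukb_main) <- data_dict$descriptive_colnames[idx]
print(names(ukb_main))
```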
Examples
# dummy UKB data dictionary
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
# create data dictionary specific to UKB main dataset, either using a file path
make_data_dict(
ukb_main = get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE),
ukb_data_dict = dummy_ukb_data_dict
)
#> # A tibble: 71 × 22
#> descriptive_colnames colheaders_raw colheaders_processed FieldID instance
#> <chr> <chr> <chr> <chr> <chr>
#> 1 eid eid feid eid NA
#> 2 sex_f31_0_0 31-0.0 f31_0_0 31 0
#> 3 year_of_birth_f34_0_0 34-0.0 f34_0_0 34 0
#> 4 month_of_birth_f52_0_0 52-0.0 f52_0_0 52 0
#> 5 ethnic_background_f2100… 21000-0.0 f21000_0_0 21000 0
#> 6 ethnic_background_f2100… 21000-1.0 f21000_1_0 21000 1
#> 7 ethnic_background_f2100… 21000-2.0 f21000_2_0 21000 2
#> 8 body_mass_index_bmi_f21… 21001-0.0 f21001_0_0 21001 0
#> 9 body_mass_index_bmi_f21… 21001-1.0 f21001_1_0 21001 1
#> 10 body_mass_index_bmi_f21… 21001-2.0 f21001_2_0 21001 2
#> # ℹ 61 more rows
#> # ℹ 17 more variables: array <chr>, Path <chr>, Category <chr>, Field <chr>,
#> # Participants <chr>, Items <chr>, Stability <chr>, ValueType <chr>,
#> # Units <chr>, ItemType <chr>, Strata <chr>, Sexed <chr>, Instances <chr>,
#> # Array <chr>, Coding <chr>, Notes <chr>, Link <chr>
# ...or from a data frame
make_data_dict(
ukb_main = get_ukb_dummy("dummy_ukb_main.tsv"),
ukb_data_dict = dummy_ukb_data_dict
)
#> # A tibble: 71 × 22
#> descriptive_colnames colheaders_raw colheaders_processed FieldID instance
#> <chr> <chr> <chr> <chr> <chr>
#> 1 eid eid feid eid NA
#> 2 sex_f31_0_0 31-0.0 f31_0_0 31 0
#> 3 year_of_birth_f34_0_0 34-0.0 f34_0_0 34 0
#> 4 month_of_birth_f52_0_0 52-0.0 f52_0_0 52 0
#> 5 ethnic_background_f2100… 21000-0.0 f21000_0_0 21000 0
#> 6 ethnic_background_f2100… 21000-1.0 f21000_1_0 21000 1
#> 7 ethnic_background_f2100… 21000-2.0 f21000_2_0 21000 2
#> 8 body_mass_index_bmi_f21… 21001-0.0 f21001_0_0 21001 0
#> 9 body_mass_index_bmi_f21… 21001-1.0 f21001_1_0 21001 1
#> 10 body_mass_index_bmi_f21… 21001-2.0 f21001_2_0 21001 2
#> # ℹ 61 more rows
#> # ℹ 17 more variables: array <chr>, Path <chr>, Category <chr>, Field <chr>,
#> # Participants <chr>, Items <chr>, Stability <chr>, ValueType <chr>,
#> # Units <chr>, ItemType <chr>, Strata <chr>, Sexed <chr>, Instances <chr>,
#> # Array <chr>, Coding <chr>, Notes <chr>, Link <chr>
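In the output above, each raw header of the form FieldID-instance.array (e.g. 21000-1.0) sits alongside a processed name (f21000_1_0) and separate FieldID, instance and array columns, while eid becomes feid with no instance. The sketch below reproduces that mapping for the rows shown using base R string functions. It is inferred from the printed output rather than taken from the package source, may not cover every edge case the package handles, and treats the array of eid as NA (an assumption, since that column is not printed). The helper name parse_ukb_header() is hypothetical.

```r
# Hypothetical helper (not part of the package): split raw UKB headers of the
# form "FieldID-instance.array" into the components shown in the output above
parse_ukb_header <- function(raw) {
  # Processed name: prefix "f", then replace "-" and "." with "_"
  processed <- paste0("f", gsub("[-.]", "_", raw))

  # Headers that follow the FieldID-instance.array pattern
  pattern <- "^([0-9]+)-([0-9]+)\\.([0-9]+)$"
  matched <- grepl(pattern, raw)

  # Non-matching headers (e.g. "eid") keep their name as FieldID and get NA
  # for instance and array (assumed for array)
  field_id <- ifelse(matched, sub(pattern, "\\1", raw), raw)
  instance <- ifelse(matched, sub(pattern, "\\2", raw), NA_character_)
  array <- ifelse(matched, sub(pattern, "\\3", raw), NA_character_)

  data.frame(
    colheaders_raw = raw,
    colheaders_processed = processed,
    FieldID = field_id,
    instance = instance,
    array = array,
    stringsAsFactors = FALSE
  )
}

parsed <- parse_ukb_header(c("eid", "31-0.0", "21000-1.0", "21001-2.0"))
print(parsed)
```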