A convenience function that reads a main UK Biobank dataset and returns it as a data frame with an additional unique ID column, created by concatenating the values of a selection of variables. Any linkage based on this ID should be validated manually.
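The idea of building an ID by concatenating variable values can be sketched in base R on a toy data frame. This is a minimal illustration, not ukbwranglr's implementation: the helper `make_unique_id()`, the column names, and the `"_"` separator are all assumptions.

```r
# Hypothetical toy data: column names and values are illustrative only
ukb_main <- data.frame(
  eid = c(1001, 1002, 1003),
  sex = c("Female", "Male", "Female"),
  year_of_birth = c(1950, 1962, 1950),
  month_of_birth = c(3, 11, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of ukbwranglr): paste the selected columns
# row-wise into one ID string, optionally dropping the input columns
make_unique_id <- function(df, cols, id_col = "..unique_id",
                           remove = TRUE, sep = "_") {
  df[[id_col]] <- do.call(paste, c(unname(as.list(df[cols])), sep = sep))
  if (remove) {
    df <- df[setdiff(names(df), cols)]
  }
  df
}

result <- make_unique_id(ukb_main, c("sex", "year_of_birth", "month_of_birth"))
# result[["..unique_id"]] is "Female_1950_3" "Male_1962_11" "Female_1950_7"
print(result)
```

Note that `paste()` turns missing values into the string `"NA"`, so participants with missing inputs can still receive (possibly colliding) IDs; this is one reason to validate any linkage by hand.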
Usage
create_unique_id_df(
  path,
  delim = "\t",
  ukb_data_dict = ukbwranglr::get_ukb_data_dict(),
  ukb_codings = ukbwranglr::get_ukb_codings(),
  descriptive_colnames = TRUE,
  label = FALSE,
  field_ids = c("31", "52", "34", "21000", "53", "96", "50"),
  instances = "0",
  id_col = "..unique_id",
  remove = TRUE,
  .ignore_duplicate_ids = FALSE
)
Arguments
- path
The path to a UK Biobank main dataset file.
- delim
Delimiter for the UKB main dataset file (see data.table::fread()). Default is "\t" (tab-delimited). Ignored if the file name ends with .dta (i.e. is a STATA file).
- ukb_data_dict
The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- ukb_codings
The UKB codings file (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- descriptive_colnames
If TRUE, rename columns with longer descriptive names derived from the 'Field' column of the UK Biobank data dictionary.
- label
If TRUE, apply variable labels and convert coded values to labelled factors.
- field_ids
A character vector of field IDs used to create the new unique ID column. These should match the values in the 'FieldID' column of the UK Biobank data dictionary.
- instances
A character vector of instances to include when generating the new unique ID column. Should contain one or more of the following digits: '0', '1', '2', '3'. Note that more recent datasets may include instances that are not present in older datasets. By default only the first instance is used.
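How `field_ids` and `instances` together narrow down which columns feed into the ID can be sketched as a column filter. This assumes a hypothetical `f<field>_<instance>_<array>` column-naming pattern and a hypothetical helper `select_id_columns()`; ukbwranglr's actual column names and selection logic may differ.

```r
# Hypothetical column names following an assumed "f<field>_<instance>_<array>" pattern
col_names <- c("eid", "f31_0_0", "f34_0_0", "f50_0_0", "f50_1_0", "f50_2_0", "f21000_0_0")

# Hypothetical helper: keep only columns whose field ID and instance were both requested
select_id_columns <- function(col_names, field_ids, instances = "0") {
  pattern <- "^f([0-9]+)_([0-9]+)_([0-9]+)$"
  matched <- grepl(pattern, col_names)
  field <- sub(pattern, "\\1", col_names)     # field ID part of each name
  instance <- sub(pattern, "\\2", col_names)  # instance part of each name
  col_names[matched & field %in% field_ids & instance %in% instances]
}

# Default: first instance only
select_id_columns(col_names, field_ids = c("31", "34", "50"))
# Requesting instances 0 and 1 of field 50
select_id_columns(col_names, field_ids = "50", instances = c("0", "1"))
```

With the default of instance "0" only, repeat-visit measurements (e.g. a later standing height) are excluded, which keeps the ID tied to values recorded at a single visit.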
- id_col
Name of the new ID column to be created.
- remove
If TRUE, remove the input columns (those used to create the ID) from the output data frame.
- .ignore_duplicate_ids
If TRUE, allow duplicate ID values and raise a warning if any are found. This may be helpful for debugging. Default is FALSE.
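The duplicate-ID check can be sketched as follows. This assumes that with the default (FALSE) duplicates stop execution with an error, whereas TRUE downgrades this to a warning; the helper `check_unique_ids()` and its message text are hypothetical, not ukbwranglr's code.

```r
# Hypothetical helper: assumed behaviour is an error by default,
# and only a warning when duplicates are explicitly allowed
check_unique_ids <- function(ids, ignore_duplicates = FALSE) {
  dup <- unique(ids[duplicated(ids)])
  if (length(dup) > 0) {
    msg <- sprintf("%d duplicated ID value(s) found, e.g. '%s'",
                   length(dup), dup[1])
    if (ignore_duplicates) {
      warning(msg, call. = FALSE)
    } else {
      stop(msg, call. = FALSE)
    }
  }
  invisible(ids)
}

ids <- c("Female_1950_3", "Male_1962_11", "Female_1950_3")
# Errors by default; with ignore_duplicates = TRUE it warns and returns the IDs
try(check_unique_ids(ids))
```

Duplicates typically mean the chosen fields do not distinguish some participants, so adding more fields or instances is usually a better fix than ignoring them.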