Create a unique participant ID by combining UKB data field IDs
Source: R/link_ukb.R
create_unique_id.Rd
Creates a unique participant ID by concatenating values from a selection of UKB data fields. An error is raised if the final ID column contains non-unique values. Manual validation of any subsequent linkage is strongly advised.
Usage
create_unique_id(
  ukb_main,
  ukb_data_dict = ukbwranglr::get_ukb_data_dict(),
  field_ids = c("31", "52", "34", "21000", "53", "96", "50"),
  instances = "0",
  id_col = "..unique_id",
  remove = TRUE,
  .ignore_duplicate_ids = FALSE
)
Arguments
- ukb_main
A data frame - a UKB main dataset.
- ukb_data_dict
The UK Biobank data dictionary (available online).
- field_ids
A character vector of field IDs that will be used to create the new unique ID column. These should match the values under column 'Field' in the UK Biobank data dictionary.
- instances
A character vector of instances to include when generating the new unique ID column. Should contain one or more of the following digits: '0', '1', '2', '3'. Note that more recent datasets may include instances that are not present in older datasets. By default, only the first instance ('0') is used.
- id_col
Name of the new column to be created.
- remove
If TRUE, remove input columns from output data frame.
- .ignore_duplicate_ids
If TRUE, allow duplicate ID values and raise a warning if any are found. May be helpful for debugging. By default this is FALSE.
Details
By default, the following field IDs are used: 31 (Sex), 52 (Month of birth), 34 (Year of birth), 21000 (Ethnic background), 53 (Date of attending assessment centre), 96 (Time since interview start at which blood pressure screen(s) shown), and 50 (Standing height). Any columns of type factor will be converted to type integer.
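The behaviour described above (concatenate selected columns, convert factors to integers, then check the result for duplicates) can be illustrated with a small base R sketch. This is not the ukbwranglr implementation: the helper `sketch_unique_id()`, the `"_"` separator, and the column names in the toy data are all hypothetical and chosen only for illustration.

```r
# Illustrative sketch only: NOT the ukbwranglr implementation.
# `sketch_unique_id()`, the "_" separator and the column names used
# below are hypothetical.
sketch_unique_id <- function(df, cols, id_col = "..unique_id",
                             remove = TRUE, ignore_duplicate_ids = FALSE) {
  # Convert factor columns to their integer codes before concatenating
  parts <- lapply(df[cols], function(x) if (is.factor(x)) as.integer(x) else x)

  # Concatenate the selected values row-wise into a single ID string
  df[[id_col]] <- do.call(paste, c(unname(parts), sep = "_"))

  # Raise an error (or a warning, if requested) when IDs are not unique
  n_dup <- sum(duplicated(df[[id_col]]))
  if (n_dup > 0) {
    msg <- paste0(n_dup, " duplicated ID value(s) found in '", id_col, "'")
    if (ignore_duplicate_ids) warning(msg) else stop(msg)
  }

  # Optionally drop the input columns from the output
  if (remove) df <- df[setdiff(names(df), cols)]
  df
}

# Made-up data; these are placeholder names, not real UKB column names
toy <- data.frame(
  sex = factor(c("Female", "Male", "Female")),
  year_of_birth = c(1950L, 1950L, 1961L),
  month_of_birth = c(3L, 3L, 11L),
  height = c(162.5, 180, 162.5)
)

result <- sketch_unique_id(toy, cols = c("sex", "year_of_birth", "month_of_birth"))
result[["..unique_id"]]
# "1_1950_3" "2_1950_3" "1_1961_11"
```

In this toy example, using fewer columns (for instance only `sex`) would produce duplicate IDs and trigger the error, which is why combining several fields is needed to identify participants uniquely.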