
Creates a unique participant ID by concatenating the values of a selection of UK Biobank (UKB) data fields. An error is raised if the resulting ID column contains non-unique values. Manual validation of any subsequent linkage is strongly advised.
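The core idea can be sketched in base R: paste the selected columns together row by row into a single string, then stop if any value repeats. This is a minimal illustration on a toy data frame, not the package's implementation. The helper `sketch_unique_id()`, the toy column names and the `"_"` separator are all hypothetical.

```r
# Hypothetical helper illustrating the concatenate-then-check idea.
sketch_unique_id <- function(df, cols, id_col = "..unique_id", sep = "_") {
  # Paste the chosen columns together row by row; unname() keeps column
  # names from clashing with paste()'s own `sep`/`collapse` arguments
  df[[id_col]] <- do.call(paste, c(unname(as.list(df[cols])), sep = sep))
  # Refuse to return an ID column that does not uniquely identify rows
  if (anyDuplicated(df[[id_col]]) > 0) {
    stop("`", id_col, "` contains non-unique values", call. = FALSE)
  }
  df
}

toy <- data.frame(
  sex = c(0L, 1L, 1L),
  year_of_birth = c(1950L, 1962L, 1962L),
  height = c(165, 178, 181)
)

out <- sketch_unique_id(toy, c("sex", "year_of_birth", "height"))
out$..unique_id  # "0_1950_165" "1_1962_178" "1_1962_181"

# Dropping height leaves two identical rows, so this call signals an error
try(sketch_unique_id(toy, c("sex", "year_of_birth")))
```

The second call shows why several fields are combined by default. Any single characteristic, or a small set of them, is likely to be shared by many participants.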

Usage

create_unique_id(
  ukb_main,
  ukb_data_dict = ukbwranglr::get_ukb_data_dict(),
  field_ids = c("31", "52", "34", "21000", "53", "96", "50"),
  instances = "0",
  id_col = "..unique_id",
  remove = TRUE,
  .ignore_duplicate_ids = FALSE
)

Arguments

ukb_main

A data frame containing a UKB main dataset.

ukb_data_dict

The UK Biobank data dictionary (available online). Defaults to the output of ukbwranglr::get_ukb_data_dict().

field_ids

A character vector of field IDs used to create the new unique ID column. These should match the values under column 'Field' in the UK Biobank data dictionary.

instances

A character vector of instances to include when generating the new unique ID column. Should contain one or more of the digits '0', '1', '2' and '3'. Note that more recent datasets may include instances that are not present in older datasets. By default, only the first instance ('0') is used.

id_col

Name of the new column to be created.

remove

If TRUE, remove the input columns from the output data frame.

.ignore_duplicate_ids

If TRUE, allow duplicate ID values and raise a warning (instead of an error) if any are found. This may be helpful for debugging. Defaults to FALSE.
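The switch controlled by `.ignore_duplicate_ids` can be pictured as one duplicate check that either stops with an error or only warns. This is a minimal sketch, not the package's code. The helper `check_ids()` and its message text are hypothetical.

```r
# Hypothetical helper: error on duplicate IDs unless told to only warn.
check_ids <- function(ids, ignore_duplicate_ids = FALSE) {
  n_dup <- sum(duplicated(ids))  # repeats beyond each value's first occurrence
  if (n_dup > 0) {
    msg <- paste0(n_dup, " duplicate ID value(s) found")
    if (ignore_duplicate_ids) {
      warning(msg, call. = FALSE)  # debugging mode: report but carry on
    } else {
      stop(msg, call. = FALSE)     # default: refuse non-unique IDs
    }
  }
  invisible(ids)
}

ids <- c("0_1950_165", "1_1962_178", "1_1962_178")
try(check_ids(ids))                          # signals an error
check_ids(ids, ignore_duplicate_ids = TRUE)  # only warns, returns ids invisibly
```

Warning rather than failing lets you inspect which rows collide before choosing additional `field_ids` or `instances`.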

Value

A data frame with an additional column named as specified by id_col.
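The shape of the returned data frame under the two `remove` settings can be illustrated with a base R sketch on a toy data frame. The column names and the `"_"` separator are assumptions for illustration only, not what the package produces.

```r
toy <- data.frame(eid = 1:2, sex = c(0L, 1L), height = c(165, 178))
id_inputs <- c("sex", "height")  # hypothetical columns used to build the ID

# Add the new ID column under the default id_col name
toy[["..unique_id"]] <- paste(toy$sex, toy$height, sep = "_")

kept <- toy                                     # analogous to remove = FALSE
dropped <- toy[setdiff(names(toy), id_inputs)]  # analogous to remove = TRUE

names(kept)     # "eid" "sex" "height" "..unique_id"
names(dropped)  # "eid" "..unique_id"
```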

Details

By default, the following field IDs are used: 31 (Sex), 52 (Month of birth), 34 (Year of birth), 21000 (Ethnic background), 53 (Date of attending assessment centre), 96 (Time since interview start at which blood pressure screen(s) shown), and 50 (Standing height). Any columns of type factor will be converted to type integer.
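The factor-to-integer step can be sketched in base R as follows. This sketch assumes plain `as.integer()`, which returns each value's level position (1, 2, ...) rather than its original UKB code. The text above does not say whether the package maps factors back to the UKB codes, so treat the exact integers as illustrative.

```r
df <- data.frame(
  sex = factor(c("Female", "Male", "Female"), levels = c("Female", "Male")),
  height = c(165, 178, 170)
)

# Replace every factor column with its integer level codes
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], as.integer)

class(df$sex)                        # "integer"
paste(df$sex, df$height, sep = "_")  # "1_165" "2_178" "1_170"
```

Without the conversion, `paste()` would use the factor labels (e.g. "Female_165"), so the ID would depend on how the labels are spelled.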