Simplifies ethnic background in a UK Biobank main dataset to the main categories for Field ID 21000.

Usage

derive_ethnic_background_simplified(
  ukb_main,
  ukb_data_dict = get_ukb_data_dict(),
  ethnicity_levels = c("White", "Mixed", "Asian or Asian British",
    "Black or Black British", "Chinese", "Other ethnic group"),
  .drop = FALSE,
  .details_only = FALSE
)

Arguments

ukb_main

A UK Biobank main dataset.

ukb_data_dict

The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.

ethnicity_levels

The factor level order for the appended ethnic_background_simplified column. By default, the baseline level is set to "White" ethnicity.

.drop

If TRUE, remove the required input columns from the result.

.details_only

If TRUE, return a list containing details of required input variables (Field IDs) and derived variables (new column name, label and values/value labels).

Value

A data frame with a column called ethnic_background_simplified (type factor).

Details

A new column called ethnic_background_simplified of type factor is appended to the input data frame. The categories "Do not know" and "Prefer not to answer" are converted to NA. By default, "White" is the baseline (first) factor level, as this is the largest category. A different level order can be specified explicitly using the ethnicity_levels argument.
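The collapsing described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the lookup vector main_category and the helper simplify_ethnicity() are hypothetical names introduced here for illustration only, and the mapping assumes the standard Field ID 21000 sub-category labels.

```r
# Hypothetical lookup from detailed Field ID 21000 labels to main categories.
# Illustrative only; not taken from the package source.
main_category <- c(
  "White" = "White",
  "British" = "White",
  "Irish" = "White",
  "Any other white background" = "White",
  "Mixed" = "Mixed",
  "White and Black Caribbean" = "Mixed",
  "White and Black African" = "Mixed",
  "White and Asian" = "Mixed",
  "Any other mixed background" = "Mixed",
  "Asian or Asian British" = "Asian or Asian British",
  "Indian" = "Asian or Asian British",
  "Pakistani" = "Asian or Asian British",
  "Bangladeshi" = "Asian or Asian British",
  "Any other Asian background" = "Asian or Asian British",
  "Black or Black British" = "Black or Black British",
  "Caribbean" = "Black or Black British",
  "African" = "Black or Black British",
  "Any other Black background" = "Black or Black British",
  "Chinese" = "Chinese",
  "Other ethnic group" = "Other ethnic group"
)

# Hypothetical helper: collapse detailed labels and set the factor level order.
simplify_ethnicity <- function(x,
                               levels = c("White", "Mixed",
                                          "Asian or Asian British",
                                          "Black or Black British",
                                          "Chinese", "Other ethnic group")) {
  # Labels absent from the lookup ("Do not know", "Prefer not to answer")
  # and existing missing values all become NA.
  simplified <- unname(main_category[as.character(x)])
  # The first element of `levels` becomes the baseline level.
  factor(simplified, levels = levels)
}

x <- c("British", "Do not know", "Indian", "Caribbean",
       "Prefer not to answer", NA, "Chinese")

result <- simplify_ethnicity(x)

# Same values, but with "Chinese" as the baseline level instead of "White"
result_custom <- simplify_ethnicity(
  x,
  levels = c("Chinese", "White", "Mixed", "Asian or Asian British",
             "Black or Black British", "Other ethnic group")
)
```

Changing the level order only affects which category is treated as the reference (for example in regression models); the values in the column are the same either way.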

Examples

library(magrittr)
# dummy UKB data and data dictionary
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")

dummy_ukb_main <- read_ukb(
  path = get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE),
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)
#> Creating data dictionary
#> STEP 1 of 3
#> Reading data into R
#> STEP 2 of 3
#> Renaming with descriptive column names
#> STEP 3 of 3
#> Applying variable and value labels
#> Labelling dataset
#> Time taken: 0 minutes, 0 seconds.

# derive ethnic background
derive_ethnic_background_simplified(
  ukb_main = dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict
) %>%
  dplyr::select(tidyselect::contains("ethnic"))
#>     ethnic_background_f21000_0_0 ethnic_background_f21000_1_0
#>                           <fctr>                       <fctr>
#>  1:                  Do not know                        Mixed
#>  2:         Prefer not to answer    White and Black Caribbean
#>  3:                        White      White and Black African
#>  4:                      British              White and Asian
#>  5:                        Irish   Any other mixed background
#>  6:   Any other white background       Asian or Asian British
#>  7:                         <NA>                       Indian
#>  8:                         <NA>                      Chinese
#>  9:                    Caribbean                         <NA>
#> 10:                    Caribbean                         <NA>
#>     ethnic_background_f21000_2_0 ethnic_background_simplified
#>                           <fctr>                       <fctr>
#>  1:                  Bangladeshi                        Mixed
#>  2:   Any other Asian background                        Mixed
#>  3:                  Do not know                        White
#>  4:                    Caribbean                        White
#>  5:                      African                        White
#>  6:   Any other Black background                        White
#>  7:                      Chinese       Asian or Asian British
#>  8:                         <NA>                      Chinese
#>  9:                         <NA>       Black or Black British
#> 10:                         <NA>       Black or Black British