Derive simplified ethnic background
Source: R/derived_variables.R
Simplifies ethnic background in a UK Biobank main dataset to the main categories for Field ID 21000.
Usage
derive_ethnic_background_simplified(
  ukb_main,
  ukb_data_dict = get_ukb_data_dict(),
  ethnicity_levels = c("White", "Mixed", "Asian or Asian British",
    "Black or Black British", "Chinese", "Other ethnic group"),
  .drop = FALSE,
  .details_only = FALSE
)
Arguments
- ukb_main
A UK Biobank main dataset.
- ukb_data_dict
The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- ethnicity_levels
The factor level order for the appended ethnic_background_simplified column. By default, the baseline level is set to "White" ethnicity.
- .drop
If TRUE, remove the required input columns from the result.
- .details_only
If TRUE, return a list containing details of required input variables (Field IDs) and derived variables (new column name, label and values/value labels).
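To illustrate what the ethnicity_levels ordering controls, here is a minimal base R sketch that does not use this package. The vector x is made-up data. In R, the first level of a factor acts as the baseline under the default treatment contrasts, so reordering the levels changes the reference category without changing the values.

```r
# Made-up simplified ethnicity values (illustration only)
x <- c("Mixed", "White", "Chinese", "White")

# Default level order from the Usage section: "White" comes first
default_order <- c("White", "Mixed", "Asian or Asian British",
                   "Black or Black British", "Chinese", "Other ethnic group")
f_default <- factor(x, levels = default_order)
levels(f_default)[1]

# A custom order that moves "Chinese" to the front, making it the baseline
custom_order <- c("Chinese", setdiff(default_order, "Chinese"))
f_custom <- factor(x, levels = custom_order)
levels(f_custom)[1]
```

Passing a vector like custom_order as ethnicity_levels would be the analogous way to choose a different baseline in the derived column.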
Details
Categories "Do not know" and "Prefer not to answer" are converted to NA. A new
column called ethnic_background_simplified of type factor is appended to the
input data frame. By default, "White" is set as the baseline level because it
is the largest category. Levels can be explicitly specified using the
ethnicity_levels argument.
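The recoding described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the lookup table top_level_lookup and the helper simplify_ethnicity_sketch() are hypothetical names, and the detailed labels are typed by hand on the assumption that they match the Field ID 21000 codings, so they should be checked against the codings file before any real use.

```r
# Assumed lookup from detailed Field ID 21000 labels to top-level categories
top_level_lookup <- c(
  "White" = "White",
  "British" = "White",
  "Irish" = "White",
  "Any other white background" = "White",
  "Mixed" = "Mixed",
  "White and Black Caribbean" = "Mixed",
  "White and Black African" = "Mixed",
  "White and Asian" = "Mixed",
  "Any other mixed background" = "Mixed",
  "Asian or Asian British" = "Asian or Asian British",
  "Indian" = "Asian or Asian British",
  "Pakistani" = "Asian or Asian British",
  "Bangladeshi" = "Asian or Asian British",
  "Any other Asian background" = "Asian or Asian British",
  "Black or Black British" = "Black or Black British",
  "Caribbean" = "Black or Black British",
  "African" = "Black or Black British",
  "Any other Black background" = "Black or Black British",
  "Chinese" = "Chinese",
  "Other ethnic group" = "Other ethnic group"
)

# Hypothetical helper: collapse detailed labels to the main categories
simplify_ethnicity_sketch <- function(
    x,
    ethnicity_levels = c("White", "Mixed", "Asian or Asian British",
                         "Black or Black British", "Chinese",
                         "Other ethnic group")) {
  x <- as.character(x)
  # "Do not know" and "Prefer not to answer" become NA
  x[x %in% c("Do not know", "Prefer not to answer")] <- NA_character_
  # Unmatched or missing labels also map to NA
  factor(unname(top_level_lookup[x]), levels = ethnicity_levels)
}

result <- simplify_ethnicity_sketch(
  c("British", "Do not know", "Indian", NA, "Caribbean", "Prefer not to answer")
)
result
```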
Examples
library(magrittr)
# dummy UKB data and data dictionary
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")
dummy_ukb_main <- read_ukb(
  path = get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE),
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
)
#> Creating data dictionary
#> STEP 1 of 3
#> Reading data into R
#> STEP 2 of 3
#> Renaming with descriptive column names
#> STEP 3 of 3
#> Applying variable and value labels
#> Labelling dataset
#> Time taken: 0 minutes, 0 seconds.
# derive ethnic background
derive_ethnic_background_simplified(
  ukb_main = dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict
) %>%
  dplyr::select(tidyselect::contains("ethnic"))
#>     ethnic_background_f21000_0_0 ethnic_background_f21000_1_0
#>                           <fctr>                       <fctr>
#>  1:                  Do not know                        Mixed
#>  2:         Prefer not to answer    White and Black Caribbean
#>  3:                        White      White and Black African
#>  4:                      British              White and Asian
#>  5:                        Irish   Any other mixed background
#>  6:   Any other white background       Asian or Asian British
#>  7:                         <NA>                       Indian
#>  8:                         <NA>                      Chinese
#>  9:                    Caribbean                         <NA>
#> 10:                    Caribbean                         <NA>
#>     ethnic_background_f21000_2_0 ethnic_background_simplified
#>                           <fctr>                       <fctr>
#>  1:                  Bangladeshi                        Mixed
#>  2:   Any other Asian background                        Mixed
#>  3:                  Do not know                        White
#>  4:                    Caribbean                        White
#>  5:                      African                        White
#>  6:   Any other Black background                        White
#>  7:                      Chinese       Asian or Asian British
#>  8:                         <NA>                      Chinese
#>  9:                         <NA>       Black or Black British
#> 10:                         <NA>       Black or Black British
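The output above appears consistent with using the first informative response across instances (0_0, then 1_0, then 2_0), after treating "Do not know" and "Prefer not to answer" as missing. This is an inference from the example rows, not documented behaviour. A minimal base R sketch of that idea, using made-up data and a hypothetical helper first_valid_response():

```r
# Made-up responses for three assessment instances (one row per participant)
responses <- data.frame(
  instance_0 = c("Do not know", "White", NA),
  instance_1 = c("Mixed", "White and Black African", "Indian"),
  instance_2 = c("Bangladeshi", "Do not know", "Chinese"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take the first non-missing value, scanning columns
# left to right and treating the given labels as missing
first_valid_response <- function(
    df,
    missing_labels = c("Do not know", "Prefer not to answer")) {
  result <- rep(NA_character_, nrow(df))
  for (col in names(df)) {
    values <- as.character(df[[col]])
    values[values %in% missing_labels] <- NA_character_
    # Only fill rows that do not yet have a value
    fill <- is.na(result) & !is.na(values)
    result[fill] <- values[fill]
  }
  result
}

first_valid <- first_valid_response(responses)
first_valid
#> [1] "Mixed"  "White"  "Indian"
```

These three rows mirror rows 1, 3 and 7 of the example output before the detailed labels are collapsed to the main categories.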