Summarises numerical variables with repeated measurements either by field (i.e. across all available measurements) or by instance (i.e. across all measurements at each assessment visit). The currently available summary options are the mean, minimum, maximum, sum and number of non-missing values.
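To make the difference between summarising by field and by instance concrete, here is a small base R sketch that computes a row-wise mean both ways. It is a conceptual illustration only, not how summarise_numerical_variables is implemented. It assumes the column names end in the "_f<FieldID>_<instance>_<array>" suffix seen in the example output, and the toy data and the summarise_mean() helper are hypothetical.

```r
# Toy data with made-up values: FieldID 4080 recorded at two instances
# (assessment visits), each with two array indices. Column names follow the
# "_f<FieldID>_<instance>_<array>" suffix seen in the example output.
ukb <- data.frame(
  eid = 1:2,
  sbp_f4080_0_0 = c(120, NA),
  sbp_f4080_0_1 = c(124, 140),
  sbp_f4080_1_0 = c(130, NA),
  sbp_f4080_1_1 = c(NA_real_, NA_real_)
)
value_cols <- setdiff(names(ukb), "eid")

# Grouping keys parsed from the column suffix (an assumption for this sketch):
# "Field" puts every column of a FieldID in one group, while "Instance" makes
# one group per FieldID and instance.
field_keys <- sub("^.*_f([0-9]+)_[0-9]+_[0-9]+$", "x\\1", value_cols)
instance_keys <- sub("^.*_f([0-9]+)_([0-9]+)_[0-9]+$", "x\\1_\\2", value_cols)

# Hypothetical helper: row-wise mean of each column group, ignoring NAs
summarise_mean <- function(df, cols, keys) {
  groups <- split(cols, keys)
  means <- lapply(groups, function(g) rowMeans(df[, g, drop = FALSE], na.rm = TRUE))
  names(means) <- paste0("mean_sbp_", names(groups))
  cbind(df["eid"], as.data.frame(means))
}

by_field <- summarise_mean(ukb, value_cols, field_keys)       # eid, mean_sbp_x4080
by_instance <- summarise_mean(ukb, value_cols, instance_keys) # eid, mean_sbp_x4080_0, mean_sbp_x4080_1
```

Summarising by field yields one summary column per FieldID, while summarising by instance yields one per FieldID and assessment visit, which mirrors the single versus two summary columns reported in the examples.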
Usage
summarise_numerical_variables(
  ukb_main,
  data_dict = NULL,
  ukb_data_dict = get_ukb_data_dict(),
  summary_function = "mean",
  summarise_by = "Field",
  .drop = FALSE
)
Arguments
- ukb_main
A UK Biobank main dataset data frame. Column names must match those under the descriptive_colnames column in data_dict.
- data_dict
A data dictionary specific to the UKB main dataset file, created by make_data_dict.
- ukb_data_dict
The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.
- summary_function
The summary function to be applied. Options: "mean", "min", "max", "sum" or "n_values".
- summarise_by
Whether to summarise by "Field" or by "Instance".
- .drop
If TRUE, removes the original numerical variables from the result. Default value is FALSE.
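Each summary_function option can be thought of as a row-wise reduction over a participant's repeated measurements. The sketch below is a hypothetical base R mapping, not the package's internal code; the row_summaries list and the toy readings are assumptions made for illustration.

```r
# Hypothetical mapping from each summary_function option to a row-wise
# base R computation; rows are participants, columns are repeated measurements.
row_summaries <- list(
  mean = function(m) rowMeans(m, na.rm = TRUE),
  min = function(m) do.call(pmin, c(as.data.frame(m), na.rm = TRUE)),
  max = function(m) do.call(pmax, c(as.data.frame(m), na.rm = TRUE)),
  sum = function(m) rowSums(m, na.rm = TRUE),
  n_values = function(m) rowSums(!is.na(m))
)

# Toy readings (made-up values) for two participants, three measurements each
readings <- matrix(
  c(134, NA, 130,
    140, 150, 160),
  nrow = 2, byrow = TRUE
)

# One row per participant, one column per summary option
summaries <- sapply(row_summaries, function(f) f(readings))
```

pmin() and pmax() are used for min and max because, unlike apply(m, 1, min, na.rm = TRUE), they return NA rather than Inf (with a warning) for a row with no non-missing values.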
Value
A data frame with new columns summarising numerical variables. The names of these new columns are prefixed with the value of summary_function, followed by the descriptive variable name, and end with 'x' followed by the FieldID and, when summarising by instance, the instance number. For example, summarising FieldID 4080 instance 0 with summary_function = "mean" gives a column named 'mean_systolic_blood_pressure_automated_reading_x4080_0'.
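The naming rule can be written as a small function. summary_colname() below is a hypothetical helper for illustration only and is not exported by the package; its outputs match the column names shown in the examples.

```r
# Hypothetical helper (not part of the package API) that assembles a summary
# column name: <summary_function>_<descriptive name>_x<FieldID>[_<instance>]
summary_colname <- function(summary_function, descriptive_name, field_id,
                            instance = NULL) {
  suffix <- paste0("x", field_id)
  if (!is.null(instance)) {
    suffix <- paste0(suffix, "_", instance)
  }
  paste(summary_function, descriptive_name, suffix, sep = "_")
}

# Summarising by Field
summary_colname("mean", "systolic_blood_pressure_automated_reading", 4080)
#> [1] "mean_systolic_blood_pressure_automated_reading_x4080"

# Summarising by Instance (instance 0)
summary_colname("mean", "systolic_blood_pressure_automated_reading", 4080, 0)
#> [1] "mean_systolic_blood_pressure_automated_reading_x4080_0"
```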
Details
Note that when summary_function = "sum", missing values are converted to zero. Therefore, if a set of values is entirely missing, the sum will be summarised as 0. See the documentation for rowSums for further details.
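The base R behaviour behind this note can be seen directly with rowSums(). The sketch assumes the sum is computed in the same way as rowSums(..., na.rm = TRUE), which the reference to rowSums suggests but does not confirm; the toy readings are made up. rowMeans() is shown for comparison, and its NaN result for an all-missing row is consistent with the NaN values in the mean-by-Field example output.

```r
# Toy readings (made-up values); the second participant has no
# non-missing measurements.
readings <- matrix(
  c(134, NA, 130,
    NA,  NA, NA),
  nrow = 2, byrow = TRUE
)

# rowSums() with na.rm = TRUE: an all-missing row sums to 0, not NA
row_sums <- rowSums(readings, na.rm = TRUE)
row_sums
#> [1] 264   0

# For comparison, rowMeans() with na.rm = TRUE gives NaN for that row
row_means <- rowMeans(readings, na.rm = TRUE)
row_means
#> [1] 132 NaN
```

If an all-missing set should stay missing rather than become 0, one option is to check the number of non-missing values (summary_function = "n_values") alongside the sum.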
Examples
library(magrittr)
# get dummy UKB data and data dictionary
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")
dummy_ukb_main <- read_ukb(
  path = get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE),
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
) %>%
  dplyr::select(eid, tidyselect::contains("systolic_blood_pressure")) %>%
  tibble::as_tibble()
#> Creating data dictionary
#> STEP 1 of 3
#> Reading data into R
#> STEP 2 of 3
#> Renaming with descriptive column names
#> STEP 3 of 3
#> Applying variable and value labels
#> Labelling dataset
#> Time taken: 0 minutes, 0 seconds.
# summarise mean values by Field, keep original variables
summarise_numerical_variables(
  dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict
)
#> Number of summary columns to make: 1
#> Time taken: 0 minutes, 0 seconds.
#> # A tibble: 10 × 10
#> eid systolic_blood_pressure…¹ systolic_blood_press…² systolic_blood_press…³
#> <int> <int> <int> <int>
#> 1 1 NA 134 134
#> 2 2 146 145 145
#> 3 3 143 123 123
#> 4 4 NA NA NA
#> 5 5 NA NA NA
#> 6 6 NA NA NA
#> 7 7 NA NA NA
#> 8 8 NA NA NA
#> 9 9 NA NA NA
#> 10 10 NA NA NA
#> # ℹ abbreviated names: ¹systolic_blood_pressure_automated_reading_f4080_0_0,
#> # ²systolic_blood_pressure_automated_reading_f4080_0_1,
#> # ³systolic_blood_pressure_automated_reading_f4080_0_2
#> # ℹ 6 more variables:
#> # systolic_blood_pressure_automated_reading_f4080_0_3 <int>,
#> # systolic_blood_pressure_automated_reading_f4080_1_0 <int>,
#> # systolic_blood_pressure_automated_reading_f4080_1_1 <int>, …
# summarise mean values by Field, drop original variables
summarise_numerical_variables(
  dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  .drop = TRUE
)
#> Number of summary columns to make: 1
#> Time taken: 0 minutes, 0 seconds.
#> # A tibble: 10 × 2
#> eid mean_systolic_blood_pressure_automated_reading_x4080
#> <int> <dbl>
#> 1 1 138.
#> 2 2 143.
#> 3 3 130.
#> 4 4 NaN
#> 5 5 NaN
#> 6 6 NaN
#> 7 7 NaN
#> 8 8 NaN
#> 9 9 NaN
#> 10 10 NaN
# summarise min values by instance, dropping original variables
summarise_numerical_variables(
  dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  summary_function = "min",
  summarise_by = "Instance",
  .drop = TRUE
)
#> Number of summary columns to make: 2
#> Time taken: 0 minutes, 0 seconds.
#> # A tibble: 10 × 3
#> eid min_systolic_blood_pressure_automated_reading_…¹ min_systolic_blood_p…²
#> <int> <int> <int>
#> 1 1 134 134
#> 2 2 145 129
#> 3 3 123 123
#> 4 4 NA NA
#> 5 5 NA NA
#> 6 6 NA NA
#> 7 7 NA NA
#> 8 8 NA NA
#> 9 9 NA NA
#> 10 10 NA NA
#> # ℹ abbreviated names: ¹min_systolic_blood_pressure_automated_reading_x4080_0,
#> # ²min_systolic_blood_pressure_automated_reading_x4080_1