
Either read a dummy UK Biobank dataset into R or return the file path only.

Usage

get_ukb_dummy(file_name, path_only = FALSE)

Arguments

file_name

Name of dummy dataset file.

path_only

If TRUE, return the file path to the dummy dataset file. If FALSE (default), read the dummy dataset into R.

Value

A data frame if path_only is FALSE (default), or a string containing the file path if path_only is TRUE.
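
Because the return type depends on path_only, code that calls this function may want to check which kind of value it received. The following is a minimal sketch only: describe_ukb_dummy_result() is a hypothetical helper written for illustration and is not part of ukbwranglr, and the calls to get_ukb_dummy() run only if the package is installed.

```r
# Hypothetical helper (not part of ukbwranglr): report which of the two
# documented return types a value is.
describe_ukb_dummy_result <- function(x) {
  if (is.data.frame(x)) {
    "data frame"
  } else if (is.character(x) && length(x) == 1L) {
    "file path"
  } else {
    stop("Unexpected return type.", call. = FALSE)
  }
}

# Only call the package if it is available in this R session.
if (requireNamespace("ukbwranglr", quietly = TRUE)) {
  # path_only = FALSE (default): the dataset is read into R
  dummy_data <- ukbwranglr::get_ukb_dummy("dummy_ukb_main.tsv")
  print(describe_ukb_dummy_result(dummy_data))

  # path_only = TRUE: only the file path is returned
  dummy_path <- ukbwranglr::get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE)
  print(describe_ukb_dummy_result(dummy_path))
}
```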

Details

The following dummy datasets are included with this package:

  • dummy_Data_Dictionary_Showcase.tsv: A subset of fields from the UK Biobank data dictionary (full version available from the UK Biobank data showcase website).

  • dummy_Codings.tsv: A subset of UK Biobank data codings (full version available from the UK Biobank data showcase website).

  • dummy_ukb_main.tsv: A dummy main UK Biobank dataset. May be read into R with read_ukb(). Its clinical events fields may then be tidied with tidy_clinical_events().

  • dummy_gp_clinical.txt: A dummy UK Biobank primary care clinical event records dataset.

  • dummy_gp_scripts.txt: A dummy UK Biobank primary care prescription records dataset.
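
Since file_name must match one of the files listed above, it can help to validate the name before calling get_ukb_dummy(). The sketch below uses base R only; check_ukb_dummy_name() and ukb_dummy_files are hypothetical names introduced for illustration and are not part of ukbwranglr. How get_ukb_dummy() itself responds to an unknown file name is not described here, so this check is an assumption about what a caller might want.

```r
# Dummy dataset file names, copied from the list above.
ukb_dummy_files <- c(
  "dummy_Data_Dictionary_Showcase.tsv",
  "dummy_Codings.tsv",
  "dummy_ukb_main.tsv",
  "dummy_gp_clinical.txt",
  "dummy_gp_scripts.txt"
)

# Hypothetical helper (not part of ukbwranglr): stop early with a readable
# message if `file_name` is not one of the documented dummy datasets.
check_ukb_dummy_name <- function(file_name) {
  if (!is.character(file_name) || length(file_name) != 1L) {
    stop("`file_name` must be a single string.", call. = FALSE)
  }
  if (!file_name %in% ukb_dummy_files) {
    stop(
      "Unknown dummy dataset: '", file_name, "'. Expected one of: ",
      paste(ukb_dummy_files, collapse = ", "),
      call. = FALSE
    )
  }
  # Return the validated name invisibly so the call can be piped or nested.
  invisible(file_name)
}

check_ukb_dummy_name("dummy_Codings.tsv")
```

The validated name can then be passed on, for example get_ukb_dummy(check_ukb_dummy_name("dummy_Codings.tsv")).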

Examples

library(magrittr)

# available dummy datasets
dummy_datasets <- c(
  "dummy_Data_Dictionary_Showcase.tsv",
  "dummy_Codings.tsv",
  "dummy_ukb_main.tsv",
  "dummy_gp_clinical.txt",
  "dummy_gp_scripts.txt"
)

# read dummy dataset into R
get_ukb_dummy("dummy_ukb_main.tsv")
#>       eid 31-0.0 34-0.0 52-0.0 21000-0.0 21000-1.0 21000-2.0 21001-0.0
#>     <int> <char> <char> <char>    <char>    <char>    <char>    <char>
#>  1:     1      0   1952      8        -1         2      3003   20.1115
#>  2:     2      0   1946      3        -3      2001      3004   30.1536
#>  3:     3      1   1951      4         1      2002        -1   22.8495
#>  4:     4      0   1956      9      1001      2003      4001      <NA>
#>  5:     5   <NA>   <NA>      4      1002      2004      4002   29.2752
#>  6:     6      1   1948      2      1003         3      4003   28.2567
#>  7:     7      0   1949     12      <NA>      3001         5      <NA>
#>  8:     8      1   1956     10      <NA>         5      <NA>      <NA>
#>  9:     9      0   1962      4      4001      <NA>      <NA>   25.4016
#> 10:    10      1   1953      2      4001      <NA>      <NA>      <NA>
#>     21001-1.0 21001-2.0 4080-0.0 4080-0.1 4080-0.2 4080-0.3 4080-1.0 4080-1.1
#>        <char>    <char>   <char>   <char>   <char>   <char>   <char>   <char>
#>  1:    20.864      <NA>     <NA>      134      134      134      159      134
#>  2:   20.2309   27.4936      146      145      145     <NA>      129      145
#>  3:   26.7929   27.6286      143      123      123      123      162      123
#>  4:      <NA>      <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
#>  5:   19.7576   14.6641     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
#>  6:    30.286   27.3534     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
#>  7:      <NA>      <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
#>  8:      <NA>      <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
#>  9:   21.9371   24.4897     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
#> 10:   25.1579   30.0483     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
#>     4080-1.2 4080-1.3 20001-0.0 20001-0.3 20001-2.0 20001-2.3 20002-0.0
#>       <char>   <char>    <char>    <char>    <char>    <char>    <char>
#>  1:      134     <NA>      1048      1005      1045      1017      1665
#>  2:      145      145      1046      1003      1028      1039      1383
#>  3:      123      123      <NA>      <NA>      <NA>      <NA>      1665
#>  4:     <NA>     <NA>      <NA>      <NA>      <NA>      <NA>      1383
#>  5:     <NA>     <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  6:     <NA>     <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  7:     <NA>     <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  8:     <NA>     <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  9:     <NA>     <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#> 10:     <NA>     <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>     20002-0.3 20002-2.0 20002-2.3 20006-0.0 20006-0.3 20006-2.0 20006-2.3
#>        <char>    <char>    <char>    <char>    <char>    <char>    <char>
#>  1:      1223      1514      <NA> 2012.8173 2007.0874 2023.2047 2014.7373
#>  2:      1352      1447      1165 2016.0638 2023.1635 2024.0358 2013.2044
#>  3:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  4:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  5:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  6:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  7:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  8:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  9:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#> 10:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>     20008-0.0 20008-0.3 20008-2.0 20008-2.3 41270-0.0 41270-0.3 41271-0.0
#>        <char>    <char>    <char>    <char>    <char>    <char>    <char>
#>  1: 1998.9782 2003.1527 2011.2636  2018.786      X715       E10    E89115
#>  2: 2011.0121  2020.502 1981.1627 1983.0059       E11     M0087     E8326
#>  3:        -3      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  4:        -1      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  5:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  6:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  7:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  8:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>  9:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#> 10:      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>     41271-0.3  41280-0.0  41280-0.3  41281-0.0  41281-0.3 40001-0.0 40001-1.0
#>        <char>     <char>     <char>     <char>     <char>    <char>    <char>
#>  1:      <NA> 1955-11-12 1910-02-19 1917-10-08 1969-11-23      X095      X095
#>  2:     75513 1939-02-16 1965-08-08 1955-02-11 1956-09-12      A162      A162
#>  3:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#>  4:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#>  5:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#>  6:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#>  7:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#>  8:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#>  9:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#> 10:      <NA>       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>
#>     40002-0.0 40002-1.3  40000-0.0  40000-1.0  20003-0.0  20003-2.0  20003-2.3
#>        <char>    <char>     <char>     <char>     <char>     <char>     <char>
#>  1:      W192      X715 1917-10-08 1910-02-19 1140861958 1141146188 1141184722
#>  2:      V374      <NA> 1955-02-11 1965-08-08 1141146234 1141184722 1140861958
#>  3:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#>  4:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#>  5:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#>  6:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#>  7:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#>  8:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#>  9:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#> 10:      <NA>      <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
#>         53-0.0     53-2.0  40005-0.0  40005-2.0 40006-0.0 40006-2.0 40013-0.0
#>         <char>     <char>     <char>     <char>    <char>    <char>    <char>
#>  1: 1955-02-11 1910-02-19 1956-11-24 1962-09-04     M4815      C850     27134
#>  2: 1965-08-08 1915-03-18 1910-10-04       <NA>      <NA>      W192      9626
#>  3:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#>  4:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#>  5:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#>  6:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#>  7:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#>  8:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#>  9:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#> 10:       <NA>       <NA>       <NA>       <NA>      <NA>      <NA>      <NA>
#>     40013-2.0 41272-0.0 41272-0.3  41282-0.0  41282-0.3 41273-0.0 41273-0.3
#>        <char>    <char>    <char>     <char>     <char>    <char>    <char>
#>  1:      2042       A01      A018 1956-11-24 1969-11-23       001      0081
#>  2:    E90200      A023       A02 1910-10-04 1956-09-12      0011      0071
#>  3:      <NA>       H01      <NA>       <NA>       <NA>      <NA>      <NA>
#>  4:      <NA>      H011      <NA>       <NA>       <NA>      <NA>      <NA>
#>  5:      <NA>      H022      <NA>       <NA>       <NA>      <NA>      <NA>
#>  6:      <NA>      H013      <NA>       <NA>       <NA>      <NA>      <NA>
#>  7:      <NA>      H018      <NA>       <NA>       <NA>      <NA>      <NA>
#>  8:      <NA>      H019      <NA>       <NA>       <NA>      <NA>      <NA>
#>  9:      <NA>      <NA>      <NA>       <NA>       <NA>      <NA>      <NA>
#> 10:      <NA>      <NA>      <NA>       <NA>       <NA>      <NA>      <NA>
#>      41283-0.0  41283-0.3 20004-0.0 20004-0.3 20010-0.0 20010-0.3
#>         <char>     <char>    <char>    <char>    <char>    <char>
#>  1: 1969-11-23 1955-11-12      1102      1108 2012.8173 2008.2342
#>  2: 1956-09-12 1939-02-16      1105      1109 2016.0638      <NA>
#>  3:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>
#>  4:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>
#>  5:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>
#>  6:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>
#>  7:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>
#>  8:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>
#>  9:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>
#> 10:       <NA>       <NA>      <NA>      <NA>      <NA>      <NA>

# get file path to dummy dataset
get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE)
#> [1] "/home/runner/work/ukbwranglr/ukbwranglr/renv/library/R-4.4/x86_64-pc-linux-gnu/ukbwranglr/extdata/dummy_ukb_main.tsv"

# read all available dummy datasets into R
dummy_datasets %>%
  purrr::set_names() %>%
  purrr::map(get_ukb_dummy, path_only = FALSE) %>%
  purrr::map(tibble::as_tibble)
#> $dummy_Data_Dictionary_Showcase.tsv
#> # A tibble: 28 × 17
#>    Path      Category FieldID Field Participants Items Stability ValueType Units
#>    <chr>     <chr>    <chr>   <chr> <chr>        <chr> <chr>     <chr>     <chr>
#>  1 Populati… 100094   31      Sex   502413       5024… Complete  Categori… NA   
#>  2 Populati… 100094   34      Year… 502413       5024… Complete  Integer   years
#>  3 Populati… 100094   52      Mont… 502413       5024… Complete  Categori… NA   
#>  4 Assessme… 100024   53      Date… 502414       5795… Complete  Date      NA   
#>  5 Assessme… 100011   4080    Syst… 475231       1061… Complete  Integer   mmHg 
#>  6 Assessme… 100074   20001   Canc… 45950        54022 Complete  Categori… NA   
#>  7 Assessme… 100074   20002   Non-… 386743       1145… Complete  Categori… NA   
#>  8 Assessme… 100075   20003   Trea… 373347       1389… Complete  Categori… NA   
#>  9 Assessme… 100076   20004   Oper… 399178       1003… Complete  Categori… NA   
#> 10 Assessme… 100074   20006   Inte… 45950        54022 Complete  Continuo… years
#> # ℹ 18 more rows
#> # ℹ 8 more variables: ItemType <chr>, Strata <chr>, Sexed <chr>,
#> #   Instances <chr>, Array <chr>, Coding <chr>, Notes <chr>, Link <chr>
#> 
#> $dummy_Codings.tsv
#> # A tibble: 425 × 3
#>    Coding Value Meaning 
#>    <chr>  <chr> <chr>   
#>  1 8      1     January 
#>  2 8      10    October 
#>  3 8      11    November
#>  4 8      12    December
#>  5 8      2     February
#>  6 8      3     March   
#>  7 8      4     April   
#>  8 8      5     May     
#>  9 8      6     June    
#> 10 8      7     July    
#> # ℹ 415 more rows
#> 
#> $dummy_ukb_main.tsv
#> # A tibble: 10 × 71
#>      eid `31-0.0` `34-0.0` `52-0.0` `21000-0.0` `21000-1.0` `21000-2.0`
#>    <int> <chr>    <chr>    <chr>    <chr>       <chr>       <chr>      
#>  1     1 0        1952     8        -1          2           3003       
#>  2     2 0        1946     3        -3          2001        3004       
#>  3     3 1        1951     4        1           2002        -1         
#>  4     4 0        1956     9        1001        2003        4001       
#>  5     5 NA       NA       4        1002        2004        4002       
#>  6     6 1        1948     2        1003        3           4003       
#>  7     7 0        1949     12       NA          3001        5          
#>  8     8 1        1956     10       NA          5           NA         
#>  9     9 0        1962     4        4001        NA          NA         
#> 10    10 1        1953     2        4001        NA          NA         
#> # ℹ 64 more variables: `21001-0.0` <chr>, `21001-1.0` <chr>, `21001-2.0` <chr>,
#> #   `4080-0.0` <chr>, `4080-0.1` <chr>, `4080-0.2` <chr>, `4080-0.3` <chr>,
#> #   `4080-1.0` <chr>, `4080-1.1` <chr>, `4080-1.2` <chr>, `4080-1.3` <chr>,
#> #   `20001-0.0` <chr>, `20001-0.3` <chr>, `20001-2.0` <chr>, `20001-2.3` <chr>,
#> #   `20002-0.0` <chr>, `20002-0.3` <chr>, `20002-2.0` <chr>, `20002-2.3` <chr>,
#> #   `20006-0.0` <chr>, `20006-0.3` <chr>, `20006-2.0` <chr>, `20006-2.3` <chr>,
#> #   `20008-0.0` <chr>, `20008-0.3` <chr>, `20008-2.0` <chr>, …
#> 
#> $dummy_gp_clinical.txt
#> # A tibble: 12 × 8
#>      eid data_provider event_dt   read_2 read_3 value1 value2 value3
#>    <int> <chr>         <chr>      <chr>  <chr>  <chr>  <chr>  <chr> 
#>  1     1 1             03/03/1903 C      NA     1      2      3     
#>  2     1 4             01/01/1901 A      NA     1      2      3     
#>  3     1 3             07/07/2037 NA     E      1      2      3     
#>  4     3 1             07/07/2037 E      NA     1      2      3     
#>  5     4 2             01/02/1999 J      NA     1      2      3     
#>  6     8 1             01/02/1999 G      NA     1      2      3     
#>  7     1 1             01/10/1990 C108.  NA     NA     NA     NA    
#>  8     2 2             02/10/1990 C109.  NA     NA     NA     NA    
#>  9     1 3             03/10/1990 NA     X40J4  NA     NA     NA    
#> 10     2 3             04/10/1990 NA     X40J5  NA     NA     NA    
#> 11     1 1             03/10/1990 C108.  NA     NA     NA     NA    
#> 12     2 2             04/10/1990 C109.  NA     NA     NA     NA    
#> 
#> $dummy_gp_scripts.txt
#> # A tibble: 6 × 8
#>     eid data_provider issue_date read_2 bnf_code     dmd_code drug_name quantity
#>   <int> <chr>         <chr>      <chr>  <chr>        <chr>    <chr>     <chr>   
#> 1     1 1             03/03/1903 bxi300 NA           1        drug2     50      
#> 2     1 4             01/01/1901 bxi3   NA           NA       NA        NA      
#> 3     1 3             07/07/2037 NA     02.02.01.00… NA       drug2     30      
#> 4     3 1             07/07/2037 bd3j00 NA           1        drug2     30      
#> 5     4 2             01/02/1999 bd3j   02020100     NA       drug2     30      
#> 6     8 1             01/02/1999 NA     NA           1        2         30      
#>