Reads the UK Biobank data dictionary into R with all columns as type character.

Usage

get_ukb_data_dict(path = NULL)

Arguments

path

Either NULL or a file path. If no file exists at this path, the data dictionary is downloaded directly from the UK Biobank data showcase website to this location when the function is first called.

Value

A data frame.
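
Because every column is returned as character, numeric fields such as FieldID or Participants need an explicit conversion before arithmetic or numeric sorting. The sketch below illustrates this with base R only; it does not call get_ukb_data_dict() and is not the package's implementation. The small tab-separated file is an assumption for illustration, with values copied from the example output on this page.

```r
# Illustrative stand-in for the data dictionary: a tiny tab-separated file.
tsv_lines <- c(
  "FieldID\tField\tParticipants\tValueType",
  "31\tSex\t502413\tCategorical single",
  "4080\tSystolic blood pressure, automated reading\t475231\tInteger",
  "20001\tCancer code, self-reported\t45950\tCategorical multiple"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(tsv_lines, tmp)

# Base R analogue of reading every column as type character.
dict <- utils::read.delim(
  tmp,
  colClasses = "character",
  quote = "",
  stringsAsFactors = FALSE
)

# Confirm that all columns are character.
stopifnot(all(vapply(dict, is.character, logical(1))))

# Convert explicitly before doing numeric work.
dict$Participants_int <- as.integer(dict$Participants)

# Character sorting is lexicographic, so convert FieldID for numeric order.
dict_sorted <- dict[order(as.integer(dict$FieldID)), ]
```

Keeping the identifiers as character avoids accidental loss of formatting on read; the conversion is then a deliberate step applied only to the columns that need it.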

Details

By default (path = NULL), the function attempts to read from the file path stored in an environment variable named UKB_DATA_DICT (see vignette('ukbwranglr') for further details). If this variable is not found, the data dictionary is downloaded directly from the UK Biobank website to tempdir() when the function is first called.
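
The lookup order described above can be sketched in base R as follows. The helper resolve_data_dict_path() is hypothetical and not part of ukbwranglr, and both the fallback file name and the example path are assumptions for illustration; the package's internal logic may differ.

```r
# Hypothetical helper (not part of ukbwranglr) mirroring the documented order:
# explicit path, then the UKB_DATA_DICT environment variable, then tempdir().
resolve_data_dict_path <- function(path = NULL) {
  if (!is.null(path)) {
    return(path) # an explicitly supplied path is used as given
  }
  env_path <- Sys.getenv("UKB_DATA_DICT", unset = NA_character_)
  if (!is.na(env_path) && nzchar(env_path)) {
    return(env_path) # the environment variable is set and non-empty
  }
  # Fall back to the session's temporary directory.
  # The file name below is an assumption for illustration.
  file.path(tempdir(), "Data_Dictionary_Showcase.tsv")
}

# With the variable set, its value is returned.
Sys.setenv(UKB_DATA_DICT = "~/ukb/Data_Dictionary_Showcase.tsv")
resolve_data_dict_path()

# With the variable unset, the temporary-directory fallback is returned.
Sys.unsetenv("UKB_DATA_DICT")
resolve_data_dict_path()
```

Since tempdir() is cleared at the end of each R session, setting UKB_DATA_DICT to a persistent location is one way to avoid downloading the file again in every session.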

Examples

get_ukb_data_dict(
  get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv", path_only = TRUE)
)
#>                                                                           Path
#>                                                                         <char>
#>  1:                      Population characteristics > Baseline characteristics
#>  2:                      Population characteristics > Baseline characteristics
#>  3:                      Population characteristics > Baseline characteristics
#>  4:                                Assessment Centre > Recruitment > Reception
#>  5:                     Assessment Centre > Physical measures > Blood pressure
#>  6:                  Assessment Centre > Verbal interview > Medical conditions
#>  7:                  Assessment Centre > Verbal interview > Medical conditions
#>  8:                         Assessment Centre > Verbal interview > Medications
#>  9:                          Assessment Centre > Verbal interview > Operations
#> 10:                  Assessment Centre > Verbal interview > Medical conditions
#> 11:                  Assessment Centre > Verbal interview > Medical conditions
#> 12:                          Assessment Centre > Verbal interview > Operations
#> 13:            Assessment Centre > Touchscreen > Sociodemographics > Ethnicity
#> 14: Assessment Centre > Physical measures > Anthropometry > Body size measures
#> 15:                                   Health-related outcomes > Death register
#> 16:                                   Health-related outcomes > Death register
#> 17:                                   Health-related outcomes > Death register
#> 18:                                  Health-related outcomes > Cancer register
#> 19:                                  Health-related outcomes > Cancer register
#> 20:                                  Health-related outcomes > Cancer register
#> 21:           Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 22:           Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 23:          Health-related outcomes > Hospital inpatient > Summary Operations
#> 24:          Health-related outcomes > Hospital inpatient > Summary Operations
#> 25:           Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 26:           Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 27:          Health-related outcomes > Hospital inpatient > Summary Operations
#> 28:          Health-related outcomes > Hospital inpatient > Summary Operations
#>                                                                           Path
#>     Category FieldID                                                     Field
#>       <char>  <char>                                                    <char>
#>  1:   100094      31                                                       Sex
#>  2:   100094      34                                             Year of birth
#>  3:   100094      52                                            Month of birth
#>  4:   100024      53                       Date of attending assessment centre
#>  5:   100011    4080                Systolic blood pressure, automated reading
#>  6:   100074   20001                                Cancer code, self-reported
#>  7:   100074   20002                    Non-cancer illness code, self-reported
#>  8:   100075   20003                                 Treatment/medication code
#>  9:   100076   20004                                            Operation code
#> 10:   100074   20006             Interpolated Year when cancer first diagnosed
#> 11:   100074   20008 Interpolated Year when non-cancer illness first diagnosed
#> 12:   100076   20010               Interpolated Year when operation took place
#> 13:   100065   21000                                         Ethnic background
#> 14:   100010   21001                                     Body mass index (BMI)
#> 15:   100093   40000                                             Date of death
#> 16:   100093   40001                Underlying (primary) cause of death: ICD10
#> 17:   100093   40002           Contributory (secondary) causes of death: ICD10
#> 18:   100092   40005                                  Date of cancer diagnosis
#> 19:   100092   40006                                     Type of cancer: ICD10
#> 20:   100092   40013                                      Type of cancer: ICD9
#> 21:     2002   41270                                         Diagnoses - ICD10
#> 22:     2002   41271                                          Diagnoses - ICD9
#> 23:     2005   41272                              Operative procedures - OPCS4
#> 24:     2005   41273                              Operative procedures - OPCS3
#> 25:     2002   41280                Date of first in-patient diagnosis - ICD10
#> 26:     2002   41281                 Date of first in-patient diagnosis - ICD9
#> 27:     2005   41282                 Date of first operative procedure - OPCS4
#> 28:     2005   41283                 Date of first operative procedure - OPCS3
#>     Category FieldID                                                     Field
#>     Participants   Items Stability            ValueType  Units ItemType  Strata
#>           <char>  <char>    <char>               <char> <char>   <char>  <char>
#>  1:       502413  502413  Complete   Categorical single   <NA>     Data Primary
#>  2:       502413  502413  Complete              Integer  years     Data Primary
#>  3:       502413  502413  Complete   Categorical single   <NA>     Data Primary
#>  4:       502414  579587  Complete                 Date   <NA>     Data Primary
#>  5:       475231 1061497  Complete              Integer   mmHg     Data Primary
#>  6:        45950   54022  Complete Categorical multiple   <NA>     Data Derived
#>  7:       386743 1145486  Complete Categorical multiple   <NA>     Data Derived
#>  8:       373347 1389143  Complete Categorical multiple   <NA>     Data Primary
#>  9:       399178 1003287  Complete Categorical multiple   <NA>     Data Derived
#> 10:        45950   54022  Complete           Continuous  years     Data Primary
#> 11:       386742 1145473  Complete           Continuous  years     Data Primary
#> 12:       399176 1003283  Complete           Continuous  years     Data Primary
#> 13:       501521  533516  Complete   Categorical single   <NA>     Data Derived
#> 14:       499421  574596  Complete           Continuous  Kg/m2     Data Derived
#> 15:        37897   37957  Accruing                 Date   <NA>     Data Primary
#> 16:        37735   37795  Accruing   Categorical single   <NA>     Data Primary
#> 17:        24774   56655  Accruing   Categorical single   <NA>     Data Primary
#> 18:       116047  156178  Accruing                 Date   <NA>     Data Primary
#> 19:       111445  140930  Accruing   Categorical single   <NA>     Data Primary
#> 20:        11221   15240  Accruing   Categorical single   <NA>     Data Primary
#> 21:       440019 6302100   Ongoing Categorical multiple   <NA>     Data Primary
#> 22:        20299   58684   Ongoing Categorical multiple   <NA>     Data Primary
#> 23:       440161 4977062   Ongoing Categorical multiple   <NA>     Data Primary
#> 24:        10699   18579   Ongoing Categorical multiple   <NA>     Data Primary
#> 25:       440016 6301998   Ongoing                 Date   <NA>     Data Primary
#> 26:        20299   58684   Ongoing                 Date   <NA>     Data Primary
#> 27:       440155 4976965   Ongoing                 Date   <NA>     Data Primary
#> 28:        10699   18579   Ongoing                 Date   <NA>     Data Primary
#>     Participants   Items Stability            ValueType  Units ItemType  Strata
#>      Sexed Instances  Array Coding
#>     <char>    <char> <char> <char>
#>  1: Unisex         1      1      9
#>  2: Unisex         1      1   <NA>
#>  3: Unisex         1      1      8
#>  4: Unisex         4      1   <NA>
#>  5: Unisex         4      2   <NA>
#>  6: Unisex         4      6      3
#>  7: Unisex         4     34      6
#>  8: Unisex         4     48      4
#>  9: Unisex         4     32      5
#> 10: Unisex         4      6     13
#> 11: Unisex         4     34     13
#> 12: Unisex         4     32     13
#> 13: Unisex         3      1   1001
#> 14: Unisex         4      1   <NA>
#> 15: Unisex         2      1   <NA>
#> 16: Unisex         2      1     19
#> 17: Unisex         2     14     19
#> 18: Unisex        18      1   <NA>
#> 19: Unisex        18      1     19
#> 20: Unisex        15      1     87
#> 21: Unisex         1    243     19
#> 22: Unisex         1     47     87
#> 23: Unisex         1    124    240
#> 24: Unisex         1     16    259
#> 25: Unisex         1    243   <NA>
#> 26: Unisex         1     47   <NA>
#> 27: Unisex         1    124   <NA>
#> 28: Unisex         1     16   <NA>
#>      Sexed Instances  Array Coding
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Notes
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         <char>
#>  1:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Sex of participant.  Acquired from central registry at recruitment, but in some cases updated by the participant. Hence this field may contain a mixture of the sex the NHS had recorded for the participant and self-reported sex.
#>  2:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Year of birth of participant. Acquired from central registry, updated by participant.
#>  3:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Calendar month of birth of participant. Acquired from central registry, updated by participant.
#>  4:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Date when a participant attended a UK Biobank assessment centre.  Automatically acquired at Reception stage.
#>  5:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Blood pressure, automated reading, systolic. Two measures of blood pressure were taken a few moments apart.   Range returned by the Omron device is is 0-255
#>  6:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Code for cancer. If the participant was uncertain of the type of cancer they had had, then they described it to the interviewer (a trained nurse) who attempted to place it within the coding tree. If the cancer could not be located in the coding tree then the interviewer entered a free-text description of it. These free-text descriptions were subsequently examined by a doctor and, where possible, matched to entries in the coding tree. Free-text descriptions which could not be matched with very high probability have been marked as ""unclassifiable"".
#>  7:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Code for non-cancer illness. If the participant was uncertain of the type of illness they had had, then they described it to the interviewer (a trained nurse) who attempted to place it within the coding tree. If the illness could not be located in the coding tree then the interviewer entered a free-text description of it. These free-text descriptions were subsequently examined by a doctor and, where possible, matched to entries in the coding tree. Free-text descriptions which could not be matched with very high probability have been marked as ""unclassifiable"".   Note that myasthenia gravis appears twice (under codes 1260 and 1437). Please ensure you use both codes to capture all relevant diagnoses.
#>  8:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Code for treatment Negative codes indicate free-text entry.
#>  9:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Code for operation. If the participant was uncertain of the type of operation they had undergone, then they described it to the interviewer (a trained nurse) who attempted to place it within the coding tree. If the operation could not be located in the coding tree then the interviewer entered a free-text description of it. These free-text descriptions were subsequently examined by a doctor and, where possible, matched to entries in the coding tree. Free-text descriptions which could not be matched with very high probability have been marked as ""unclassifiable"".
#> 10:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        This is the interpolated time when the participant indicated the corresponding cancer was first diagnosed by a doctor, measured in years.    If the participant gave a calendar year, then the best-fit time is half-way through that year. For example if the year was given as 1970, then the value presented is 1970.5   If the participant gave their age then the value presented is the fractional year corresponding to the mid-point of that age. For example, if the participant said they were 30 years old then the value is the date at which they were 30years+6months.  Interpolated values before the date of birth were truncated forwards to that time.   Interpolated values after the time of data acquisition were truncated back to that time.
#> 11:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     This is the interpolated time when the participant indicated the corresponding condition was first diagnosed by a doctor, measured in years.    If the participant gave a calendar year, then the best-fit time is half-way through that year. For example if the year was given as 1970, then the value presented is 1970.5   If the participant gave their age then the value presented is the fractional year corresponding to the mid-point of that age. For example, if the participant said they were 30 years old then the value is the date at which they were 30years+6months.  Interpolated values before the date of birth were truncated forwards to that time.   Interpolated values after the time of data acquisition were truncated back to that time.
#> 12:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  This is the year when the participant indicated the operation took place.
#> 13:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   This is an amalgam of sequential branching questions asked during the initial Assessment Centre visit as part of the touchscreen questionnaire.    The question was dropped from the touchscreen protocol on 24/10/2016.
#> 14:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   BMI value here is constructed from height and weight measured during the initial Assessment Centre visit. Value is not present if either of these readings were omitted.
#> 15:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Date of death. Acquired from central registry.
#> 16:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Underlying/primary cause of death reported for participant. Note that this may not match the text value in Field 40010 due to transcription errors at source.  Acquired from central registry.
#> 17:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Contributory/secondary causes of death reported for participant. There may be zero, one or many. Acquired from central registry.
#> 18:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Date of cancer diagnosis, acquired from central registry.  Note that data from the most recent 12-18 months is still accruing (i.e. it is not complete).   The events/dates are indexed in the order in which they are received and processed by UK Biobank rather than in their own chronological order.
#> 19:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           The ICD-10 code for the type of cancer. Acquired from central registry. The ICD-10 tree displayed on this page includes all cancer records for each UK Biobank participant (for which there may be multiple ICD codes). For more information on the first diagnosed cancer in each participant, please refer to Category 100092.
#> 20:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The ICD9 code for the type of cancer. Acquired from central registry.
#> 21:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This field is a summary of the distinct diagnosis codes a participant has had recorded across all their hospital inpatient records in either the primary or secondary position. Diagnoses are coded according to the International Classification of Disease version 10 (ICD-10).   The corresponding date each diagnosis was first recorded across all their episodes in hospital is given in Field 41280.
#> 22:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               This field is a summary of the distinct diagnosis codes a participant has had recorded across all their hospital inpatient records in either the primary or secondary position. Diagnoses are coded according to the International Classification of Disease version 9 (ICD-9).  The corresponding date each diagnosis was first recorded in each participant's hospital inpatient records is given in Field 41281.  Please note ICD-9 coded hospital inpatient data are only available for older Scottish hospital records.
#> 23:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          This field is a summary of the operation and procedure codes a participant has had recorded across all their hospital inpatient records in either the main or secondary position. Operative procedures are coded according to the Office of Population Censuses and Surveys Classification of Interventions and Procedures, version 4 (OPCS-4).  The corresponding date when each procedure was first recorded in the inpatient data can be found in Field 41282.
#> 24:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This field is a summary of the operation and procedure codes a participant has had recorded across all their hospital inpatient records in either the main or secondary position. Operative procedures are coded according to the Office of Population Censuses and Surveys Classification of Interventions and Procedures, version 3 (OPCS-3).  The corresponding date when each procedure was first recorded in the inpatient data can be found in Field 41283.  Please note OPCS-3 coded hospital inpatient data are only available for older Scottish hospital records.
#> 25:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            This field provides, for each participant, the date each ICD-10 diagnosis code was first recorded in either the primary or secondary position in the participant's hospital inpatient records (in the field DIAG_ICD10 in the HESIN_DIAG table).  The date given is the episode start date (the EPISTART field on the HESIN table), or if this was missing the admission date (the ADMIDATE field on the HESIN table). See the Inpatient data Dictionary (Resource 141140) in Category 2000 for information about the HESIN and HESIN_DIAG tables.  The corresponding ICD-10 diagnosis codes can be found in data-field Field 41270 and the two fields can be linked using the array structure.
#> 26:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       This field provides, for each participant, the date each ICD-9 diagnosis code was first recorded as a primary or secondary diagnosis in the participant's hospital inpatient records (in the field DIAG_ICD9 in the HESIN_DIAG table).  The date given is the episode start date (the EPISTART field on the HESIN table), or if this was missing the admission date (the ADMIDATE field on the HESIN table). See the Inpatient data Dictionary (Resource 141140) in Category 2000 for information about the HESIN and HESIN_DIAG tables.  The corresponding ICD-9 diagnosis codes can be found in data-field Field 41271 and the two fields can be linked using the array structure.   Please note ICD-9 coded hospital data are only available for older Scottish hospital records.
#> 27:                                                                                                           This field provides, for each participant, the dates when each of the procedures in Field 41272 was first recorded in the participant's hospital inpatient records (in either the main or a secondary position). The date can be linked to Field 41272 by using the array structure.   Please note that we have opted to populate this data-field with the episode start date (EPISTART in the HESIN table) or, if that was missing, the admission start date (ADMIDATE in the HESIN table) rather than use the operation date (OPDATE in the HESIN_OPER table). The rationale for this was:    around 1 in 5 primary procedures lacked a corresponding value for OPDATE in HESIN_OPER;     some recorded OPDATEs were inconsistent with the other date fields such as EPISTART, ADMIDATE, EPIEND and DISDATE (most likely due to data entry errors);     for those episodes with a value for OPDATE recorded the majority were the same as the value of EPISTART, with around 99% of OPDATE values within 7 days of the value of EPISTART;     values of EPISTART/ADMIDATE are more complete and thus may provide greater consistency across data providers.     See the Operations tab of the Hospital Inpatient Data Dictionary (Resource 141140) for further information about these fields.
#> 28: This field provides, for each participant, the dates when each of the procedures in Field 41273 was first recorded in the participant's hospital inpatient records (in either the main or a secondary position). The date can be linked to Field 41273 by using the array structure.   Please note that we have opted to populate this data-field with the episode start date (EPISTART in the HESIN table) or, if that was missing, the admission start date (ADMIDATE in the HESIN table) rather than use the operation date (OPDATE in the HESIN_OPER table). The rationale for this was:    around 1 in 5 primary procedures lacked a corresponding value for OPDATE in HESIN_OPER;     some recorded OPDATEs were inconsistent with the other date fields such as EPISTART, ADMIDATE, EPIEND and DISDATE (most likely due to data entry errors);     for those episodes with a value for OPDATE recorded the majority were the same as the value of EPISTART, with around 99% of OPDATE values within 7 days of the value of EPISTART;     values of EPISTART/ADMIDATE are more complete and thus may provide greater consistency across data providers.     See the Operations tab of the Hospital Inpatient Data Dictionary (Resource 141140) for further information about these fields.  Please note OPCS-3 coded hospital inpatient data are only available for older Scottish hospital records.
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Notes
#>                                                    Link
#>                                                  <char>
#>  1:    http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=31
#>  2:    http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=34
#>  3:    http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=52
#>  4:    http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=53
#>  5:  http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=4080
#>  6: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20001
#>  7: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20002
#>  8: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20003
#>  9: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20004
#> 10: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20006
#> 11: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20008
#> 12: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20010
#> 13: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=21000
#> 14: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=21001
#> 15: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40000
#> 16: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40001
#> 17: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40002
#> 18: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40005
#> 19: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40006
#> 20: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40013
#> 21: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41270
#> 22: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41271
#> 23: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41272
#> 24: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41273
#> 25: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41280
#> 26: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41281
#> 27: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41282
#> 28: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41283
#>                                                    Link
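Each value in the Link column printed above ends in the numeric field ID, so the ID can be recovered from the link text. The following is a minimal sketch, not part of the package: `data_dict` is a hand-built stand-in that reuses three Link values from the output, and `field_id_from_link` is a hypothetical column name chosen for illustration.

```r
# Stand-in for the data frame returned by get_ukb_data_dict(), holding only
# the Link column with three values copied from the printed output above.
data_dict <- data.frame(
  Link = c(
    "http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=31",
    "http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=4080",
    "http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41270"
  ),
  stringsAsFactors = FALSE
)

# Capture the digits after "id=" at the end of each link.
# The result stays character, matching the all-character dictionary columns.
# `field_id_from_link` is a hypothetical name, not a column of the real dictionary.
data_dict$field_id_from_link <- sub("^.*[?&]id=([0-9]+)$", "\\1", data_dict$Link)

# Look up a single row by its extracted field ID.
icd10_row <- data_dict[data_dict$field_id_from_link == "41270", ]
print(icd10_row)
```

Because every column is read as character, the comparison uses the string "41270" rather than the number 41270.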