Reads the UK Biobank data dictionary into R with all columns as type character.
Arguments
- path
Either NULL, or a file path. If no file exists at this path, then the data dictionary will be downloaded directly from the UK Biobank data showcase website to this location when the function is first called.
Details
By default, an attempt will be made to read from a file at the path specified by an environment variable named UKB_DATA_DICT (see vignette('ukbwranglr') for further details). If this variable is not found, the data dictionary will be downloaded directly from the UK Biobank website to tempdir() when the function is first called.
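The default lookup described above can be sketched as follows. This is a minimal illustration, not the package's own setup code: the file name and location in `dict_path` are hypothetical, and the call to `get_ukb_data_dict()` is guarded so that the sketch runs without the package installed and without triggering a download.

```r
# Hypothetical location for a local copy of the data dictionary (an assumption).
dict_path <- file.path(tempdir(), "Data_Dictionary_Showcase.tsv")

# Set the environment variable for the current R session only.
Sys.setenv(UKB_DATA_DICT = dict_path)

# Confirm the value that the default lookup is expected to find.
Sys.getenv("UKB_DATA_DICT")

# With the variable set, calling the function without a path argument should
# read from dict_path. The guard skips the call if the package is missing or
# the file is not there yet, so nothing is downloaded by this sketch.
if (requireNamespace("ukbwranglr", quietly = TRUE) && file.exists(dict_path)) {
  ukb_data_dict <- ukbwranglr::get_ukb_data_dict()
}
```

`Sys.setenv()` only affects the running session. To keep the setting across sessions, the same variable can be defined in an `.Renviron` file, which R reads at startup.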
Examples
get_ukb_data_dict(
get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv", path_only = TRUE)
)
#> Path
#> <char>
#> 1: Population characteristics > Baseline characteristics
#> 2: Population characteristics > Baseline characteristics
#> 3: Population characteristics > Baseline characteristics
#> 4: Assessment Centre > Recruitment > Reception
#> 5: Assessment Centre > Physical measures > Blood pressure
#> 6: Assessment Centre > Verbal interview > Medical conditions
#> 7: Assessment Centre > Verbal interview > Medical conditions
#> 8: Assessment Centre > Verbal interview > Medications
#> 9: Assessment Centre > Verbal interview > Operations
#> 10: Assessment Centre > Verbal interview > Medical conditions
#> 11: Assessment Centre > Verbal interview > Medical conditions
#> 12: Assessment Centre > Verbal interview > Operations
#> 13: Assessment Centre > Touchscreen > Sociodemographics > Ethnicity
#> 14: Assessment Centre > Physical measures > Anthropometry > Body size measures
#> 15: Health-related outcomes > Death register
#> 16: Health-related outcomes > Death register
#> 17: Health-related outcomes > Death register
#> 18: Health-related outcomes > Cancer register
#> 19: Health-related outcomes > Cancer register
#> 20: Health-related outcomes > Cancer register
#> 21: Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 22: Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 23: Health-related outcomes > Hospital inpatient > Summary Operations
#> 24: Health-related outcomes > Hospital inpatient > Summary Operations
#> 25: Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 26: Health-related outcomes > Hospital inpatient > Summary Diagnoses
#> 27: Health-related outcomes > Hospital inpatient > Summary Operations
#> 28: Health-related outcomes > Hospital inpatient > Summary Operations
#> Path
#> Category FieldID Field
#> <char> <char> <char>
#> 1: 100094 31 Sex
#> 2: 100094 34 Year of birth
#> 3: 100094 52 Month of birth
#> 4: 100024 53 Date of attending assessment centre
#> 5: 100011 4080 Systolic blood pressure, automated reading
#> 6: 100074 20001 Cancer code, self-reported
#> 7: 100074 20002 Non-cancer illness code, self-reported
#> 8: 100075 20003 Treatment/medication code
#> 9: 100076 20004 Operation code
#> 10: 100074 20006 Interpolated Year when cancer first diagnosed
#> 11: 100074 20008 Interpolated Year when non-cancer illness first diagnosed
#> 12: 100076 20010 Interpolated Year when operation took place
#> 13: 100065 21000 Ethnic background
#> 14: 100010 21001 Body mass index (BMI)
#> 15: 100093 40000 Date of death
#> 16: 100093 40001 Underlying (primary) cause of death: ICD10
#> 17: 100093 40002 Contributory (secondary) causes of death: ICD10
#> 18: 100092 40005 Date of cancer diagnosis
#> 19: 100092 40006 Type of cancer: ICD10
#> 20: 100092 40013 Type of cancer: ICD9
#> 21: 2002 41270 Diagnoses - ICD10
#> 22: 2002 41271 Diagnoses - ICD9
#> 23: 2005 41272 Operative procedures - OPCS4
#> 24: 2005 41273 Operative procedures - OPCS3
#> 25: 2002 41280 Date of first in-patient diagnosis - ICD10
#> 26: 2002 41281 Date of first in-patient diagnosis - ICD9
#> 27: 2005 41282 Date of first operative procedure - OPCS4
#> 28: 2005 41283 Date of first operative procedure - OPCS3
#> Category FieldID Field
#> Participants Items Stability ValueType Units ItemType Strata
#> <char> <char> <char> <char> <char> <char> <char>
#> 1: 502413 502413 Complete Categorical single <NA> Data Primary
#> 2: 502413 502413 Complete Integer years Data Primary
#> 3: 502413 502413 Complete Categorical single <NA> Data Primary
#> 4: 502414 579587 Complete Date <NA> Data Primary
#> 5: 475231 1061497 Complete Integer mmHg Data Primary
#> 6: 45950 54022 Complete Categorical multiple <NA> Data Derived
#> 7: 386743 1145486 Complete Categorical multiple <NA> Data Derived
#> 8: 373347 1389143 Complete Categorical multiple <NA> Data Primary
#> 9: 399178 1003287 Complete Categorical multiple <NA> Data Derived
#> 10: 45950 54022 Complete Continuous years Data Primary
#> 11: 386742 1145473 Complete Continuous years Data Primary
#> 12: 399176 1003283 Complete Continuous years Data Primary
#> 13: 501521 533516 Complete Categorical single <NA> Data Derived
#> 14: 499421 574596 Complete Continuous Kg/m2 Data Derived
#> 15: 37897 37957 Accruing Date <NA> Data Primary
#> 16: 37735 37795 Accruing Categorical single <NA> Data Primary
#> 17: 24774 56655 Accruing Categorical single <NA> Data Primary
#> 18: 116047 156178 Accruing Date <NA> Data Primary
#> 19: 111445 140930 Accruing Categorical single <NA> Data Primary
#> 20: 11221 15240 Accruing Categorical single <NA> Data Primary
#> 21: 440019 6302100 Ongoing Categorical multiple <NA> Data Primary
#> 22: 20299 58684 Ongoing Categorical multiple <NA> Data Primary
#> 23: 440161 4977062 Ongoing Categorical multiple <NA> Data Primary
#> 24: 10699 18579 Ongoing Categorical multiple <NA> Data Primary
#> 25: 440016 6301998 Ongoing Date <NA> Data Primary
#> 26: 20299 58684 Ongoing Date <NA> Data Primary
#> 27: 440155 4976965 Ongoing Date <NA> Data Primary
#> 28: 10699 18579 Ongoing Date <NA> Data Primary
#> Participants Items Stability ValueType Units ItemType Strata
#> Sexed Instances Array Coding
#> <char> <char> <char> <char>
#> 1: Unisex 1 1 9
#> 2: Unisex 1 1 <NA>
#> 3: Unisex 1 1 8
#> 4: Unisex 4 1 <NA>
#> 5: Unisex 4 2 <NA>
#> 6: Unisex 4 6 3
#> 7: Unisex 4 34 6
#> 8: Unisex 4 48 4
#> 9: Unisex 4 32 5
#> 10: Unisex 4 6 13
#> 11: Unisex 4 34 13
#> 12: Unisex 4 32 13
#> 13: Unisex 3 1 1001
#> 14: Unisex 4 1 <NA>
#> 15: Unisex 2 1 <NA>
#> 16: Unisex 2 1 19
#> 17: Unisex 2 14 19
#> 18: Unisex 18 1 <NA>
#> 19: Unisex 18 1 19
#> 20: Unisex 15 1 87
#> 21: Unisex 1 243 19
#> 22: Unisex 1 47 87
#> 23: Unisex 1 124 240
#> 24: Unisex 1 16 259
#> 25: Unisex 1 243 <NA>
#> 26: Unisex 1 47 <NA>
#> 27: Unisex 1 124 <NA>
#> 28: Unisex 1 16 <NA>
#> Sexed Instances Array Coding
#> Notes
#> <char>
#> 1: Sex of participant. Acquired from central registry at recruitment, but in some cases updated by the participant. Hence this field may contain a mixture of the sex the NHS had recorded for the participant and self-reported sex.
#> 2: Year of birth of participant. Acquired from central registry, updated by participant.
#> 3: Calendar month of birth of participant. Acquired from central registry, updated by participant.
#> 4: Date when a participant attended a UK Biobank assessment centre. Automatically acquired at Reception stage.
#> 5: Blood pressure, automated reading, systolic. Two measures of blood pressure were taken a few moments apart. Range returned by the Omron device is is 0-255
#> 6: Code for cancer. If the participant was uncertain of the type of cancer they had had, then they described it to the interviewer (a trained nurse) who attempted to place it within the coding tree. If the cancer could not be located in the coding tree then the interviewer entered a free-text description of it. These free-text descriptions were subsequently examined by a doctor and, where possible, matched to entries in the coding tree. Free-text descriptions which could not be matched with very high probability have been marked as ""unclassifiable"".
#> 7: Code for non-cancer illness. If the participant was uncertain of the type of illness they had had, then they described it to the interviewer (a trained nurse) who attempted to place it within the coding tree. If the illness could not be located in the coding tree then the interviewer entered a free-text description of it. These free-text descriptions were subsequently examined by a doctor and, where possible, matched to entries in the coding tree. Free-text descriptions which could not be matched with very high probability have been marked as ""unclassifiable"". Note that myasthenia gravis appears twice (under codes 1260 and 1437). Please ensure you use both codes to capture all relevant diagnoses.
#> 8: Code for treatment Negative codes indicate free-text entry.
#> 9: Code for operation. If the participant was uncertain of the type of operation they had undergone, then they described it to the interviewer (a trained nurse) who attempted to place it within the coding tree. If the operation could not be located in the coding tree then the interviewer entered a free-text description of it. These free-text descriptions were subsequently examined by a doctor and, where possible, matched to entries in the coding tree. Free-text descriptions which could not be matched with very high probability have been marked as ""unclassifiable"".
#> 10: This is the interpolated time when the participant indicated the corresponding cancer was first diagnosed by a doctor, measured in years. If the participant gave a calendar year, then the best-fit time is half-way through that year. For example if the year was given as 1970, then the value presented is 1970.5 If the participant gave their age then the value presented is the fractional year corresponding to the mid-point of that age. For example, if the participant said they were 30 years old then the value is the date at which they were 30years+6months. Interpolated values before the date of birth were truncated forwards to that time. Interpolated values after the time of data acquisition were truncated back to that time.
#> 11: This is the interpolated time when the participant indicated the corresponding condition was first diagnosed by a doctor, measured in years. If the participant gave a calendar year, then the best-fit time is half-way through that year. For example if the year was given as 1970, then the value presented is 1970.5 If the participant gave their age then the value presented is the fractional year corresponding to the mid-point of that age. For example, if the participant said they were 30 years old then the value is the date at which they were 30years+6months. Interpolated values before the date of birth were truncated forwards to that time. Interpolated values after the time of data acquisition were truncated back to that time.
#> 12: This is the year when the participant indicated the operation took place.
#> 13: This is an amalgam of sequential branching questions asked during the initial Assessment Centre visit as part of the touchscreen questionnaire. The question was dropped from the touchscreen protocol on 24/10/2016.
#> 14: BMI value here is constructed from height and weight measured during the initial Assessment Centre visit. Value is not present if either of these readings were omitted.
#> 15: Date of death. Acquired from central registry.
#> 16: Underlying/primary cause of death reported for participant. Note that this may not match the text value in Field 40010 due to transcription errors at source. Acquired from central registry.
#> 17: Contributory/secondary causes of death reported for participant. There may be zero, one or many. Acquired from central registry.
#> 18: Date of cancer diagnosis, acquired from central registry. Note that data from the most recent 12-18 months is still accruing (i.e. it is not complete). The events/dates are indexed in the order in which they are received and processed by UK Biobank rather than in their own chronological order.
#> 19: The ICD-10 code for the type of cancer. Acquired from central registry. The ICD-10 tree displayed on this page includes all cancer records for each UK Biobank participant (for which there may be multiple ICD codes). For more information on the first diagnosed cancer in each participant, please refer to Category 100092.
#> 20: The ICD9 code for the type of cancer. Acquired from central registry.
#> 21: This field is a summary of the distinct diagnosis codes a participant has had recorded across all their hospital inpatient records in either the primary or secondary position. Diagnoses are coded according to the International Classification of Disease version 10 (ICD-10). The corresponding date each diagnosis was first recorded across all their episodes in hospital is given in Field 41280.
#> 22: This field is a summary of the distinct diagnosis codes a participant has had recorded across all their hospital inpatient records in either the primary or secondary position. Diagnoses are coded according to the International Classification of Disease version 9 (ICD-9). The corresponding date each diagnosis was first recorded in each participant's hospital inpatient records is given in Field 41281. Please note ICD-9 coded hospital inpatient data are only available for older Scottish hospital records.
#> 23: This field is a summary of the operation and procedure codes a participant has had recorded across all their hospital inpatient records in either the main or secondary position. Operative procedures are coded according to the Office of Population Censuses and Surveys Classification of Interventions and Procedures, version 4 (OPCS-4). The corresponding date when each procedure was first recorded in the inpatient data can be found in Field 41282.
#> 24: This field is a summary of the operation and procedure codes a participant has had recorded across all their hospital inpatient records in either the main or secondary position. Operative procedures are coded according to the Office of Population Censuses and Surveys Classification of Interventions and Procedures, version 3 (OPCS-3). The corresponding date when each procedure was first recorded in the inpatient data can be found in Field 41283. Please note OPCS-3 coded hospital inpatient data are only available for older Scottish hospital records.
#> 25: This field provides, for each participant, the date each ICD-10 diagnosis code was first recorded in either the primary or secondary position in the participant's hospital inpatient records (in the field DIAG_ICD10 in the HESIN_DIAG table). The date given is the episode start date (the EPISTART field on the HESIN table), or if this was missing the admission date (the ADMIDATE field on the HESIN table). See the Inpatient data Dictionary (Resource 141140) in Category 2000 for information about the HESIN and HESIN_DIAG tables. The corresponding ICD-10 diagnosis codes can be found in data-field Field 41270 and the two fields can be linked using the array structure.
#> 26: This field provides, for each participant, the date each ICD-9 diagnosis code was first recorded as a primary or secondary diagnosis in the participant's hospital inpatient records (in the field DIAG_ICD9 in the HESIN_DIAG table). The date given is the episode start date (the EPISTART field on the HESIN table), or if this was missing the admission date (the ADMIDATE field on the HESIN table). See the Inpatient data Dictionary (Resource 141140) in Category 2000 for information about the HESIN and HESIN_DIAG tables. The corresponding ICD-9 diagnosis codes can be found in data-field Field 41271 and the two fields can be linked using the array structure. Please note ICD-9 coded hospital data are only available for older Scottish hospital records.
#> 27: This field provides, for each participant, the dates when each of the procedures in Field 41272 was first recorded in the participant's hospital inpatient records (in either the main or a secondary position). The date can be linked to Field 41272 by using the array structure. Please note that we have opted to populate this data-field with the episode start date (EPISTART in the HESIN table) or, if that was missing, the admission start date (ADMIDATE in the HESIN table) rather than use the operation date (OPDATE in the HESIN_OPER table). The rationale for this was: around 1 in 5 primary procedures lacked a corresponding value for OPDATE in HESIN_OPER; some recorded OPDATEs were inconsistent with the other date fields such as EPISTART, ADMIDATE, EPIEND and DISDATE (most likely due to data entry errors); for those episodes with a value for OPDATE recorded the majority were the same as the value of EPISTART, with around 99% of OPDATE values within 7 days of the value of EPISTART; values of EPISTART/ADMIDATE are more complete and thus may provide greater consistency across data providers. See the Operations tab of the Hospital Inpatient Data Dictionary (Resource 141140) for further information about these fields.
#> 28: This field provides, for each participant, the dates when each of the procedures in Field 41273 was first recorded in the participant's hospital inpatient records (in either the main or a secondary position). The date can be linked to Field 41273 by using the array structure. Please note that we have opted to populate this data-field with the episode start date (EPISTART in the HESIN table) or, if that was missing, the admission start date (ADMIDATE in the HESIN table) rather than use the operation date (OPDATE in the HESIN_OPER table). The rationale for this was: around 1 in 5 primary procedures lacked a corresponding value for OPDATE in HESIN_OPER; some recorded OPDATEs were inconsistent with the other date fields such as EPISTART, ADMIDATE, EPIEND and DISDATE (most likely due to data entry errors); for those episodes with a value for OPDATE recorded the majority were the same as the value of EPISTART, with around 99% of OPDATE values within 7 days of the value of EPISTART; values of EPISTART/ADMIDATE are more complete and thus may provide greater consistency across data providers. See the Operations tab of the Hospital Inpatient Data Dictionary (Resource 141140) for further information about these fields. Please note OPCS-3 coded hospital inpatient data are only available for older Scottish hospital records.
#> Notes
#> Link
#> <char>
#> 1: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=31
#> 2: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=34
#> 3: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=52
#> 4: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=53
#> 5: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=4080
#> 6: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20001
#> 7: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20002
#> 8: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20003
#> 9: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20004
#> 10: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20006
#> 11: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20008
#> 12: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=20010
#> 13: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=21000
#> 14: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=21001
#> 15: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40000
#> 16: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40001
#> 17: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40002
#> 18: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40005
#> 19: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40006
#> 20: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=40013
#> 21: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41270
#> 22: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41271
#> 23: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41272
#> 24: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41273
#> 25: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41280
#> 26: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41281
#> 27: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41282
#> 28: http://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=41283
#> Link
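Because every column is returned as type character, numeric-looking columns such as FieldID, Instances and Array need string comparisons or an explicit conversion. The sketch below uses a small stand-in data frame (three rows copied from the output above, not the real return value) to illustrate this in base R; the object name `dict` is hypothetical.

```r
# Toy stand-in for the data dictionary: three rows copied from the output above,
# with every column stored as character.
dict <- data.frame(
  FieldID = c("31", "34", "4080"),
  Field = c("Sex", "Year of birth", "Systolic blood pressure, automated reading"),
  Instances = c("1", "1", "4"),
  Array = c("1", "1", "2"),
  stringsAsFactors = FALSE
)

# Compare FieldID against a string, since the column is character.
bp_field <- dict[dict$FieldID == "4080", "Field"]

# Convert count-like columns to integer before numeric comparisons.
dict$Instances <- as.integer(dict$Instances)
dict$Array <- as.integer(dict$Array)

# Fields recorded at more than one instance.
multi_instance <- dict$FieldID[dict$Instances > 1L]
```

The same pattern should carry over to the object returned by `get_ukb_data_dict()`, although the printed output suggests a data.table, where the column-based syntax of that package may be preferable.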