
List of data files and their descriptions

All data files released under the main study End User Licence (EUL) version, SN6614, are listed below. The Special Licence version of the survey, SN6931, includes the same files but with some additional variables and, for some income variables, the non-top-coded values. The Secure Access version of the survey data is available as SN6676. It includes all files in the Special Licence version plus files containing three variables relating to the National Grid Reference for each household: Easting, Northing and a positional quality indicator (OSGRDIND). The Secure Access version also includes variables for the full dates of birth of Understanding Society and BHPS respondents. The different access levels are explained on the Data Access page.

Data collected from different sources (e.g. the household interview, the adult interview, the youth interview) are stored in separate files, and each wave has its own set of such files. To make the files easier to use, they share the same root name across waves but begin with a letter prefix that reflects the wave in which the data were collected: “a_” for the first wave, “b_” for the second wave, and so on (in this user guide we use “w_” to denote waves in general). From the Wave 7 data release onward (November 2017), Understanding Society-harmonised BHPS data files are also included. Most files exist for both studies and, where they do and have been harmonised, the file stem name matches. Wave-specific harmonised BHPS files can be identified by the wave prefix “bw_”. Note: BHPS data files that have not yet been harmonised but have the same stem name as UKHLS files carry the suffix “_bh”, whereas files that are unique to the BHPS do not need such a suffix (harmonised or not).
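
The prefix convention can be sketched in a few lines of Python. This is only an illustration and not part of any official tooling: the function names wave_prefix and file_stem are hypothetical, and the sketch assumes that wave letters simply follow the alphabet (Wave 1 = a, Wave 2 = b, and so on) and that BHPS prefixes add a leading “b” to the wave letter.

```python
import string


def wave_prefix(wave: int, bhps: bool = False) -> str:
    """Return the wave prefix, e.g. 'a_' for Wave 1 or 'bl_' for BHPS Wave 12.

    Assumption: wave letters follow the alphabet, and BHPS files put an
    extra 'b' in front of the wave letter.
    """
    if not 1 <= wave <= 26:
        raise ValueError("wave must be between 1 and 26")
    letter = string.ascii_lowercase[wave - 1]
    return ("b" if bhps else "") + letter + "_"


def file_stem(root: str, wave: int, bhps: bool = False) -> str:
    """Combine a wave prefix with a root name such as 'indresp'."""
    return wave_prefix(wave, bhps) + root


print(file_stem("indresp", 1))              # a_indresp
print(file_stem("indresp", 2))              # b_indresp
print(file_stem("childnt", 12, bhps=True))  # bl_childnt
```

The sketch does not know which files actually exist in which wave, so the tables below remain the reference for that.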

Table 1 lists the main data files, such as indresp, which includes information collected during adult interviews, and youth, which includes information collected during youth interviews. To avoid creating very large files, some information collected during adult interviews is provided in separate, smaller multi-level files; see Table 3 and Table 4 for a list of these files.

Stable characteristics – information collected once during the first interview

Some stable information, such as date of birth, ethnicity and country of birth, is collected the first time a person is interviewed. So, while core sample members are asked these questions in the initial wave, those who join a household later are asked in the wave in which they join. To make things easier for data users, this information, which was collected in different waves for different respondents, has been brought together in one individual-level file, xwavedat. There are a few other such cross-wave files, all of which begin with “x”. See Table 2 for a list of such files.
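
Because xwavedat has one row per pidp, its stable characteristics can be attached to any wave-specific individual file by matching on pidp. The following is a minimal sketch using plain Python dictionaries in place of real data files. The rows and the column name birth_year are invented for illustration, and attach_stable is a hypothetical helper rather than a function from any library.

```python
# Toy rows standing in for xwavedat (one row per pidp). All values are invented,
# and 'birth_year' is a hypothetical column name used only for this sketch.
xwavedat = [
    {"pidp": 101, "birth_year": 1980},
    {"pidp": 102, "birth_year": 1975},
]

# Toy rows standing in for the Wave 2 adult file b_indresp.
b_indresp = [
    {"pidp": 101, "b_hidp": 5001, "b_pno": 1},
    {"pidp": 102, "b_hidp": 5001, "b_pno": 2},
    {"pidp": 103, "b_hidp": 5002, "b_pno": 1},
]


def attach_stable(wave_rows, stable_rows, key="pidp"):
    """Left-join stable characteristics onto wave rows using the person key."""
    lookup = {row[key]: row for row in stable_rows}
    merged = []
    for row in wave_rows:
        extra = lookup.get(row[key], {})  # empty if the person is not found
        merged.append({**extra, **row})   # wave values win on any name clash
    return merged


merged = attach_stable(b_indresp, xwavedat)
for row in merged:
    print(row)
```

Every wave row is kept, and a person without a match (pidp 103 in the toy data) simply receives no extra columns.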

Some information is collected about the interview and sampling process, such as number of calls made by the interviewer, outcome of each call, interviewer ID, the information the interviewer collects about the condition of the property and neighbourhood, time taken to complete a questionnaire module and so on. See Table 5 for a list of these files.

Some datafiles are particularly useful for analysts, irrespective of their area of research.

Table 1: List of main data files

Filename | Description
w_indall, bw_indall | Household grid data for all persons in the household, including children and non-respondents. The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_indall. The variable pidp or the combination of variables “bw_hidp bw_pno” uniquely identifies each row of bw_indall.
w_hhresp, bw_hhresp | Substantive data collected from responding households. The variable w_hidp uniquely identifies each row in w_hhresp. The variable bw_hidp uniquely identifies each row in bw_hhresp.
w_indresp, bw_indresp | Substantive data collected from responding adults (16+) including proxies. Some information collected in these questionnaires is better presented in multi-level files (see Table 3 and Table 4). The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_indresp. The variable pidp or the combination of variables “bw_hidp bw_pno” uniquely identifies each row of bw_indresp.
w_youth, bw_youth | Substantive data from the youth questionnaire. The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_youth. The variable pidp or the combination of variables “bw_hidp bw_pno” uniquely identifies each row of bw_youth.
w_child, bw_child | Childcare, consents and school information of all children (0-15 years) in the household. This is a derived data file collecting information pertaining to children as reported by their parents and guardians in the adult questionnaire. The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_child.
w_egoalt, bw_egoalt | Kin and other relationships between pairs of individuals in the household. This is a derived data file based on information collected in the household grid about relationships between household members. The combination of variables “pidp apidp” or “w_hidp w_pno w_apno” uniquely identifies each row in w_egoalt. The combination of variables “pidp apidp” or “bw_hidp bw_pno bw_apno” uniquely identifies each row in bw_egoalt.
w_income, bw_income | This file contains reports of unearned income and state benefits for each individual. The combination of variables “pidp w_fiseq” uniquely identifies each row in w_income. The combination of variables “pidp bw_ficode bw_fiseq” uniquely identifies each row in bw_income.
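
The identifiers listed in Table 1 can be checked before merging files. The sketch below is one possible way to do this in Python, assuming the data have already been read into a list of dictionaries. The helper names key_columns and duplicate_keys are hypothetical, and the toy rows are invented.

```python
from collections import Counter


def key_columns(template, prefix):
    """Swap the generic 'w_' in a key template for a real wave prefix.

    Names without the 'w_' prefix (such as pidp) are left unchanged.
    """
    return [prefix + name[2:] if name.startswith("w_") else name
            for name in template]


def duplicate_keys(rows, columns):
    """Return the key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Invented rows standing in for the Wave 2 file b_indall.
b_indall = [
    {"pidp": 101, "b_hidp": 5001, "b_pno": 1},
    {"pidp": 102, "b_hidp": 5001, "b_pno": 2},
    {"pidp": 103, "b_hidp": 5002, "b_pno": 1},
]

cols = key_columns(["w_hidp", "w_pno"], "b_")
print(cols)                            # ['b_hidp', 'b_pno']
print(duplicate_keys(b_indall, cols))  # [] means the key is unique
```

An empty list means the chosen columns identify each row uniquely. Checking “b_hidp” alone on the same toy rows would report 5001 as a duplicate, because a household can contain several people.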

Table 2: List of cross-wave files

Filename | Description
xwavedat | Stable characteristics of individuals, such as date of birth, country of birth and ethnicity, which are typically collected only once in the lifetime of the Study, are picked from different data files and put into this file. This file now includes all sample members ever enumerated in either Understanding Society or BHPS, and variables have been harmonised across studies where possible. The variable pidp uniquely identifies each row.
xivdata, xivdata_bh | Some basic information about interviewers is stored in these files (non-harmonised). These are available in the Special Licence version of the survey SN8579.
xwaveid, xwaveid_bh | Some basic sampling information from each wave, such as interview outcomes, is included in these files (non-harmonised). The variable pidp uniquely identifies each row in these files.
xwlsten | Contains information on the latest known sample status of individuals (BHPS only). The variable pidp uniquely identifies each row.
xhhrel | Family matrix file which allows family members and households to be connected over time. The file compiles existing Understanding Society main survey data, particularly from the egoalt and indall files, and includes every sample member ever enumerated as part of the study. The variable pidp uniquely identifies each row and the variable osm_hh identifies the cross-wave household each pidp belongs to.

Table 3: List of data files about children based on information collected during adult interviews

Filename | Description
a_natchild, f_natchild, n_natchild2, bb_childnt, bk_childnt, bl_childnt | Some basic information about all biological children born to the sample members, whether co-resident or not. These data are collected in the first wave for any sample. So, for example, a_natchild was collected in Wave 1 of UKHLS for GPS, EMBS and bl_childnt was collected in Wave 12 of the BHPS for the Northern Irish boost sample. These files are not harmonised. The combination of variables “pidp w_childno” or “w_hidp w_pno w_childno” uniquely identifies each row of the files w_natchild. The combination of variables “pidp bw_lncno” or “bw_hidp bw_pno bw_lncno” uniquely identifies each row of the files bw_childnt.
a_adopt, f_adopt, n_adopt, n_stepchild, bb_childad, bk_childad, bl_childad | Some basic information about all adopted children and stepchildren of the sample members, whether co-resident or not. These data are collected in the first wave for any sample. So, for example, a_adopt was collected in Wave 1 for GPS, EMBS and bl_childad was collected in Wave 12 of the BHPS for the Northern Irish boost sample. These files are not harmonised. Note that in Wave 14, information about stepchildren for GPS2 was collected separately from that of adopted children and is available in a separate file, n_stepchild. The combination of variables “pidp w_adoptno” or “w_hidp w_pno w_adoptno” uniquely identifies each row of the files w_adopt. The combination of variables “pidp bw_lacno” or “bw_hidp bw_pno bw_lacno” uniquely identifies each row of the files bw_childad.
w_newborn | Every wave after Wave 1, basic information about newborn children, such as birth weight, is collected from new parents. The combination of variables “pidp w_newchno” or “w_hidp w_pno w_newchno” uniquely identifies each row of the files w_newborn.
w_chmain | Information about child maintenance arrangements was collected in waves 3, 5, 7, 9,… The combination of variables “pidp c_absparno” or “c_hidp c_pno c_absparno” uniquely identifies each row in c_chmain. The combination of variables “pidp w_childpno” or “w_hidp w_pno w_childpno” uniquely identifies each row in w_chmain where w is e, g, i, k,…
w_parstyle | Every wave from Wave 4 onwards, information about parenting styles was collected. The combination of variables “pidp w_childpno” or “w_hidp w_pno w_childpno” uniquely identifies each row of the files w_parstyle.
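
The files in Table 3 hold several rows per adult (one per child), so they usually need to be summarised to one row per pidp before being combined with an individual-level file. The following Python sketch shows one way to count child records per person. The rows are invented and the output column natchild_count is a hypothetical name.

```python
from collections import Counter

# Invented rows standing in for a_natchild: one row per (pidp, a_childno).
a_natchild = [
    {"pidp": 101, "a_childno": 1},
    {"pidp": 101, "a_childno": 2},
    {"pidp": 102, "a_childno": 1},
]

# Invented rows standing in for the individual-level file a_indresp.
a_indresp = [{"pidp": 101}, {"pidp": 102}, {"pidp": 103}]

# Count the child-level rows belonging to each person.
children_per_person = Counter(row["pidp"] for row in a_natchild)

# Attach the count to the person-level rows; people with no child rows get 0.
person_level = [
    {**row, "natchild_count": children_per_person.get(row["pidp"], 0)}
    for row in a_indresp
]

for row in person_level:
    print(row)
```

Treating a missing person as having zero children is an assumption of this sketch; in real data, a person may be absent from the child-level file because the questions were not asked of them.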

Table 4: List of data files about partnerships, jobs and employment histories based on information collected during adult interviews

Filename | Description
a_marriage, f_marriage, ba_marriag, bk_marriag, bl_marriag | Start and end dates of past marriages, and how each marriage ended, were collected during adult interviews in the first wave for which a sample was selected. So, for example, a_marriage was collected in Wave 1 for GPS & EMBS (non-harmonised). The combination of variables “pidp w_marno” or “w_hidp w_pno w_marno” uniquely identifies each row of the files w_marriage. The combination of variables “pidp bw_marno” or “bw_hidp bw_pno bw_marno” uniquely identifies each row of the files bw_marriag.
a_cohabit, f_cohabit, bb_cohab, bk_cohab, bl_cohab | Start and end dates of past cohabitations, and how each cohabitation ended, were collected during adult interviews in the first wave for which a sample was selected. So, for example, a_cohabit was collected in Wave 1 for GPS & EMBS (non-harmonised). The combination of variables “pidp w_cohabno” or “w_hidp w_pno w_cohabno” uniquely identifies each row of the files w_cohabit. The combination of variables “pidp bw_lcsno” or “bw_hidp bw_pno bw_lcsno” uniquely identifies each row of the files bw_cohab.
bw_jobhist (bw_jobhistd) | Contains information about employment history between two waves collected during adult interviews (BHPS only). The combination of variables “pidp bw_jspno” or “bw_hidp bw_pno bw_jspno” uniquely identifies each row in these files.
a_empstat, e_empstat, bw_lifemst, bk_lifemst, bl_lifemst | Employment history was collected during adult interviews in Wave 1 for part of the GPS & EMB samples and in Wave 5 for the rest of the samples; it was not asked of the IEMBS in Wave 6 (non-harmonised). The combination of variables “pidp w_spellno” or “w_hidp w_pno w_spellno” uniquely identifies each row of w_empstat. The combination of variables “pidp bw_leshno” or “bw_hidp bw_pno bw_leshno” uniquely identifies each row of bw_lifemst.
bc_lifejob | Contains information about jobs held in employment spells (BHPS only). The combination of variables “pidp bw_ljseq” or “bw_hidp bw_pno bw_ljseq” uniquely identifies each row of this file.

Table 5: Paradata and interview related files

Filename | Description
w_hhsamp, bw_hhsamp | This file contains information that the interviewer collects about each household, such as the condition of the property and neighbourhood, the interview outcome and so on. The variable w_hidp uniquely identifies each row in w_hhsamp. The variable bw_hidp uniquely identifies each row in bw_hhsamp.
w_indsamp, bw_indsamp | Includes the current interview outcome for anyone enumerated in the last interview wave, for example, whether they have responded, were only enumerated, couldn’t be contacted, refused, or were ineligible. If you restrict the data to cases where w_finloc=1/bw_finloc=1, then pidp uniquely identifies each row.
w_callrec | Includes information about each interview call made to each household, such as the outcome of the call and the interviewer ID. The combination of variables “w_hidp w_issueno w_callno” uniquely identifies each row in this file.
Timings files | Various files are available that capture the time taken to complete questions and modules within individual and household questionnaires. Given that these files vary from wave to wave and are of limited, specialist use only, they are not released as standard. If you want to make use of them, please contact usersupport@understandingsociety.ac.uk, who will be happy to advise.
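
The note on w_indsamp implies that a person can appear more than once in that file until it is restricted to w_finloc=1. A minimal Python sketch of that restriction, using invented rows for Wave 2, might look as follows. The value 0 used here for the other rows is an assumption made for illustration; the sketch relies only on the value 1 mentioned above.

```python
# Invented rows standing in for b_indsamp. Person 102 appears twice,
# and only one of those rows has b_finloc equal to 1.
b_indsamp = [
    {"pidp": 101, "b_hidp": 5001, "b_finloc": 1},
    {"pidp": 102, "b_hidp": 5001, "b_finloc": 0},  # 0 is a made-up value
    {"pidp": 102, "b_hidp": 5003, "b_finloc": 1},
]

# Keep only the rows flagged with b_finloc == 1.
final_rows = [row for row in b_indsamp if row["b_finloc"] == 1]

# After the restriction, pidp should identify each remaining row uniquely.
pidps = [row["pidp"] for row in final_rows]
is_unique = len(pidps) == len(set(pidps))

print(len(final_rows), is_unique)
```

Running the same uniqueness check on the unrestricted toy rows would fail, because pidp 102 occurs twice.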
