List of data files and their descriptions

All data files released in the End User Licence (EUL) version of the main study, SN6614, are listed below. The Special Licence version of the survey, SN6931, includes the same files, but with some additional variables and, for some income variables, the non-top-coded values. The Secure Access version of the survey data is available as SN6676. It includes all files in the Special Licence version, plus files containing three variables relating to the National Grid Reference of each household: Easting, Northing and a positional quality indicator (OSGRDIND). The Secure Access version also includes variables for the full dates of birth of Understanding Society and BHPS respondents. The different access levels are explained on the Data Access page.

Data collected from different sources (e.g. the household interview, the adult interview, the youth interview) are stored in separate files, and each wave has its own set of such files. To make them easier to use, files share the same root name but begin with a letter prefix indicating the wave in which the data were collected: “a_” for the first wave, “b_” for the second wave, and so on (in this user guide we use “w_” to denote waves in general). From the Wave 7 data release onward (November 2017), Understanding Society-harmonised BHPS data files are also included. Most files exist for both studies; where they do and have been harmonised, the file stem names match. Wave-specific harmonised BHPS files can be identified by the wave prefix bw_. Note: BHPS data files that have not yet been harmonised but share a stem name with UKHLS files have the suffix _bh, whereas files that are unique to the BHPS do not need such a suffix (harmonised or not).
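The naming convention can be expressed as a small helper. This is a minimal sketch, not an official tool: the function names are hypothetical, it only covers single-letter UKHLS prefixes (waves 1 to 26), and it assumes the BHPS prefix is “b” followed by the wave letter, which is inferred from examples such as bl_childnt being collected in Wave 12 of the BHPS. It does not handle the _bh suffix or file extensions.

```python
import string

def ukhls_prefix(wave: int) -> str:
    """Understanding Society wave prefix: 1 -> 'a_', 2 -> 'b_', ..."""
    if not 1 <= wave <= 26:
        # This sketch only covers single-letter prefixes.
        raise ValueError("wave must be between 1 and 26")
    return string.ascii_lowercase[wave - 1] + "_"

def bhps_prefix(wave: int) -> str:
    """Harmonised BHPS wave prefix, assumed to be 'b' + the wave letter: 2 -> 'bb_'."""
    return "b" + ukhls_prefix(wave)

def wave_filename(stem: str, wave: int, bhps: bool = False) -> str:
    """Combine a wave prefix with a file stem, e.g. ('indresp', 1) -> 'a_indresp'."""
    prefix = bhps_prefix(wave) if bhps else ukhls_prefix(wave)
    return prefix + stem

print(wave_filename("indresp", 1))              # a_indresp
print(wave_filename("hhresp", 2, bhps=True))    # bb_hhresp
print(wave_filename("childnt", 12, bhps=True))  # bl_childnt
```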

Table 1 lists the main data files, such as indresp, which includes information collected during adult interviews, and youth, which includes information collected during youth interviews. To avoid creating very large files, some information collected during adult interviews is provided in separate, smaller multi-level files; see Table 3 and Table 4 for a list of these files.

Stable characteristics – information collected once during the first interview

Some stable information, such as date of birth, ethnicity and country of birth, is collected the first time a person is interviewed. So, while core sample members are asked these questions in the initial wave, people who join a sample household later are asked in the wave they join. Because this information is therefore collected in different waves for different respondents, it has been put into one individual-level file, xwavedat, to make it easier for data users. There are a few other such cross-wave files, all of which begin with “x”. See Table 2 for a list of these files.
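Because pidp identifies each person across all files, attaching stable characteristics from xwavedat to a wave-specific file is a left join on pidp. The sketch below assumes the rows have already been loaded into Python as lists of dictionaries (loading is not shown); the function name and the fields “birth_year” and “b_score” are hypothetical placeholders, not actual variable names from the release.

```python
def attach_stable(wave_rows, xwavedat_rows, fields):
    """Left-join the requested stable fields from xwavedat onto wave rows via pidp."""
    lookup = {row["pidp"]: row for row in xwavedat_rows}
    # pidp should identify each xwavedat row exactly once.
    if len(lookup) != len(xwavedat_rows):
        raise ValueError("pidp is not unique in xwavedat")
    merged = []
    for row in wave_rows:
        stable = lookup.get(row["pidp"], {})
        out = dict(row)  # copy so the input rows are not modified
        for field in fields:
            out[field] = stable.get(field)  # None if the person is missing
        merged.append(out)
    return merged

# Toy records; "birth_year" and "b_score" are hypothetical placeholder names.
xwavedat = [{"pidp": 101, "birth_year": 1970}, {"pidp": 102, "birth_year": 1985}]
b_indresp = [{"pidp": 101, "b_score": 3}, {"pidp": 103, "b_score": 5}]
print(attach_stable(b_indresp, xwavedat, ["birth_year"]))
```

A left join keeps every wave row even when the person is absent from the lookup, so missing stable values show up explicitly as None rather than silently dropping respondents.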

Some information is collected about the interview and sampling process, such as the number of calls made by the interviewer, the outcome of each call, the interviewer ID, the information the interviewer records about the condition of the property and neighbourhood, and the time taken to complete each questionnaire module. See Table 5 for a list of these files.

Some data files are particularly useful for analysts, irrespective of their area of research.

Table 1: List of main data files

Filename	Description
w_indall bw_indall	Household grid data for all persons in the household, including children and non-respondents. The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_indall. The variable pidp or the combination of variables “bw_hidp bw_pno” uniquely identifies each row of bw_indall.
w_hhresp bw_hhresp	Substantive data collected from responding households. The variable w_hidp uniquely identifies each row in w_hhresp. The variable bw_hidp uniquely identifies each row in bw_hhresp.
w_indresp bw_indresp	Substantive data collected from responding adults (16+), including proxies. Some information collected in these questionnaires is better presented in multi-level files (see Tables 3 and 4). The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_indresp. The variable pidp or the combination of variables “bw_hidp bw_pno” uniquely identifies each row of bw_indresp.
w_youth bw_youth	Substantive data from the youth questionnaire. The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_youth. The variable pidp or the combination of variables “bw_hidp bw_pno” uniquely identifies each row of bw_youth.
w_child bw_child	Childcare, consents and school information of all children (0-15 years) in the household. This is a derived data file collecting information pertaining to children as reported by their parents and guardians in the adult questionnaire. The variable pidp or the combination of variables “w_hidp w_pno” uniquely identifies each row of w_child.
w_egoalt bw_egoalt	Kin and other relationships between pairs of individuals in the household. This is a derived data file based on information collected in the household grid about relationships between household members. The combination of variables “pidp apidp” or “w_hidp w_pno w_apno” uniquely identifies each row in w_egoalt. The combination of variables “pidp apidp” or “bw_hidp bw_pno bw_apno” uniquely identifies each row in bw_egoalt.
w_income bw_income	This file contains reports of unearned income and state benefits for each individual. The combination of variables “pidp w_fiseq” uniquely identifies each row in w_income. The combination of variables “pidp bw_ficode bw_fiseq” uniquely identifies each row in bw_income.
w_employment	This file contains information about sample members' current jobs (one or more). Previously this information was provided in wide format in the indresp file. The combination of variables “pidp w_jobcode” uniquely identifies each row in w_employment.
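The identifier notes in Table 1 can be checked directly, and the same identifiers are what link individual-level files to household-level files. The sketch below assumes rows are already loaded as lists of dictionaries; the helper names are hypothetical, and the toy records use only identifier names given in Table 1 (pidp, a_hidp, a_pno) with made-up values.

```python
from collections import Counter

def is_unique_key(rows, key):
    """True if the combination of `key` columns identifies every row exactly once."""
    counts = Counter(tuple(row[k] for k in key) for row in rows)
    return all(n == 1 for n in counts.values())

def link_to_household(ind_rows, hh_rows, hidp_var):
    """Pair each individual row with its household row (many-to-one) via hidp_var."""
    if not is_unique_key(hh_rows, [hidp_var]):
        raise ValueError(f"{hidp_var} does not uniquely identify household rows")
    households = {row[hidp_var]: row for row in hh_rows}
    # Individuals whose household has no hhresp row are paired with None.
    return [(ind, households.get(ind[hidp_var])) for ind in ind_rows]

# Toy Wave 1 records using identifier names from Table 1 (values are made up).
a_indresp = [
    {"pidp": 1, "a_hidp": 10, "a_pno": 1},
    {"pidp": 2, "a_hidp": 10, "a_pno": 2},
    {"pidp": 3, "a_hidp": 11, "a_pno": 1},
]
a_hhresp = [{"a_hidp": 10}, {"a_hidp": 11}]

print(is_unique_key(a_indresp, ["pidp"]))           # True
print(is_unique_key(a_indresp, ["a_hidp", "a_pno"]))  # True
pairs = link_to_household(a_indresp, a_hhresp, "a_hidp")
```

Checking uniqueness on the household side first guards against a many-to-many join, which would silently duplicate individual rows.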

Table 2: List of cross-wave files

Filename	Description
xwavedat	Stable characteristics of individuals, such as date of birth, country of birth and ethnicity, which are typically collected only once in the lifetime of the Study, are drawn from different data files and put into this file. This file now includes all sample members ever enumerated in either Understanding Society or the BHPS, and variables have been harmonised across the studies where possible. The variable pidp uniquely identifies each row.
xivdata xivdata_bh	Some basic information about interviewers is stored in these files (non-harmonised). They are available in the Special Licence release SN8579.
xwaveid xwaveid_bh	Some basic sampling information from each wave, such as interview outcomes, is included in these files (non-harmonised). The variable pidp uniquely identifies each row in these files.
xwlsten	Contains information on the latest known sample status of individuals (Only BHPS). The variable pidp uniquely identifies each row.
xhhrel	Family matrix file which allows family members and households to be connected over time. The file compiles existing Understanding Society main survey data, particularly from the egoalt and indall files, and includes every sample member ever enumerated as part of the study. The variable pidp uniquely identifies each row and the variable osm_hh identifies the cross-wave household each pidp belongs to.

Table 3: List of data files about children based on information collected during adult interviews

Filename	Description
a_natchild f_natchild n_natchild2 bb_childnt bk_childnt bl_childnt	Some basic information about all biological children born to sample members, whether co-resident or not. This is collected in the first wave of each sample. So, for example, a_natchild was collected in Wave 1 of UKHLS for the GPS and EMBS, and bl_childnt was collected in Wave 12 of the BHPS for the Northern Irish boost sample. These files are not harmonised. The combination of variables “pidp w_childno” or “w_hidp w_pno w_childno” uniquely identifies each row of the w_natchild files. The combination of variables “pidp bw_lncno” or “bw_hidp bw_pno bw_lncno” uniquely identifies each row of the bw_childnt files.
a_adopt f_adopt n_adopt w_adopt2 n_stepchild w_stepchild2 bb_childad bk_childad bl_childad	Some basic information about all adopted children and stepchildren of sample members, whether co-resident or not. This is collected in the first wave of each sample. So, for example, a_adopt was collected in Wave 1 for the GPS and EMBS, and bl_childad was collected in Wave 12 of the BHPS for the Northern Irish boost sample. These files are not harmonised. Note that in Wave 14, information about stepchildren for GPS2 was collected separately from that about adopted children and is available in a separate file, n_stepchild. From Wave 14 onwards, due to some changes to the questions in this module, the name of the file that includes this information was changed to n_natchild2. Also note that from Wave 14 onwards, these questions are asked every wave, not just in the initial wave in which a sample is added. The combination of variables “pidp w_adoptno” or “w_hidp w_pno w_adoptno” uniquely identifies each row of the w_adopt files. The combination of variables “pidp bw_lacno” or “bw_hidp bw_pno bw_lacno” uniquely identifies each row of the bw_childad files.
nonresch	Some basic information about non-resident children born to sample members is collected from Wave 14 onwards and included in these files. n_nonresch is available for Wave 14. From Wave 15 onwards, as some questions were changed, the file name was also changed, to o_nonresch2.
o_twinsfh	Some basic information about whether twins are present in the household.
w_newborn	In every wave after Wave 1, basic information about newborn children, such as birth weight, is collected from new parents. The combination of variables “pidp w_newchno” or “w_hidp w_pno w_newchno” uniquely identifies each row of the w_newborn files.
w_chmain	Information about child maintenance arrangements was collected in Waves 3, 5, 7, 9, … The combination of variables “pidp c_absparno” or “c_hidp c_pno c_absparno” uniquely identifies each row in c_chmain. The combination of variables “pidp w_childpno” or “w_hidp w_pno w_childpno” uniquely identifies each row in w_chmain, where w is e, g, i, k, …
w_parstyle	In every wave from Wave 4 onwards, information about parenting styles was collected. The combination of variables “pidp w_childpno” or “w_hidp w_pno w_childpno” uniquely identifies each row of the w_parstyle files.
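The files in Table 3 are multi-level: each row is one child (or one arrangement) of a sample member, so analyses at the person level usually start by aggregating over pidp. The sketch below counts rows per pidp in a natchild-style file; it assumes rows are already loaded as lists of dictionaries, and the function name and toy values are hypothetical. Since these files are collected only in particular waves for particular samples, a count of zero may mean “not asked” rather than “no children”.

```python
from collections import Counter

def children_per_person(child_rows, pidps):
    """Count rows (one per child) in a multi-level child file for each pidp listed."""
    counts = Counter(row["pidp"] for row in child_rows)
    # People with no rows get 0; check the questionnaire routing before reading
    # 0 as "no children" rather than "not asked".
    return {pidp: counts.get(pidp, 0) for pidp in pidps}

# Toy a_natchild rows keyed by "pidp a_childno", as described in Table 3.
a_natchild = [
    {"pidp": 1, "a_childno": 1},
    {"pidp": 1, "a_childno": 2},
    {"pidp": 3, "a_childno": 1},
]
print(children_per_person(a_natchild, [1, 2, 3]))  # {1: 2, 2: 0, 3: 1}
```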

Table 4: List of data files about partnerships, jobs and employment histories based on information collected during adult interviews

Filename	Description
a_marriage f_marriage ba_marriag bk_marriag bl_marriag	Start and end dates of past marriages, and how each marriage ended, were collected during adult interviews in the first wave of each sample. So, for example, a_marriage was collected in Wave 1 for the GPS and EMBS (non-harmonised). The combination of variables “pidp w_marno” or “w_hidp w_pno w_marno” uniquely identifies each row of the w_marriage files. The combination of variables “pidp bw_marno” or “bw_hidp bw_pno bw_marno” uniquely identifies each row of the bw_marriag files.
a_cohabit f_cohabit bb_cohab bk_cohab bl_cohab	Start and end dates of past cohabitations, and how each cohabitation ended, were collected during adult interviews in the first wave of each sample. So, for example, a_cohabit was collected in Wave 1 for the GPS and EMBS (non-harmonised). The combination of variables “pidp w_cohabno” or “w_hidp w_pno w_cohabno” uniquely identifies each row of the w_cohabit files. The combination of variables “pidp bw_lcsno” or “bw_hidp bw_pno bw_lcsno” uniquely identifies each row of the bw_cohab files.
bw_jobhist (bw_jobhistd)	Contains information about employment history between two waves collected during adult interviews (Only BHPS). The combination of variables “pidp bw_jspno” or “bw_hidp bw_pno bw_jspno” uniquely identifies each row in these files.
a_empstat e_empstat bw_lifemst bk_lifemst bl_lifemst	Employment history was collected during adult interviews in Wave 1 for part of the GPS and EMB samples and in Wave 5 for the rest of the samples; it was not asked of the IEMBS in Wave 6 (non-harmonised). The combination of variables “pidp w_spellno” or “w_hidp w_pno w_spellno” uniquely identifies each row of w_empstat. The combination of variables “pidp bw_leshno” or “bw_hidp bw_pno bw_leshno” uniquely identifies each row of bw_lifemst.
bc_lifejob	Contains information about jobs held in employment spells (Only BHPS). The combination of variables “pidp bw_ljseq” or “bw_hidp bw_pno bw_ljseq” uniquely identifies each row of this file.
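The history files in Table 4 hold one row per spell (marriage, cohabitation, employment spell), keyed by pidp plus a spell number. A common first step is to group each person's spells in order. This is a minimal sketch assuming rows are already loaded as lists of dictionaries and that the spell number orders spells as intended; the function name and toy values are hypothetical, and the spell-number variable must be passed in because it differs between files (e.g. w_marno, w_cohabno, w_spellno).

```python
from collections import defaultdict

def spells_by_person(spell_rows, seq_var):
    """Group spell rows by pidp, each group sorted by its spell-number variable."""
    grouped = defaultdict(list)
    for row in spell_rows:
        grouped[row["pidp"]].append(row)
    for rows in grouped.values():
        rows.sort(key=lambda r: r[seq_var])
        seqs = [r[seq_var] for r in rows]
        # "pidp + spell number" is documented as a unique key, so repeats are an error.
        if len(seqs) != len(set(seqs)):
            raise ValueError(f"duplicate {seq_var} within one pidp")
    return dict(grouped)

# Toy a_marriage rows (spell order deliberately scrambled).
a_marriage = [
    {"pidp": 7, "a_marno": 2},
    {"pidp": 7, "a_marno": 1},
    {"pidp": 8, "a_marno": 1},
]
history = spells_by_person(a_marriage, "a_marno")
print([r["a_marno"] for r in history[7]])  # [1, 2]
```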

Table 5: Paradata and interview-related files

Filename	Description
w_hhsamp bw_hhsamp	This file contains information that the interviewer collects about each household, such as the condition of the property and neighbourhood, the interview outcome and so on. The variable w_hidp uniquely identifies each row in w_hhsamp. The variable bw_hidp uniquely identifies each row in bw_hhsamp.
w_indsamp bw_indsamp	Includes the current interview outcome for anyone enumerated in the last interview wave, for example, whether they responded, were only enumerated, could not be contacted, refused, or were ineligible. If you restrict the data to cases where w_finloc=1 (bw_finloc=1 for the BHPS files), then pidp uniquely identifies each row.
w_callrec	Includes information about each interview call made to each household, such as the outcome of the call and the interviewer ID. The combination of variables “w_hidp w_issueno w_callno” uniquely identifies each row in this file.

Timings Files	Various files are available that capture the time taken to complete questions and modules within individual and household questionnaires. Given that these files vary from wave to wave and are of limited, specialist use only, they are not released as standard. If you want to make use of them, please contact usersupport@understandingsociety.ac.uk, who will be happy to advise.
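The w_indsamp note says pidp becomes a unique identifier only after restricting to w_finloc=1. The sketch below applies that filter and verifies the claim before indexing by pidp. It assumes rows are already loaded as lists of dictionaries; the function name is hypothetical, and the value 0 in the toy data is just an illustrative “not 1” value, not a documented code.

```python
def current_rows(indsamp_rows, finloc_var):
    """Keep rows where finloc_var == 1 and index them by pidp."""
    kept = [row for row in indsamp_rows if row[finloc_var] == 1]
    pidps = [row["pidp"] for row in kept]
    # The w_indsamp note states pidp is unique once restricted to finloc == 1.
    if len(pidps) != len(set(pidps)):
        raise ValueError(f"pidp is not unique after filtering on {finloc_var}")
    return {row["pidp"]: row for row in kept}

# Toy b_indsamp rows: pidp 5 appears twice, but only once with b_finloc == 1.
b_indsamp = [
    {"pidp": 5, "b_finloc": 1},
    {"pidp": 5, "b_finloc": 0},
    {"pidp": 6, "b_finloc": 1},
]
print(sorted(current_rows(b_indsamp, "b_finloc")))  # [5, 6]
```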
