Frequently Asked Questions

This page lists the most frequently asked questions about Add Health data. For questions not addressed here, users should review the documentation page.

What is the correct name of the study?

The correct name of the study is: The National Longitudinal Study of Adolescent to Adult Health. It should be abbreviated as: Add Health.


How do I briefly describe the Add Health sample design?

A sample of 80 high schools and 52 middle schools from the US was selected with unequal probability of selection. Incorporating systematic sampling methods and implicit stratification into the Add Health study design ensured this sample is representative of US schools with respect to region of country, urbanicity, school size, school type, and ethnicity.


How do I cite the Add Health research design information found on the web site?

Harris, K.M., C.T. Halpern, E.A. Whitsel, J.M. Hussey, L. Killeya-Jones, J. Tabor, and S.C. Dean.  2019. Cohort Profile: The National Longitudinal Study of Adolescent to Adult Health (Add Health). International Journal of Epidemiology 48(5):1415-1425 https://doi.org/10.1093/ije/dyz115.


What acknowledgment should be included in each written report or other publication based on analysis of data from Add Health?

The Add Health contract and data use agreement require that the following be included:

This research uses data from Add Health, funded by grant P01 HD31921 (Harris) from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with cooperative funding from 23 other federal agencies and foundations. Add Health is currently directed by Robert A. Hummer and funded by the National Institute on Aging cooperative agreements U01 AG071448 (Hummer) and U01AG071450 (Aiello and Hummer) at the University of North Carolina at Chapel Hill. Add Health was designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill.

Note: Use of this acknowledgment requires no further permission from the persons named.


What acknowledgment should be included when using questions from the Add Health survey in my study?

These questions are from Add Health, funded by grant P01 HD31921 (Harris) from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with cooperative funding from 23 other federal agencies and foundations. Add Health is currently directed by Robert A. Hummer and funded by the National Institute on Aging cooperative agreements U01 AG071448 (Hummer) and U01AG071450 (Aiello and Hummer) at the University of North Carolina at Chapel Hill. Add Health was designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill. Information on how to obtain the Add Health data files is available on the Add Health website (https://addhealth.cpc.unc.edu). No direct support was received from grant P01-HD31921 for this project.


Regarding the new NIH public access policy, should we include the Add Health grant when our papers are entered into the NIHMS system?

If you have not received direct support from the Add Health Program Project, please use the following acknowledgment statement to satisfy the requirements of the new NIH Public Access Policy:

This research uses data from Waves I-V of Add Health, grant P01 HD31921 (Harris) from Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with cooperative funding from 23 other federal agencies and foundations. Add Health was designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill.  Add Health is directed by Robert A. Hummer and funded by the National Institute on Aging cooperative agreements U01 AG071448 (Hummer) and U01AG071450 (Aiello and Hummer) at the University of North Carolina at Chapel Hill. No direct support was received from grant P01-HD31921 or cooperative agreements U01 AG071448 and U01AG071450.


Regarding the new NIH public access policy, how do I make sure my journal articles are assigned PubMed Central Reference Numbers (PMCIDs)?

Journal articles published using Add Health data must be submitted to PubMed Central to receive a PMCID.  The method of PubMed Central submission and Investigator responsibility for submission depend on the journal and journal publisher.

  1. Some journals automatically submit published articles to PubMed Central.  For a list of journals that submit articles to PubMed Central please visit the NIH website: https://publicaccess.nih.gov/submit_process_journals.htm.
  2. Some journal publishers may submit the articles to PubMed Central automatically or upon request by the author.  For a list of journal publishers that submit articles to PubMed Central please visit the NIH website: https://publicaccess.nih.gov/select_deposit_publishers.htm#b.
  3. If neither the journal nor the journal publisher will submit the article to PubMed Central, the Investigator will be responsible to submit the final peer-reviewed manuscript to PubMed Central via the NIH Manuscript Submission System (NIHMS).  For detailed instructions on the process of submitting a journal article to PubMed Central, please see the NIH website: https://publicaccess.nih.gov/submit_process.htm.
  4. If you have any problems with this process, please contact the NIHMS or PubMed help desk.

Why do my journal articles need PubMed Central Reference Numbers (PMCIDs)?

NIH policy requires that “[a]nyone submitting an application, proposal or report to the NIH must include the PMC reference number (PMCID) when citing applicable papers that they author or that arise from their NIH-funded research.”


How do I cite the Add Health contractual data?

The recommended citation for an Add Health contractual data set is:

Harris, Kathleen Mullan. 2018. The National Longitudinal Study of Adolescent to Adult Health (Add Health), Waves I & II, 1994–1996; Wave III, 2001–2002; Wave IV, 2007-2009; Wave V, 2016-2018 [machine-readable data file and documentation]. Chapel Hill, NC: Carolina Population Center, University of North Carolina at Chapel Hill. 


How do I obtain a copy of the monograph “Reducing the risk: connections that make a difference in the lives of youth”?

You may view the monograph online or obtain a printed copy by sending an email or letter to:

Reducing the Risk
Adolescent Health Program
University of Minnesota
Box 721, UMHC
420 Delaware Street SE
Minneapolis, MN 55455


Can researchers provide copies of the Add Health data to journal editors who request it?

Add Health adheres to the NIH policy on data sharing. Because of the sensitive nature of Add Health data, however, access is limited and governed by the Add Health data management security plan, so authors cannot provide Add Health data to journal editors. Authors may instead provide the program code used to construct variables and analyze the data. Editors may obtain a copy of the data under the terms and conditions described on the Add Health website at: https://addhealth.cpc.unc.edu/data/.


Do the Add Health data contain any identifiers?

Add Health data files do not contain respondent identifiers or any links to identifiers.  The data do contain constructed ID numbers (de-identified) which are necessary to allow researchers to link data across the waves and to friends and partners.


Add Health participants provided written informed consent for participation in all aspects of Add Health in accordance with the University of North Carolina School of Public Health Institutional Review Board guidelines that are based on the Code of Federal Regulations on the Protection of Human Subjects 45CFR46: https://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html.

Examples of these forms are available in the Wave III Documentation and Wave IV Documentation files.

Were foster youth included in the Add Health study? If so, how many were surveyed?

In the Add Health Wave I in-home interview, 61 respondents out of 20,745 were identified as living with a foster mother and/or foster father.


How were adolescents identified as eligible for special oversamples for the in-home interview?

An adolescent’s answer to a specific question or questions on the In-School Questionnaire determined his or her eligibility for inclusion in an oversample. For example, an adolescent who marked “Chinese” as his or her Asian or Pacific Islander background was eligible for the Chinese oversample. The genetic oversamples were identified in two ways. All adolescents who indicated they were twins were sampled with certainty. When an adolescent indicated at least one other household member in grades 7 through 12 with whom he or she did not share a biological mother and/or biological father, they were added to the pool of potential half-siblings and other non-related adolescents. Full siblings were not oversampled.


How did the interviewer select the parent to interview for the parent questionnaire?

The mother (or other female head of the household) of the originally sampled adolescent was asked to participate in a 40-minute, interviewer-administered, paper-and-pencil survey regarding the health status and behaviors of the adolescent, the home environment, and her interpersonal relationships. The parent survey instrument does not contain highly sensitive items about the parent; however, it does ask some sensitive questions about the adolescent. The adolescent’s mother (or other female head of the household) was the preferred respondent because, according to the results of previous studies, mothers are generally more familiar than fathers with the schooling, health status, and health behaviors of their children.

Interviewer instructions for selecting the parent to interview for the Parent Questionnaire are detailed below.

Upon your arrival at the household, ask to speak to the student’s mother, the preferred respondent to complete the Parent Questionnaire. If the student’s mother does not reside in the household, the appropriate respondent is the first person on the following list who lives with the student:

  1. stepmother
  2. other female guardian, such as a legal guardian or grandmother
  3. father
  4. stepfather
  5. other male guardian, such as a legal guardian or grandfather

Do not schedule an interview with a male respondent out of convenience. If the mother, stepmother, or other female guardian lives with the adolescent but is unavailable at the time of your visit, ask your household contact for the best time to reach her. If the preferred female resident refuses to be interviewed, the adolescent’s father, stepfather, or other male guardian may act as respondent.


What is the retention rate for Add Health?

Due to the design of the study, where Wave I seniors were not selected to be interviewed at Wave II, retention rate is not an appropriate statistic to use to describe Add Health study participation. Any calculated retention rate would be misleading. The response rate at each wave is the best indicator to use.


What are the response rates for each wave?

Add Health Response Rates

Wave   %
I      79
II     88.6
III    77.4
IV     80.3
V*     71.8
* Overall effective response rate (OERR); Biemer et al. 2020 (available upon request).

Self-administered sections by wave?

Wave   Section(s)       Administered
I      24-33            Computer-Assisted Self-Interview (CASI)
II     23-32            CASI
III    16-29            CASI
IV     15 and 17-24*    CASI
* Section 18 was both interviewer- and self-administered.

Which cases were selected to be re-interviewed at Wave II?

16,706 of the Wave I respondents were selected to be re-interviewed at Wave II. In general, two groups were not selected for Wave II: respondents who were seniors at Wave I and were not part of a genetic pair, and respondents in the disabled sample.


Fielded and interviewed cases at Wave III and IV

Wave   Wave I In-Home Respondents   Wave II Genetic   Unfielded Cases   Fielded Cases   Interviewed Cases
III    20,745                       45                687*              20,103          15,197
IV     20,745                       -                 783**             19,962          15,701
* Wave I cases without a weight and without a genetic sample flag.
** Determined ineligible.
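
The fielded-case counts follow from the other columns. The short Python sketch below is only an illustration of that arithmetic: the function name is hypothetical, and it assumes the Wave II Genetic column counts cases added to the Wave I in-home respondents before the unfielded cases are removed.

```python
def fielded_cases(wave1_respondents: int, wave2_genetic: int, unfielded: int) -> int:
    """Fielded cases = Wave I respondents + Wave II genetic cases - unfielded cases.

    Assumption: Wave II genetic cases are additions to the Wave I respondents.
    """
    return wave1_respondents + wave2_genetic - unfielded


# Figures taken from the table above.
wave3_fielded = fielded_cases(20_745, 45, 687)
wave4_fielded = fielded_cases(20_745, 0, 783)

print(wave3_fielded)  # 20103
print(wave4_fielded)  # 19962
```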

How many of the interviewed Wave III public-use sample were originally selected for the core and high education black samples?

Wave III public-use sample = 4,882

Core sample only = 4,490

High education black sample only = 325

Both samples = 67


When there are gender discrepancies between Wave I and Wave III, how do you know which one is correct?

There are 20 cases in which Wave III gender (BIO_SEX3) does not match the Wave I gender (BIO_SEX). At Wave III, the preloaded gender variable came from the last wave of available data. Eighteen of these inconsistent cases match the Wave II gender (BIO_SEX2) and were confirmed at Wave III as being correct. Of the remaining two inconsistent cases:

  • In one case the Wave III gender, female, was confirmed by the Add Health security manager as being correct.
  • In another case, both the Wave I and Wave II gender are listed as male, which is correct. For this case only, the Wave III gender is incorrect.

General

How does the Public-Use data set compare to the Restricted-Use data set?

The public-use data contain all the data from the in-home interviews, but for a smaller sample, which limits deductive disclosure risk. The public-use sample is one-third of the Add Health Sample Members (AHSMs) who fall into the Core and High Education Black samples. (Note: because some AHSMs are in both samples, one cannot simply add the core sample total to the high education black sample total.) Add Health public-use data can be downloaded from the three sources listed on the Add Health Public-Use Data page.

The sample drawing for public-use data is wave specific. New samples are drawn based on the final N at each wave.

                                 W1      W2      W3      W4
3rd of wave                      6,915   4,913   5,066   5,234
3rd of Core & Hi Ed Bl Samples   6,504   4,834   4,882   5,114
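
The figures above can be checked with a little arithmetic. The Python sketch below is illustrative only. It assumes that the "final N" for Waves I, III, and IV is the number of interviewed cases reported elsewhere in this FAQ (20,745, 15,197, and 15,701) and that one-third is rounded to the nearest whole case. The function names are hypothetical. It also shows why the core and high education black totals cannot simply be added: respondents in both samples would be counted twice.

```python
def one_third(n: int) -> int:
    """One-third of n, rounded to the nearest whole case (assumed rounding rule)."""
    return round(n / 3)


# Interviewed cases per wave as reported in this FAQ (Wave II is not listed here).
wave_totals = {"W1": 20_745, "W3": 15_197, "W4": 15_701}
thirds = {wave: one_third(n) for wave, n in wave_totals.items()}
print(thirds)  # {'W1': 6915, 'W3': 5066, 'W4': 5234}


def public_use_total(core_only: int, heb_only: int, both: int) -> int:
    """Count each respondent once: core only + high education black only + both."""
    return core_only + heb_only + both


def naive_total(core_only: int, heb_only: int, both: int) -> int:
    """Add the two sample totals directly; respondents in both are double counted."""
    return (core_only + both) + (heb_only + both)


# Wave III public-use counts from this FAQ.
print(public_use_total(4_490, 325, 67))  # 4882
print(naive_total(4_490, 325, 67))       # 4949, which overcounts by 67
```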

Public-use data do not contain ID numbers of friends, siblings, or romantic partners, so respondents cannot be linked to one another. The public-use data also do not include the following files, which require a restricted-use contract:

  1. Obesity and Neighborhood Environment
  2. Genetics, disposition
  3. Political context
  4. Alcohol density

How do I determine which data would best fit my research needs?

Add Health has created the following Data Decision chart to help you decide if the public-use data or restricted-use data best fit your research needs, or if your project would be considered an ancillary study.

[Flowchart: Data Decision chart for choosing among public-use data, restricted-use data, and an ancillary study.]

How much space do I need to accommodate all Add Health data sets that are available?

The Add Health data require less than 4 GB of storage space, but you will also need to have space available for software and temp files created by the software, depending on your computing configuration. Additionally, 4 to 8 GB RAM are needed for processing.


Are geocodes included in the restricted-use data?

Geocodes are not available with the restricted-use data because of deductive disclosure concerns. Add Health, however, has established a set of requirements for investigators seeking to add supplemental contextual data to Add Health. 


What type of geographic data are available?

Region is the only ‘real’ geographical representation in the Add Health restricted-use data files. Nothing below region is available because of deductive disclosure risks. We have pseudo codes for state, county, census tract, and block group.


Can the Add Health data be linked to Census information (neighborhood)?

Add Health does not provide geocodes with the restricted-use data which would allow you to add your own Census data. Many Census variables have been linked to the Add Health data. Descriptions of Add Health contextual data and the codebooks are available from Contextual Data Files on the Codebooks page.

Add Health has established a set of requirements for investigators seeking to add supplemental contextual data to Add Health. A brief introduction to the Ancillary Study proposal process and costs is available in A Guide to Ancillary Studies in Add Health. For more information, see the Ancillary Studies page.


Can supplemental contextual or biological data be added to Add Health?

Investigators seeking to add supplemental, contextual, or biological data to Add Health may do so under the auspices of an Add Health ancillary study. An ancillary study is any study that derives support from independent funds outside the Add Health Program Project, and does one or more of the following:

  • Collects new, original questionnaire data on Add Health respondents.
  • Merges secondary data sources onto Add Health respondent or school records and requires personal identifiers (e.g., geocodes) to perform these linkages.
  • Collects new biospecimens from Add Health respondents.
  • Uses archived biospecimens collected by the Add Health study.

For information on ancillary studies and costs, please review A Guide to Ancillary Studies in Add Health. Before considering an ancillary study, review the existing Add Health data sets. For more information, see the Ancillary Studies page.


How do I read a SAS export file?

The following SAS commands will allow you to read a SAS export file:

libname in xport '/directory path where file is located/SAS export file name';
data wave1;
  set in.SAS dataset name;
run;

For example, if the Add Health file on the CD is named ALLWAVE1.EXP, the internal SAS data set name is ALLWAVE1, and your CD drive is D:

libname in xport 'd:\allwave1.exp';
data wave1;
  set in.allwave1;
run;


How do I read a SAS export file with Stata?

If you are using a version of Stata before version 13, use the following Stata command to read a SAS export file.

fdause datasetname.xpt

If you are using Stata 13 or later, use the following Stata command to read a SAS export file.

import sasxport datasetname.xpt

If the SAS file is named datasetname.exp, rename the file to datasetname.xpt before running the Stata command.


How do I read a SAS export file with SPSS?

The following SPSS command allows you to read a SAS export file.

GET SAS DATA='\folder\datasetname.xpt'.

If the SAS file is named datasetname.exp, rename the file to datasetname.xpt before running the SPSS command.
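
The renaming step can be scripted. The following Python sketch is one possible way to do it, not an Add Health utility: the function name and the example path are hypothetical. It refuses to overwrite an existing .xpt file.

```python
from pathlib import Path


def rename_exp_to_xpt(path: str) -> Path:
    """Rename a SAS export file from .exp to .xpt and return the new path."""
    source = Path(path)
    if source.suffix.lower() != ".exp":
        raise ValueError(f"expected a .exp file, got: {source.name}")
    target = source.with_suffix(".xpt")
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target


# Example call (hypothetical path):
# rename_exp_to_xpt("D:/allwave1.exp")
```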


What numbers should be used for the NIH Inclusion Enrollment Report?

The Wave I inclusion enrollment numbers are available in this example Inclusion Enrollment Report. In addition, see Cumulative Inclusion Enrollment Report.

What are the codes for anti-hypertensive medications?

The codes for the anti-hypertensive medications are:

Code          Medication class
040-047-xxx   BETA-BLOCKERS
040-049-156   THIAZIDE DIURETICS
040-042-xxx   ACE-INHIBITORS
040-043-xxx   ANTI-ADRENERGICS (peripherally acting)
040-044-xxx   ANTI-ADRENERGICS (centrally acting)
040-048-xxx   CALCIUM CHANNEL BLOCKERS
040-053-xxx   VASODILATORS
040-056-xxx   AT2 RECEPTOR BLOCKERS
040-055-xxx   COMBO ANTI-HYPERTENSIVES

Anti-hypertensive medication codes used in the following article: Nguyen QC, Tabor JW, Entzel PP, Lau Y, Suchindran C, Hussey JM, Halpern CT, Harris KM, Whitsel EA. Discordance in national estimates of hypertension among young adults. Epidemiology 2011;22(4):532-541.
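
A lookup along these lines can assign a class to a medication code. This is a sketch under two assumptions that the list does not state explicitly: codes are strings of the form ddd-ddd-ddd, and "xxx" stands for any three-character final segment. The function and dictionary names are hypothetical.

```python
from typing import Optional

# Classes matched on the first two code segments ("xxx" is assumed to be a wildcard).
CLASS_BY_PREFIX = {
    "040-047": "BETA-BLOCKERS",
    "040-042": "ACE-INHIBITORS",
    "040-043": "ANTI-ADRENERGICS (peripherally acting)",
    "040-044": "ANTI-ADRENERGICS (centrally acting)",
    "040-048": "CALCIUM CHANNEL BLOCKERS",
    "040-053": "VASODILATORS",
    "040-056": "AT2 RECEPTOR BLOCKERS",
    "040-055": "COMBO ANTI-HYPERTENSIVES",
}

# Classes matched on the full three-segment code.
CLASS_BY_FULL_CODE = {
    "040-049-156": "THIAZIDE DIURETICS",
}


def antihypertensive_class(code: str) -> Optional[str]:
    """Return the anti-hypertensive class for a medication code, or None if not listed."""
    code = code.strip()
    if code in CLASS_BY_FULL_CODE:
        return CLASS_BY_FULL_CODE[code]
    return CLASS_BY_PREFIX.get(code[:7])


print(antihypertensive_class("040-047-123"))  # BETA-BLOCKERS
print(antihypertensive_class("040-049-156"))  # THIAZIDE DIURETICS
print(antihypertensive_class("040-049-155"))  # None
```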


Sampling and Design Effects

How do I correct for the design effects of the Add Health sampling process?

“Guidelines for Analyzing Add Health Data” discusses how to correct for design effects and the unequal probability of selection to ensure that your analysis results are nationally representative with unbiased estimates.


What variables from the public-use data should be used to correct for design effects?

“Guidelines for Analyzing Add Health Data” refers to variables from the Add Health restricted-use data.


Wave I

How many Wave I in-home respondents have in-school questionnaire data?

15,356 of the Wave I in-home respondents also have in-school data.


What was the response rate for the Wave I school administrator questionnaire?

A total of 132 schools were included in the Add Health Wave I sample. An administrator from each school was asked to complete a questionnaire. The response rate among administrators was 98.5%.


What was the response rate for the Wave I parent questionnaire?

The parent questionnaire response rate was 85.4% for the child-specific data.


What is the best way to compute race in the Add Health Wave I in-home data?

Wave I in-home interview variables* used to construct RACE.

H1GI4 Are you of Hispanic or Latino origin?
H1GI6A What is your race? white
H1GI6B What is your race? black or African American
H1GI6C What is your race? American Indian or Native American
H1GI6D What is your race? Asian or Pacific Islander
H1GI6E What is your race? other

A single race variable (RACE) was constructed from the six variables listed above. If the respondent answered “yes” to section 1, question 4 (Are you of Hispanic or Latino origin?), that respondent was given a race designation of “Hispanic” and eliminated from any race category that was marked in section 1, question 6 (What is your race?).

In question 6, respondents were able to mark more than one answer; however, they were placed in only one race category in the RACE variable. If the respondent marked “black or African American” and any other race, they were designated as black or African American and eliminated from the other marked categories. The process was repeated for the remaining race categories in the following order: Asian, Native American, other, and white.

* The racial groups for the in-school questionnaire variables are listed in a different order.

Example code:
/* Hispanic or Latino, All Races */
if h1gi4=1 then race=1;
/* Black or African American, Non-Hispanic */
else if h1gi6b=1 then race=2;
/* Asian or Pacific Islander, Non-Hispanic */
else if h1gi6d=1 then race=3;
/* American Indian or Native American, Non-Hispanic */
else if h1gi6c=1 then race=4;
/* Other, Non-Hispanic */
else if h1gi6e=1 then race=5;
/* White, Non-Hispanic */
else if h1gi6a=1 then race=6;

The information provided in the program code repository by the Add Health team is a service to the Add Health research community. It is provided “as is” with no guarantees as to suitability for a particular purpose.
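
For readers working outside SAS, the same priority order can be sketched in Python. This is an illustrative translation of the example code above, not code supplied by Add Health. The function name is hypothetical, and the return of None for respondents who marked no category is an assumption.

```python
from typing import Optional


def wave1_race(h1gi4, h1gi6a, h1gi6b, h1gi6c, h1gi6d, h1gi6e) -> Optional[int]:
    """Assign a single RACE code using the priority order of the SAS example.

    Returns None when no category is marked (assumed handling).
    """
    if h1gi4 == 1:
        return 1  # Hispanic or Latino, all races
    if h1gi6b == 1:
        return 2  # Black or African American, non-Hispanic
    if h1gi6d == 1:
        return 3  # Asian or Pacific Islander, non-Hispanic
    if h1gi6c == 1:
        return 4  # American Indian or Native American, non-Hispanic
    if h1gi6e == 1:
        return 5  # Other, non-Hispanic
    if h1gi6a == 1:
        return 6  # White, non-Hispanic
    return None


# A respondent who marked both white and black is coded 2.
print(wave1_race(h1gi4=0, h1gi6a=1, h1gi6b=1, h1gi6c=0, h1gi6d=0, h1gi6e=0))  # 2
```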


What is the best way to compute age in the Add Health Wave I in-home data?

To compute a Wave I age variable with the Wave I data, use the following variables and formula:

IMONTH – Month interview completed
IDAY – Day interview completed
IYEAR – Year interview completed
H1GI1M – What is your birth date? month [and year]
H1GI1Y – What is your birth date? [month and] year

The respondent’s age is constructed using the interview completion date and date of birth variables. Because only the month and year of birth are available, 15 is used as the day of birth when calculating age. Consult the Introduction to the Adolescent In-Home Codebook to be sure to take into account the respondents whose birth date and/or interview date is incorrect. Additionally, a few birth dates were corrected during the four waves of data collection so the Wave I date of birth should be compared to the last wave of data for the respondent. The last wave of participation is considered the most correct.

SAS programming code that can be used to construct a Wave I AGE variable using Wave I variables is provided below.

idate = mdy(imonth, iday, iyear);
bdate = mdy(h1gi1m, 15, h1gi1y);
age = int((idate - bdate) / 365.25);

The code to construct Wave I age in Stata is below.

recode h1gi1m (96=.), gen(w1bmonth)
recode h1gi1y (96=.), gen(w1byear)
gen w1bdate = mdy(w1bmonth, 15, 1900+w1byear)
format w1bdate %d
gen w1idate = mdy(imonth, iday, 1900+iyear)
format w1idate %d
gen w1age = int((w1idate-w1bdate)/365.25)

This information is provided by the Add Health team as a service to the Add Health research community. It is provided “as is” with no guarantees as to suitability for a particular purpose.
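
The same calculation can be sketched in Python. The version below mirrors the Stata code above and is illustrative only: it assumes two-digit years that need 1900 added and treats 96 as a missing birth month or year, as the Stata recode does. The function name is hypothetical.

```python
from datetime import date
from typing import Optional

MISSING = 96  # value recoded to missing in the Stata example


def wave1_age(imonth: int, iday: int, iyear: int, h1gi1m: int, h1gi1y: int) -> Optional[int]:
    """Wave I age in whole years, using 15 as the day of birth."""
    if h1gi1m == MISSING or h1gi1y == MISSING:
        return None
    interview_date = date(1900 + iyear, imonth, iday)
    birth_date = date(1900 + h1gi1y, h1gi1m, 15)
    return int((interview_date - birth_date).days / 365.25)


# Born June 1978, interviewed 10 May 1995.
print(wave1_age(imonth=5, iday=10, iyear=95, h1gi1m=6, h1gi1y=78))  # 16
```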


What does the Wave I variable COMMID represent?

The COMMID variable groups together the respondents who attend the high school and feeder school that make up the 7 – 12 grade span for the strata.


Why are there 1,821 respondents without a Grand Sample Weight at Wave I?

The following Wave I cases could not be weighted:

  • Cases added in the field.
  • Cases selected as a pair (twins, half-sibs) where both were not interviewed.
  • Cases without a sample flag.
  • Respondents from schools outside of the 80 strata.

Wave II

What was the response rate for the Wave II school administrator questionnaire?

A total of 132 schools were included in the Add Health Wave II sample. An administrator from each school was asked to complete a questionnaire. The response rate among administrators was 87.0%.


How do I code gender changes between Wave I and Wave II?

When there is a discrepancy between the Wave I and Wave II gender of a respondent, use the Wave II gender. The restricted-use data include 23 cases in which the Wave I variable BIO_SEX and the Wave II variable BIO_SEX2 do not match. The Wave II data have been confirmed as correct. Wave II includes 7 cases in which the variable SEXFLG2 equals 1. This indicates that the incorrect gender was used to control the questionnaire skips during the interview. The variable BIO_SEX2 was corrected, but answers to questions based on gender will be incorrect.
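
The rule above can be written as a small helper. This Python sketch is illustrative only: the function name is hypothetical, and falling back to the Wave I value when no Wave II value exists is an assumption rather than documented guidance.

```python
from typing import Optional, Tuple


def wave2_resolved_sex(bio_sex: Optional[int],
                       bio_sex2: Optional[int],
                       sexflg2: int = 0) -> Tuple[Optional[int], bool]:
    """Return (resolved value, skip_pattern_affected).

    Prefers the Wave II value (BIO_SEX2) over the Wave I value (BIO_SEX).
    The flag is True when SEXFLG2 equals 1, meaning gender-based skip
    patterns in the Wave II interview used the wrong gender.
    """
    resolved = bio_sex2 if bio_sex2 is not None else bio_sex
    return resolved, sexflg2 == 1


print(wave2_resolved_sex(1, 2))     # (2, False)
print(wave2_resolved_sex(2, 1, 1))  # (1, True)
```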


Wave III

Where can I find the monograph about biomarkers collected in Wave III?

The monograph “Biomarkers in Wave III of the Add Health Study” outlines relevant procedures, design, and sampling schemes used in the collection of biomarker data, and serves as a user guide for analysis and interpretation.


How can I obtain a copy of the first release of the Education Data?

The restricted-use Education Data, collected by the Adolescent Health and Academic Achievement Study, is available to Add Health contract holders. For users who already have a contract, contact Add Health to request an order form for the Education Data. A copy of the public-use version of the file can be downloaded from the ICPSR website.


When I calculate a Wave III respondent age using the birth date (month, 15, year) and date of interview, I do not get the same age for some respondents as the one found in variable CALCAGE3. Why does this happen?

Respondent age calculated during the interviews uses the actual day of birth, which is not released with Add Health data. During the Wave III interview, respondent age was calculated by the computer interviewing program and then verified by the respondent. Age discrepancies occur when a respondent is interviewed during his or her birth month.
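
A made-up example shows how the discrepancy arises. In the Python sketch below, the birth date and interview date are hypothetical, and both functions are illustrations rather than the interviewing program's actual logic. The respondent is interviewed in the birth month but before the birthday.

```python
from datetime import date


def exact_age(birth: date, interview: date) -> int:
    """Age in whole years using the actual day of birth."""
    had_birthday = (interview.month, interview.day) >= (birth.month, birth.day)
    return interview.year - birth.year - (0 if had_birthday else 1)


def day15_age(birth_year: int, birth_month: int, interview: date) -> int:
    """Age in whole years when 15 is substituted for the day of birth."""
    return int((interview - date(birth_year, birth_month, 15)).days / 365.25)


interview = date(2001, 3, 20)   # hypothetical interview date
actual_birth = date(1980, 3, 25)  # hypothetical birth date; the day is not released

print(exact_age(actual_birth, interview))  # 20
print(day15_age(1980, 3, interview))       # 21
```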


Am I allowed to include quotes in my dissertation from the Wave III open-ended question on how a mentor helped the young person?

No. You may not include any open-ended responses in your dissertation, because each verbatim response is a frequency of 1, which is not allowed.

            “In no table should all cases in any row or column be found in a single cell.”


Wave IV

How many of the interviewed Wave IV public-use sample were originally selected for the core and high education black samples?

Wave IV public-use sample = 5,114

Core sample only = 4,699

High education black sample only = 345

Both samples = 70


How do I recode the variables for Wave IV, Section 21: Criminal Offending and Victimization?

If your Wave IV in-home interview file is dated before March 2012 it will need to be updated with the following:

The Wave IV, Section 21: Criminal Offending and Victimization variables H4DS13 – H4DS20 contain implausible values for some respondents that need to be recoded. To correct these implausible values, 1) you may request an updated Wave IV data file that contains the recoded values, or 2) you may use the SAS program, Recode Wave IV, Section 21_SAS.pdf, or the Stata program, Recode Wave IV, Section 21_Stata.pdf, with your original Wave IV data file to recode these variables. The program will make the following transformations to the data:

  1. Temporarily recode values of 6, 8, and 9 for variables H4DS1 – H4DS20 to missing so that these values are not included in the sums constructed in items 2 and 3.
  2. Create a variable, VAR1, that is the sum of variables H4DS1 – H4DS11.
  3. Create another variable, VAR2, that is the sum of variables H4DS13 – H4DS20.
  4. Recode variables H4DS13 – H4DS20 to missing when VAR1 = 0 and VAR2 = 8.

After the recode, the restricted-use data file will have 1,424 observations with a value of missing for variables H4DS13 – H4DS20. The public-use file will have 442 such observations.
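
The transformations listed above can be sketched in Python as follows. This is not the SAS or Stata program distributed by Add Health; it is an illustration under stated assumptions: each respondent is a dictionary of variable names to values, missing is represented by None, and missing values are skipped when summing. All function and constant names are hypothetical.

```python
from typing import Dict, List, Optional

TEMP_MISSING = {6, 8, 9}                             # values excluded from the sums
VAR1_ITEMS = [f"H4DS{i}" for i in range(1, 12)]      # H4DS1 - H4DS11
VAR2_ITEMS = [f"H4DS{i}" for i in range(13, 21)]     # H4DS13 - H4DS20


def _sum_valid(record: Dict[str, Optional[int]], names: List[str]) -> Optional[int]:
    """Sum the listed variables, skipping None and the temporarily missing values.

    Returns None when no valid value is available (assumed behavior).
    """
    valid = [record[n] for n in names
             if record.get(n) is not None and record[n] not in TEMP_MISSING]
    return sum(valid) if valid else None


def recode_section21(record: Dict[str, Optional[int]]) -> Dict[str, Optional[int]]:
    """Return a copy with H4DS13 - H4DS20 set to None when VAR1 = 0 and VAR2 = 8."""
    var1 = _sum_valid(record, VAR1_ITEMS)
    var2 = _sum_valid(record, VAR2_ITEMS)
    recoded = dict(record)  # original values of 6, 8, and 9 are left untouched
    if var1 == 0 and var2 == 8:
        for name in VAR2_ITEMS:
            recoded[name] = None
    return recoded
```

Because the function works on a copy, the temporary recode of 6, 8, and 9 never alters the stored values; only H4DS13 – H4DS20 change, and only for records that meet both conditions.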

Wave V Data Re-release

Three ineligible cases were identified in an earlier release of Wave V data sets. We amended the affected Wave V data sets in June 2021 by retaining the three AIDs in the data files and recoding all related responses to missing values. The following data sets are involved:

Wave V Mixed-Mode Survey (includes Section 16b and Survey Medications)
Wave V Constructed Age
Wave V Mover Distance
Wave I, II, III, IV, & V Grouping Data

The three cases should be removed from data analysis moving forward. This will have no effect on your analyses or use of weights (your N will be reduced by 3, i.e., N=12,297). Please download the updated data sets from your application in the CPC Data Portal.

General

Where can I find the contract or data use agreement?

Go to CPC Data Portal. The CPC Data Portal is where you will create an account and upload requirements to complete your Add Health Restricted-Use Data Contract. There you will find documentation on how to get started. The Add Health Data Use Agreement (DUA) can specifically be found under Core Files: Requirements: Add Health DUA.


What are the requirements to be eligible to apply for a contract?

Investigators must meet the following criteria:

  • A. Have a PhD or other terminal degree; and
  • B. Hold a faculty appointment or research position at Institution

Institution must meet the following criteria:

  • A. Be an institution of higher education, a research organization, or a government agency
  • B. Have a demonstrated record of using sensitive data according to commonly accepted standards of research ethics

Who has the authority to sign as Institutional Representative?

  • Must be someone not on the contract in any other role.
  • Must be someone able to enter a legal agreement on behalf of your institution. 
  • Must be someone who works at the Office of Sponsored Research or Contracts office.  

How do I apply for Add Health data?

To apply for restricted-use data or Romantic Pairs data, go to the CPC Data Portal.


What is the processing fee for Add Health data?

Fee information is available on the CPC Data Portal.


How can I pay for the Add Health data?

Payment can be made by credit card, check, or money order (a personal check or one from your institution).

Go to the CPC Data Portal. Download and complete the Investigator Information Page; Add Health will upload an invoice with instructions.


Is Add Health data regulated by HIPAA?

The UNC IRB has confirmed that, at this time, Add Health Waves I-V data is NOT considered Protected Health Information (PHI) and therefore is not regulated by HIPAA.

Add Health does NOT disseminate any personally identifiable information (PII) data via the restricted-use contracts, so restricted-use contract holders will not receive PII.


Are the Add Health data de-identified?

Yes, Add Health data are de-identified. Any “Personal Identifying Information” (PII) has been stripped from the data.


Will you list my publications and presentations on Add Health’s website?

Absolutely. Any results produced from analysis using the public-use data or the restricted-use data are eligible for posting. Please email the complete reference for your publication or presentation to Add Health Publications. For publications, please also include the email address of the first author.


I don’t wish to renew my contract. What do I need to do to terminate it?

To terminate your contract, download and complete the contract termination form and email it to Add Health Contracts.


Where do I send the CDs when I terminate my contract?

Return CDs to Add Health Contracts by UPS or FedEx with tracking number and signature required to:

Add Health Contracts 
Carolina Population Center
UNC-Chapel Hill
Carolina Square, Suite 210 
123 West Franklin Street 
Chapel Hill, NC 27516


Adding Researchers or Staff to a Contract

How do I add researchers, collaborators, officemates, or information technology staff to my contract?

Add Health now processes all requests through the CPC Data Portal. (If you have Add Health forms on file, please discard them, as the Portal will always have the most current forms.) To add researchers to a contract, go to the CPC Data Portal:

  • Log In (Right upper corner) and go to “Applications tab”
  • Click the “User List” button
  • Fill out all the fields (Last name, First Name, role, email (optional), Access Location)
  • Click “Add” button
  • Click on contract ID “xxxxxxxxx” to go back to the requirements page
  • Go to the requirement: Add Health Supplemental Agreement
    * If this requirement has been previously approved, you will see “Click here to add more documents”
    * Download the blank form  
    * Upload the completed form(s)
  • Go to the requirement: Add Health Security Pledge
    * If this requirement has been previously approved, you will see “Click here to add more documents”
    * Download the blank form  
    * Upload the completed form(s)

What is the procedure for adding Information Technology Staff who will have access to the data but will not use the data for analysis?

Go to the CPC Data Portal:

  • Log In (Right upper corner) and go to “Applications tab”
  • Click the “User List” button
  • Fill out all the fields (Last name, First Name, role, email (optional), Access Location)
  • Click “Add” button
  • Click on contract ID “xxxxxxxxx” to go back to the requirements page
    * If this requirement has been previously approved, you will see “Click here to add more documents”
    * Download the blank form  
    * Upload the completed form(s)
  • Go to the requirement: Add Health Security Pledge(s)
    * If this requirement has been previously approved, you will see “Click here to add more documents”
    * Download the blank form  
    * Upload the completed form(s)

Data Access, Storage, and Security

Can I save my temporary data analysis files after I terminate the contract and return the data CDs?

No. However, you can create a constructed variable file that contains only the variables that you’ve created, with no original data components. This variable file does not have to be deleted every six months. This file should be sent to Add Health upon contract termination, and we will securely store the CD for 3 years or until a new contract is established.


How can we consolidate the data storage and security administration for two Add Health contracts at the same institution?

  • One researcher (R1) decides to be responsible for the data.
  • The non-responsible researcher (R2) will return his/her copy of the data, submit a final annual report, and terminate the contract.
  • R2 and data users listed in the terminated contract will be added to R1’s contract as supplemental researchers.
  • R1 (or a systems admin) trains the new users about data access and security for the new contract. All users from R2’s contract must be able to access the one copy of the data from R1’s contract.

If there are many users located in different buildings, it’s helpful if the institution’s computing is centralized so that all accounts are on one server.


What are the requirements to request Wave IV Ambient Air Pollutants Data?

The following items are required before this request can be approved. Please submit, as necessary, the documents listed below.

  • Data Analysis Plan (maximum one page; indicate time for completion)
  • Completed Affiliate Form (will be provided upon receipt of this order form). Due to the sensitivity of these data, approved users can only access the data remotely on the Add Health Linux Server.
  • Completed remote access form which can be found on the CPC Data Portal.

Can we receive the Wave IV Ambient Air Pollutants data on a CD to use in our cold room?

Due to the sensitive nature of the Ambient Air Pollutants data, approved users can only access the data on the Add Health Linux Server for a limited timeframe (6 weeks).


Wave III Romantic Pairs Data Contract

What’s the difference between the standard data contract and the Romantic Pairs contract?

The Romantic Pairs data are available through a separate contract, which is available upon request. The main differences between the standard contract and the Romantic Pairs contract are the renewal schedule, security plan requirements, and access to the data:

  • The Romantic Pairs contract must be renewed every two years, while the standard contract is on a three-year renewal cycle. Renewal of the Romantic Pairs contract requires an annual report, an updated Data Security plan, and Institutional Signatures.
  • There are several security plan options for housing the standard contract data, but the Romantic Pairs data must be housed on a separate, stand-alone computer.
  • Only one person may access the Romantic Pairs data, while the standard contract allows more flexibility. If there are other researchers interested in working with the Romantic Pairs data, they must apply for separate contracts.

Can we put the Romantic Pairs data on the computer that researchers currently use to analyze the Restricted-Use data?

No. Only one user can work with Romantic Pairs data. Storing it on a computer accessible to multiple users is not permitted because this is the most sensitive and restrictive data disseminated by Add Health. For easier access, we developed a modified plan that approximates the security of a cold room at the researcher’s institution. To maintain security, it is important that the requirements are strictly followed.


I am currently working with the Restricted-Use data and would like to get the 1,507 Romantic Pairs sample in Wave III. Are they included in the data that we already have?

Wave III Romantic Pairs data are not part of the restricted-use data. These data are among the most sensitive disseminated by Add Health and available only through a separate, single-user contract. The contract requires additional security (stand-alone computer in a locked office and use only by one researcher). Your current PI (or another PhD researcher) would need to enter into a separate contract in order to get Romantic Pairs data.


It appears we cannot have access to both the Romantic Pairs and HIV data sets simultaneously – why?

Deductive disclosure becomes an even greater issue when the two data files are linked. The file becomes a “couple” file, which creates more unique data. This is a risk that Add Health is preventing.


The romantic partner and HIV data are two different data sets, correct?

An HIV contract includes Add Health respondent data and HIV results data. The additional interviews with Wave III partners of Add Health respondents are not included.


The romantic partner data does not contain information on HIV, is this correct?

The Wave III Romantic Pairs contract includes Wave III Add Health respondent interview data, Wave III partner interview data, and a file that allows you to link the partners. The HIV results data are not included with this contract; however, the results of the other STIs are available for both the Add Health respondent and the partner with the Romantic Pairs contract.


Going to the CPC Data Portal, when I add the HIV data to the cart – I do not see it there, only the romantic pairs data. Is this data inaccessible?

That is correct. Because of the deductive disclosure risk and the sensitive nature of the data, information on how to get the HIV data is provided by request only.


If it is not possible to have the romantic pairs and HIV data together, how do we study the relationship between romantic pairs and HIV?

That type of analysis is not possible with the Add Health data.

What is the UNC SRW?

The UNC SRW is the Secure Research Workspace (SRW) hosted by the University of North Carolina at Chapel Hill’s (UNC) Research Computing division (https://its.unc.edu/research-computing/secure-research-workspace/). The UNC SRW provides a digital collaborative environment for teams of researchers to work with regulated data. The platform supports many projects working with a variety of data sensitivity levels and computational needs. The system adheres to a NIST 800-53 System Security Plan, often supporting DUA requirements. The user experience in the UNC SRW is via remote desktop, using the VMWare Horizon View client. Both the UNC SRW and the Horizon View client will be provided to Add Health researchers free of charge, as will most statistical software applications (e.g., SAS, Stata, R, MPlus, MatLab).


Is there a fee involved? 

No, there is no fee required to use the UNC SRW or the Horizon View client to connect to the UNC SRW. SAS, Stata, R, MPlus, and MatLab are also provided at no cost. There is a cost for each license needed for SPSS or HLM. See below for what to do if you need software not on the UNC SRW.


What OS does the remote desktop run? Some of our users prefer Mac and others Windows, but none use Linux.

Currently, Windows 10 Enterprise is the default virtual desktop operating system. Windows 11 is a possible upgrade in the near future. Users with a Mac or Windows machine are able to remote into the virtual desktop at the UNC SRW using the Horizon View client, but once in, they will be working on a Windows 10/11 desktop platform.


What software do researchers need to install on their Windows or Mac computers to access the UNC SRW? Are researchers able to use a web interface to login to the UNC SRW?

We are using the Horizon View client, and we are providing that client free of charge to the researchers. Currently, the Horizon View client is the only way to access the UNC SRW.


Will the VMs in the UNC SRW be powerful enough to run my data analysis?

The default VM has 4 processors and 16GB RAM. We also have a VM for power users who require more than 16GB RAM, which has 32GB RAM and 8 processors. Currently fewer than 3% of Add Health users in the UNC SRW need the higher-capacity VM.


How much storage space does each user have on the UNC SRW?

We are not currently limiting the amount of storage each user may use. Because the storage devices use deduplication, our current footprint on the storage devices is at 4% of our current allocation. Having said that, let us know if you think your storage needs may be inordinately high.


How many users can log into the UNC SRW at the same time?

There is currently no limit to the number of concurrent logins available on the UNC SRW, although our data enclave may have a concurrent login limit set. This limit on our enclave can be increased if we reach our maximum concurrent logins. Hardware and Horizon View software to connect to the hardware will be increased as needed to accommodate researchers’ needs.


When trying to log in to the SRW, I received the following error message: “Loading failed: All available desktop sources for this desktop are currently busy. Please try connecting to this desktop again later, or contact your system administrator.” What should I do?

Send an email message to addhealth_contracts@unc.edu, and we will alert the UNC SRW systems administrator, who can either add desktops for our use or remove stale connections to free up desktops.

Please note: if you are running statistical jobs that will run for hours/days and you don’t want to remain on the UNC SRW, you may “disconnect” from the UNC SRW. This will allow your job to keep running. However, if you are not running jobs and you are finished for the day, be sure to log out, not simply disconnect, so your desktop VM will be freed up for another user!


Does the UNC SRW usually reach its capacity limit between 9:00 AM and 5:00 PM during weekdays?

The current high-water mark for concurrent logins is well below our capacity. However, as we add more Add Health researchers to the UNC SRW, we do expect to see a higher utilization during normal business hours. Since researchers throughout the world are currently using the UNC SRW to analyze the Add Health data, “normal business hours” for you may be different than those of researchers in different time zones. Our plan is to keep the hardware at a level to support our researchers, no matter the time of day. To help us with this, we ask that you log out, rather than simply disconnecting, when you are not using the UNC SRW and you are not running jobs that may take hours/days to run.


We have many researchers on our contract who are using Add Health. Will you set up the virtual desktop for each of them? 

Each researcher on the contract will receive a UNC ID and will be able to remote into a virtual desktop. Each desktop for your contract will have access to your shared data drive and each will have personal storage space for your analysis.


What base software does the remote compute server include? Does the UNC SRW have statistical software like Stata, R and SAS?

Software on the UNC SRW is listed under section C of this page (https://help.rc.unc.edu/grant-information/) and includes Stata, R, SAS, MatLab, and a 10-concurrent-use license for MPlus. Almost any software you need that is not on the UNC SRW can be purchased (we will invoice each researcher individually and annually for each license needed) and installed by the UNC systems administrator. You can ask the Add Health contracts manager (addhealth_contracts@unc.edu) about any software you need that is not on this list.

UNC SRW Web page: https://help.rc.unc.edu/grant-information/

Section C shows available software (some software versions may be out of date on this website: we will install the latest versions of applications, but there may be a slight delay for testing before they are made available).


Can I use the statistical software on my laptop to analyze the data on the SRW remote desktop?

No. You use your laptop to remote into the UNC SRW. All analysis is done on the UNC SRW virtual desktop.


Can researchers request administrative rights to install software packages on UNC’s SRW? This includes R Studio and extensions (open-source).

No, but the UNC SRW systems administrator is happy to install software applications for which researchers purchase valid licenses (see below for how to do this), and the Add Health systems administrator has permission to install updates to software, such as R Studio and Stata DO files. Just let us know what you need and we’ll help facilitate the installation/update of the software.


If I cannot install my own software, is there a way for us to purchase additional licenses through UNC to install for the research group? This software includes, but is not limited to, SPSS, HLM, Stat Transfer, and NVivo.

Yes. You will request your special software from us, and we will let you know if the software vendor allows us to install their application in our virtual environment. If your requested software is available to be installed on the SRW, we will invoice each researcher individually and annually for each license needed. Once we receive payment, we will purchase the license on your behalf (we have to maintain chain-of-custody in order to pass an audit by the software vendors) and give the requesting researcher access to the software in the SRW. Note the software that is already available on the SRW (section C of this link: https://help.rc.unc.edu/grant-information/).


Can I download my command and output files to my computer?

No. When you need statistical summaries or copies of your statistical code, you should send an email request to addhealth_contracts@unc.edu and give specific file locations for what you need. An Add Health staff member will fulfill your request within 24-48 hours, so you will need to plan ahead.


Can command, log, and data files be copied between different SRW folders?

The original data files should not be copied out of their original folder. You may share your interim data sets, as well as command and log files, with others on your research team, but that is best done by having a shared working directory, rather than by copying the files to other directories. Each contract will have a work directory set up where researchers will each have a directory for their work, as well as a shared directory to share files with co-workers.


Once we commit to switching to the UNC SRW, we cannot keep our local, previously approved, system anymore. Is that correct?

This is correct. Each contract is allowed the use of one copy of the data. You will need to securely erase the data from your environment once you are set up to use the UNC SRW. We will work with you so that the transition to the UNC SRW will not negatively impact your research.


Are support services available to UNC’s SRW end-users and what types of services are provided?

Yes, users can call the UNC Help Desk for help with logging in and password changes. However, our preference, to limit the load on the Help Desk from non-UNC researchers, is that each contract provide a technically savvy systems administrator to learn these tasks and to be the tier-one help desk for your questions. That will add a layer to getting answers to your questions, but in the long run will allow us to continue providing this resource for free.

Add Health staff will give you documentation for installing the Horizon View client, as well as helping you get your UNC account set up and ready to use in the UNC SRW.

You may ask your questions by sending email to addhealth_contracts@unc.edu, and we will answer your questions, as well as continue to update this FAQ as others ask questions we did not anticipate.


In a prior email, you wrote regarding Wave V, “…please note that we are not approving current users on a file server, as a general rule, to work remotely.” Do we interpret this to mean that when Wave VI is released the remote compute server that we host will not be supported?

Yes. Our current goal is to transition off of all researchers’ systems by the time we release Wave VI data in 2025, and use the UNC SRW as the primary compute platform for the Add Health data.


Do researchers seem to like the UNC SRW, or are there limitations that frustrate users?

We are trying to remove all possible roadblocks to utilizing the UNC SRW for our Add Health researchers. If you encounter anything in your experience that frustrates you, please email us so we can try to fix the problem! In the meantime, it is our understanding that our researchers are happy with their experience using the UNC SRW. Below are just a couple of comments we’ve received to date:

  • “… it has been wonderful to use — easy, intuitive, and reliable. I am so glad you helped us set up this option.”
  • “Everyone on the team has been very responsive! They send the output very quickly. Very communicative…There has been a level of collegiality that is unparalleled. Everyone on the UNC team has been great. Excited to be part of the SRW!”