MiraR
Calcite | Level 5

Hello,

 

So I am working on my senior thesis, analyzing the relationship between bowel health and vitamin D status in pregnant women using the NHANES dataset. I am so confused, having no background in SAS 9.4 whatsoever. I have run PROC CONTENTS, PROC FREQ, and PROC MEANS, loaded all the datasets of interest into SAS, and created a library for each survey cycle I want to analyze. Currently, I am trying to make my "Table 1" for the 2005-2006 library. I want to pull only the pregnant women's records from each dataset of interest (Bhq, Demo, UCpreg, Vid) and combine them into one consolidated dataset, which I can then analyze for frequencies, means, etc. I have tried different code, but nothing seems to be working and I am not sure how to do this. Could anyone help me, please? I did not find the NHANES tutorials useful for this.

 

Thank you so much in advance!

5 REPLIES
Astounding
PROC Star

To get started ...

 

If you had all the data printed out in front of you, which information would you use to determine if you were looking at a pregnant woman?  Be very specific about the name(s) of the data set(s), name(s) and value(s) of the variables.

 

Down the road ...

 

It may not be a good idea to combine all the data up front.  For example, if you had AGE as a demographic variable in one data set, and a second data set with multiple visits per woman, by combining them you would be repeating the AGE value for each visit.  That combined data set would not be useful for examining the distribution of AGE or its relationship to other variables, once it has been repeated for each visit.
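One way to check for this before combining anything is to compare the number of rows in a data set with the number of distinct respondents. This is only a sketch: the libref `nh0506` and its path are hypothetical, `bhq_d` is assumed to be the name of the 2005-2006 bowel health file, and `SEQN` is assumed to be the NHANES respondent ID.

```sas
/* Hypothetical libref and path: point this at your own 2005-2006 folder */
libname nh0506 "/path/to/nhanes/2005_2006";

/* If n_rows is larger than n_people, some respondents have several rows,
   and merging with a one-row-per-person file would repeat values such as age */
proc sql;
  select count(*)             as n_rows,
         count(distinct seqn) as n_people
  from nh0506.bhq_d;
quit;
```

If the two counts match, the data set has one record per respondent and a merge by `SEQN` will not duplicate demographic values.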

MiraR
Calcite | Level 5

Thank you for your response! How would you suggest I proceed, then? Should I merge the datasets from the different years together? I have made a table of all the datasets I want to include, with all the variable names.

Astounding
PROC Star

As you gain experience solving questions, you will gain a feel for the best way to proceed.  There is no one answer that fits all situations.

 

I would suggest starting with the first question ... how do you identify your population?  Generate a list that identifies the population you are studying, and learn how to subset your tables to include that population only.  

 

Take one table, subset the population, and store the result in a different folder.  Then we can look at how to automate that process for all tables (if there are a lot of them).

 

When subsetting, don't replace any tables.  Create new ones in a separate folder.  

 

Once the subsetting has taken place, subsequent processing should run faster because of the reduced number of observations.
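The steps above could look roughly like the sketch below. Everything named here is an assumption to check against your own files and the NHANES codebook: the librefs `nh0506` and `preg` and their paths are hypothetical, `demo_d` and `bhq_d` are assumed data set names, and pregnancy is assumed to be flagged by `RIDEXPRG = 1` in the demographics file.

```sas
/* Hypothetical librefs: one for the original files, one for the subsets */
libname nh0506 "/path/to/nhanes/2005_2006";
libname preg   "/path/to/nhanes/2005_2006_pregnant";

/* Step 1: build the population list from the demographics file.
   The original table is only read; the result goes to the new library. */
data preg.demo_preg;
  set nh0506.demo_d;
  where ridexprg = 1;   /* assumed code for "pregnant at exam" */
run;

/* Step 2: keep only those respondents in another table, matched on SEQN */
proc sql;
  create table preg.bhq_preg as
  select b.*
  from nh0506.bhq_d as b
       inner join preg.demo_preg as d
       on b.seqn = d.seqn;
quit;
```

Step 2 can be repeated for each remaining table by changing the input and output names.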

Reeza
Super User

Have you taken the free SAS e-course? I think it's a two-day course, and after that you should have a pretty decent handle on how to answer these questions.

ballardw
Super User

NHANES uses a complex sampling design. Because of that, simply filtering records will likely cause serious problems with the reliability of any estimates, and you would need to use the survey procedures (PROC SURVEYREG, PROC SURVEYMEANS, PROC SURVEYFREQ, or PROC SURVEYLOGISTIC) to incorporate the sample design information.

 

In these cases it is generally better to add indicator variables that show each subject's group membership and then use those indicators as CLASS variables. With a proper MODEL or DOMAIN statement, the output would then include results for both Pregnant and Nonpregnant, assuming you created a two-level indicator. Then just use the results for the groups of interest.
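As a rough sketch of the indicator approach, assuming the 2005-2006 demographics file is `demo_d` in a hypothetical library `nh0506`, that `RIDEXPRG = 1` marks pregnancy, and that `SDMVSTRA`, `SDMVPSU`, and `WTMEC2YR` are the design and weight variables that apply to your analysis (check the NHANES documentation for the right weight):

```sas
/* Hypothetical libref and path */
libname nh0506 "/path/to/nhanes/2005_2006";

/* Keep every record and add a two-level indicator instead of deleting rows */
data work.demo_flag;
  set nh0506.demo_d;
  pregnant = (ridexprg = 1);   /* 1 = pregnant, 0 = everyone else, including missing */
run;

/* The DOMAIN statement reports estimates for each level of the indicator
   while the full sample design is still used for the standard errors */
proc surveymeans data=work.demo_flag mean stderr;
  strata  sdmvstra;
  cluster sdmvpsu;
  weight  wtmec2yr;
  domain  pregnant;
  var     ridageyr;            /* assumed name of the age-in-years variable */
run;
```

The rows of the domain output where `pregnant = 1` are the estimates for pregnant women.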

 

You may want to start with https://wwwn.cdc.gov/nchs/nhanes/tutorials/default.aspx as there are examples of SAS code for many tasks and the SAMPLE section includes specific code for merging and combining NHANES data sets.
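For orientation, a merge of several files from one cycle usually follows the pattern below. This is not the tutorial's code, only a minimal sketch: the libref, path, and data set names (`demo_d`, `vid_d`, `bhq_d`) are assumptions, and it presumes each file has one record per `SEQN`.

```sas
/* Hypothetical libref and path */
libname nh0506 "/path/to/nhanes/2005_2006";

/* Sort copies of each file by the respondent ID; the originals stay untouched */
proc sort data=nh0506.demo_d out=work.demo; by seqn; run;
proc sort data=nh0506.vid_d  out=work.vid;  by seqn; run;
proc sort data=nh0506.bhq_d  out=work.bhq;  by seqn; run;

/* Match-merge on SEQN, keeping everyone who appears in the demographics file */
data work.nh0506_all;
  merge work.demo(in=in_demo) work.vid work.bhq;
  by seqn;
  if in_demo;
run;
```

Respondents who are missing from a component file simply get missing values for that file's variables.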

