SamiMajdalanyMD
Calcite | Level 5

Hi everyone,

 

Does anyone have experience using the BRFSS data (from the CDC) in SAS Studio? I'm able to upload the data, but I'm trying to figure out how to organize it and clean it up.

 

Any tips would be great!

 

Thank you,

 

Sami

3 REPLIES
ballardw
Super User

I suggest going to the CDC website and grabbing the SAS code that should be there for analysis.

 

If you go to https://www.cdc.gov/brfss/annual_data/annual_2020.html

you will find things like the "Code book" describing what is there, Proc Format code to create useful formats, code with format assignment statements, data files in a couple of formats with code to make them local SAS data sets, and a link to a document named Complex-Smple-Weights-Prep-Module-Data-Analysis-2020-508.pdf. That document has some example code for basic analysis using SAS, as well as what is needed to prepare multiple-year data sets.
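As a rough illustration of the "make them local SAS data sets" step, here is a minimal sketch for the transport (XPT) version of the file. The path and the file name LLCP2020.XPT are assumptions for illustration; the CDC-supplied code on that page is the authoritative version.

```sas
/* Hypothetical path: point this at wherever the transport file was uploaded. */
libname xptin xport '/home/your-user-id/LLCP2020.XPT' access=readonly;

/* Copy every data set stored in the transport file into the WORK library. */
proc copy in=xptin out=work memtype=data;
run;

/* Release the libref once the copy is done. */
libname xptin clear;
```

The log lists the name of each data set that was copied. That name may not match the BRFSS2020 name used in code elsewhere in this thread, so adjust one or the other.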

 

What do you think you need to "clean up" in the data?

 

Internet searches for "BRFSS SAS" will turn up more results than you likely want to deal with. If you are using recent data, though, you wouldn't want to rely on anything more than 5 or 6 years old, because of the changes in BRFSS weighting schemes (and in the options the Survey procedures offer for analysis).

 

Experience with BRFSS: 7 years working as a contractor on the collection side and 15+ years using our state data sets.

 

SamiMajdalanyMD
Calcite | Level 5

Hi! Thanks for your fast reply, I'm looking at the codebook right now.

 

By 'clean up' I mean get rid of variables that I'm not interested in. For example, I'm only interested in all of the PSA questions, sex/gender, and age between 40 and 75. I won't be using any of the other variables.

 

I've unchecked them from the side, and I see them disappear from the table. But there are still so many rows?!

 

 

Sami

ballardw
Super User

BRFSS is one of the data sets where working with a point-and-click interface can be terribly annoying, because there are hundreds of variables.

If you feel you really need a subset of the data so that you needn't keep wading through those long lists, it is likely easier to use some code with a KEEP statement:

 

Data subset;
    set BRFSS2020;
    keep <weighting variables> <strata variable(s)>
     <psa variables> sex age;
run;

Replace the bits between <> with an actual list of variables. You must keep the weighting variables and the strata/cluster variables, or any analysis is pretty much invalid; finding their names is one reason to reference the code book. You will also want to pay attention to how the SURVEY procedures let you specify the BRFSS weighting scheme.
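To show where the design variables go in a SURVEY procedure, here is a minimal sketch. The names _STSTR (strata), _PSU (cluster), _LLCPWT (weight) and PSATEST1 (one PSA question) are assumptions about the 2020 file, and the sketch assumes they were included in the KEEP list above. Confirm every name against the code book before relying on this.

```sas
/* Sketch only: all four variable names are assumptions to check in the codebook. */
proc surveyfreq data=subset;
    strata  _ststr;     /* assumed stratification variable         */
    cluster _psu;       /* assumed primary sampling unit (cluster) */
    weight  _llcpwt;    /* assumed final weight for the 2020 data  */
    tables  psatest1;   /* assumed name of one PSA question        */
run;
```

Without the STRATA, CLUSTER and WEIGHT statements the procedure would treat the file as a simple random sample, which is the invalid analysis the paragraph above warns about.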

I would suggest leaving AGE until you get more familiar with the data. You may also find more than one "age" variable, because some respondents refuse to answer the question, and there may be an "imputed" age variable that gives you more usable records.

I would typically use a custom format for the age variable and assign the values I wasn't interested in to a category of "not of interest". That avoids removing records (which I think is related to your rows comment). The formatted values are used by the analysis procedures, so any results associated with those values would appear as "not of interest" in tables and you could ignore them. If at some later time you were asked about ages greater than 75 (assuming there are such respondents and they are asked the questions of interest), you would only need to create a different format and rerun the analysis with that format in place. If the records are actually removed, you would have to go back and pull the extra records.
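A minimal sketch of that format approach follows. The format name AGEGRP is made up for illustration, _AGE80 is an assumed name for the imputed age variable, and the sketch assumes that variable is present in the WORK.SUBSET data set created earlier. Check the code book for the actual age variable.

```sas
/* Group ages instead of deleting records. AGEGRP is an illustrative name. */
proc format;
    value agegrp
        40 - 75 = '40 to 75'
        other   = 'Not of interest';   /* OTHER also catches missing values */
run;

/* Attach the format to the assumed age variable without rewriting the data. */
proc datasets library=work nolist;
    modify subset;
    format _age80 agegrp.;
quit;
```

Any procedure that later reads WORK.SUBSET groups the age variable by its formatted values. To report on ages above 75 later, define a second format with different ranges and attach that one instead; no records need to be restored.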

 

In practice, I never subset BRFSS data for analysis. But I have been writing code for a long time, I find point-and-click tedious in general, and I have code examples for many tasks, so I am seldom starting an analysis from scratch. I just replace the variable names and the data set name.
