Hi,
I want to create a report from the "LLCP2014" data set from CDC Website (Link: https://www.cdc.gov/brfss/annual_data/annual_2014.html)
The objective of this analysis is to investigate the association between Variable A and Variable B after controlling for some other variables. The outcome variable is A and the variable of interest (exposure) is B.
My question is: in this particular data set, if I want to analyze the association between 2 variables after controlling for some other variables, which variables should I consider?
For example, I have a sample project on BRFSS 2010 data:
The objective of this analysis is to investigate the association between diabetes and BMI after controlling for exercise and gender. The outcome variable is diabetes and the variable of interest (exposure) is BMI.
Due to the nature of BRFSS data as a self-reported survey, I would be very careful about using "exposure" and "outcome".
BRFSS data come from a complex sampling design, so you would likely be looking at the various survey procedures: SURVEYFREQ, SURVEYMEANS, SURVEYREG, SURVEYLOGISTIC and SURVEYPHREG.
Most BRFSS questions have categorical responses (the exceptions are age, height, weight and a few others), so most of the analysis is done with tools like SURVEYFREQ. A simple table request like Sex*Question5 (obviously pseudocode) yields counts, rates and confidence intervals for the percentage of each response to question 5 within each level of Sex; in your case the request would be gender * exercise level * diabetes * BMI level. Make sure to request both column and row statistics, as you may find one of them easier to read and interpret.
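As a rough sketch of that table request, assuming the 2014 file has been read into a data set called work.llcp2014 (a hypothetical name) and that the design and analysis variables are named _STSTR, _PSU, _LLCPWT, SEX, EXERANY2, _BMI5CAT and DIABETE3 (please confirm every name against the 2014 codebook):

```sas
/* Sketch only: data set and variable names are assumptions, check the codebook */
proc surveyfreq data=work.llcp2014;
    strata  _STSTR;     /* assumed stratification variable      */
    cluster _PSU;       /* assumed primary sampling unit        */
    weight  _LLCPWT;    /* assumed final weight for LLCP2014    */
    /* The leading variables define layers; row and column
       percentages apply to the last two variables in the request */
    tables SEX * EXERANY2 * _BMI5CAT * DIABETE3 / row column cl;
run;
```

Run on the raw data, this still shows the "Don't know" and "Refused" codes as their own categories, so it doubles as a first check of which values need to be treated as missing.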
However, you will want to read the documentation and the codebook, and download the read and format code provided. You need to know which values for each question indicate "Don't know" and/or "Refused to answer", as you may want to exclude those responses. Some of the calculated variables, such as those related to BMI, also carry a code that indicates a missing value, because one or both of height and weight may not have been provided. You definitely want to exclude those from any BMI analysis (the codes are typically made up of 9s, such as 99, 999 or 99.9, and they change over time), because if left in they will seriously skew any result. The data sets also contain derived variables such as BMI categories.
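A minimal sketch of that recoding step, under stated assumptions: the codes 7 and 9 for "Don't know" and "Refused", the variable names, the data set names and the BMI sentinel value in the macro variable are all placeholders to be replaced with what the 2014 codebook actually says.

```sas
/* Sketch only: every code value and name below is an assumption */
%let bmi_missing = 9999;   /* hypothetical sentinel; may not apply to 2014 */

data work.brfss_recode;    /* hypothetical output data set */
    set work.llcp2014;     /* hypothetical input data set  */

    /* Assumed: 7 = Don't know, 9 = Refused */
    if DIABETE3 in (7, 9) then DIABETE3 = .;
    if EXERANY2 in (7, 9) then EXERANY2 = .;

    /* Turn the sentinel into a real missing value so it cannot
       be averaged in as if it were a huge BMI */
    if _BMI5 = &bmi_missing then _BMI5 = .;
run;
```

Setting the values to missing, rather than deleting the rows, keeps every respondent in the data set, so the survey procedures still see the full sample design when they compute variances.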
The format code file helps make legible answers from the code values and may help understand some of the age group variables provided.
PS. I have worked on-and-off with BRFSS data for over 25 years and this is not a trivial exercise. At a minimum, handling missing data appropriately is very important.
And don't be surprised if you get odd results by the time you look at 4 variables. The sample sizes for some of the variables are likely to be very small such as "low BMI" and having diabetes after controlling for gender and exercise.
Sounds like a logistic model. See the example titled "Logistic Modeling with Categorical Predictors" in the PROC LOGISTIC documentation.
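Because BRFSS is a complex sample, the design-based counterpart of that model would be PROC SURVEYLOGISTIC. The following is only a sketch under assumptions: the data set and variable names, the response codes, the binary indicator diab and the choice of reference levels are all hypothetical and need to be checked against the codebook.

```sas
/* Sketch only: names, codes and reference levels are assumptions */
data work.brfss_model;                 /* hypothetical data set */
    set work.llcp2014;                 /* hypothetical input    */
    /* Assumed coding: 1 = yes, 2-4 = no variants, 7/9 = DK/refused */
    if DIABETE3 = 1 then diab = 1;
    else if DIABETE3 in (2, 3, 4) then diab = 0;
    else diab = .;
    if EXERANY2 in (7, 9) then EXERANY2 = .;
run;

proc surveylogistic data=work.brfss_model;
    strata  _STSTR;
    cluster _PSU;
    weight  _LLCPWT;
    /* Reference levels are illustrative choices, not recommendations */
    class _BMI5CAT (ref='2') EXERANY2 (ref='2') SEX (ref='1') / param=ref;
    model diab (event='1') = _BMI5CAT EXERANY2 SEX;
run;
```

The odds ratios for _BMI5CAT then describe the association with diabetes adjusted for exercise and gender, which is the structure of the 2010 sample project described in the question.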