I am trying to write code to score the Global Physical Activity Questionnaire from the NHANES 2017-2018 cycle. My end goal is to determine if vitamin D level is a risk factor for depression and if physical activity has a modifying effect on depression with vitamin D is low. I am using the PAQ_J dataset to determine the score. I have been referring to this community post to inform my code: How to code Physical Activity in NHANES - SAS Support Communities. When I run my own code, I am not getting any people with 0=no physical activity and I know that cannot be correct. What am I doing wrong?
Here is my code:
Data project2;
set project2;
totPA=sum((PAQ610*PAD615*8),(PAQ625*PAD630*4),(PAQ640*PAD645*4),(PAQ655*PAD660*8),(PAQ670*PAD675*4))/60;
label totPA='Physical activity total (MET-h/wk)';
if totPA=0 then PAintensity=0;
else if 1<=totPA<=48 then PAintensity=1;
else if totPA>48 then PAintensity=2;
else PAintensity=.;
run;
I will attach my dataset to this post as well. I am also not sure if I am doing enough to get true values of PA, instead of missing values.
This is my log:
2008 Data project2;
2009 set project2;
2010 totPA=sum((PAQ610*PAD615*8),(PAQ625*PAD630*4),(PAQ640*PAD645*4),(PAQ655*PAD660*8),(PAQ670*PAD675*4))/60;
2011 label totPA='Physical activity total (MET-h/wk)';
2012 if totPA=0 then PAintensity=0;
2013 else if 1<=totPA<=48 then PAintensity=1;
2014 else if totPA>48 then PAintensity=2;
2015 else PAintensity=.;
2016 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1497 at 2010:7 4256 at 2010:18 4256 at 2010:25 3266 at 2010:36 3266 at 2010:43 4253 at 2010:54
4253 at 2010:61 4279 at 2010:72 4279 at 2010:79 3383 at 2010:90 3383 at 2010:97 1497 at 2010:101
NOTE: There were 5569 observations read from the data set WORK.PROJECT2.
NOTE: The data set WORK.PROJECT2 has 5569 observations and 28 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
Please help!
Thank you for showing your log, which may be answering your question. In particular, it includes the note
NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 1497 at 2010:7 4256 at 2010:18 4256 at 2010:25 3266 at 2010:36 3266 at 2010:43 4253 at 2010:54 4253 at 2010:61 4279 at 2010:72 4279 at 2010:79 3383 at 2010:90 3383 at 2010:97 1497 at 2010:101
which says that at least 4,279 of your 5,569 observations have a missing value in columns 72 and 79. Using the statement copied below, I suspect that would be variables PAQ655 and/or PAD660.
2010 totPA=sum((PAQ610*PAD615*8),(PAQ625*PAD630*4),(PAQ640*PAD645*4),(PAQ655*PAD660*8),(PAQ670*PAD675*4))/60;
which means in turn that totPA will get assigned a missing value.
The raises the question of whether you should treat missing values for any of the PAQ variables in line 2010 as zeroes. After all, totPA can't be zero unless every one of the PAQ variables in the calculation are at zero. Do those variables ever have a zero value?
If not, perhaps you want to treat missing values as zero (look at the NHANES docs to see whether this is justified). If so, you could do something like
totPA=sum( (sum(0,PAQ610)*sum(0,PAD615)*8)
, (sum(0,PAQ625)*sum(0,PAD630)*4)
, (sum(0,PAQ640)*sum(0,PAD645)*4)
, (sum(0,PAQ655)*sum(0,PAD660)*8)
, (sum(0,PAQ670)*sum(0,PAD675)*4)
)
/60 ;
But I repeat, make sure, for your purposes, that it is valid to treating missing values as zeroes.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.