Hello,
I'm working with the NHANES 2011-2018 complex survey dataset and I've been coding all week, so I may simply be missing a simple mistake I made. I recoded several variables into categories, for example race and age. Below is what I coded (after concatenating the datasets) as an example:
if race=3 then raceCat=1;
else if race=4 then raceCat=2;
else if race=6 then raceCat=3;
else if race=1 or race=2 then raceCat=4;
else if race=7 then raceCat=5;
I am now trying to check some of my variables for normality. I am using PROC UNIVARIATE for this. Below is the code:
proc sort;
by gender racecat;
run;
PROC UNIVARIATE data=datasetn plot normal;
where age >= 20;
by gender and racecat;
VAR waistcirc;
freq wt8yr_ng; *This is the weighting variable;
FORMAT gender SEXFMT. racecat RACEFMT. ;
title "Distribution of waist circumference gender and race: NHANES 2011-2018";
run;
I noticed that the output does not cover all combinations of the gender and race categories. For this particular code, results were generated only for gender 1 (male) and race category 1 (Non-Hispanic White).
[The screenshot is for the same program, but with 'age categories' also in the BY statement. As you can see, the program selects only one age category, 20 to 39 years old, and there are no results for any other category combination after this one.]
I have noticed the same problem when I ran a simple PROC FREQ cross-tabulation with a BY statement: only the first category of the BY variable is used and the rest are ignored. Is there something I need to change in my settings? I'm very confused about why this is happening.
Thank you in advance for your help!
Show your log for the Proc Univariate.
I bet your log says something about a variable named AND, unless you actually have, and mean to use, a variable named AND on the BY statement.
I will bet a small stack of $$$ that when you read your LOG there is an ERROR for your PROC UNIVARIATE code. If you used AND on the BY statement in other procedures, you likely got the same error there.
394  proc sort data=sashelp.class out=work.class;
395     by sex age;
396  run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

397
398  proc univariate data=work.class;
399     by sex and age;
ERROR: Variable AND not found.
400     var height weight;
401  run;
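For comparison, here is a minimal sketch of the same SASHELP.CLASS example with AND removed from the BY statement. The OUTPUT data set WORK.CLASS_STATS and its variable names are illustrative choices, not part of the original post; the OUTPUT statement writes one observation per BY group, which makes it easy to confirm that every SEX/AGE combination was actually processed.

```sas
/* Sort by the same variables, in the same order, that the BY statement will use */
proc sort data=sashelp.class out=work.class;
   by sex age;
run;

/* BY variables are separated by spaces only; AND would be read as a variable name */
proc univariate data=work.class;
   by sex age;
   var height weight;
   /* One observation per BY group; data set and variable names are illustrative */
   output out=work.class_stats n=n_height n_weight mean=mean_height mean_weight;
run;

/* Inspect the per-group summary: one row per SEX/AGE combination */
proc print data=work.class_stats;
run;
```

SASHELP.CLASS has 11 distinct SEX/AGE combinations (5 for F, 6 for M), so WORK.CLASS_STATS should contain 11 rows. Some groups have a single student, so statistics such as the standard deviation will be missing for them, but every group still appears.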
If you actually have a variable named AND, don't put it on the BY statement unless you mean to group by it.
I would expect your BY statement to look like:
By gender racecat;
It should be exactly the same as the BY statement in PROC SORT. You do not use AND on a BY statement to include multiple variables; just separate them with spaces.
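The same rule applies to PROC FREQ and any other procedure that accepts a BY statement, which likely explains the PROC FREQ behavior described in the question. Below is a minimal sketch on SASHELP.CLASS; the data set names WORK.CLASS_BY_SEX and WORK.AGE_COUNTS are illustrative. Note that the sort names its input with DATA=: a bare `proc sort;`, as in the original post, sorts the most recently created data set (_LAST_), which may not be the data set the later procedure reads.

```sas
/* Name the input explicitly so the data set that gets sorted is the one analyzed */
proc sort data=sashelp.class out=work.class_by_sex;
   by sex;
run;

/* One frequency table of AGE for each level of SEX; no AND on the BY statement */
proc freq data=work.class_by_sex;
   by sex;
   /* OUT= keeps the counts, including the BY variable, for a quick check */
   tables age / out=work.age_counts;
run;
```

WORK.AGE_COUNTS should have one row per AGE level within each SEX, so both F and M appear in it.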
And your Univariate code does not include agecat anywhere. Showing an "example" that does not come from your code is not helpful and can be extremely misleading.