akbarlam

Hello,

I'm working on the NHANES 2011-2018 complex survey dataset and I've been coding all week, so it's possible I'm just overlooking a simple mistake. I recoded several variables into categories, for example race and age. Below is an example of what I coded (after concatenating the datasets):

if race=3 then raceCat=1;
else if race=4 then raceCat=2; 
else if race=6 then raceCat=3;
else if race=1 or race=2 then raceCat=4;
else if race=7 then raceCat=5;

I am now trying to check for normality among some of my variables. I am using PROC UNIVARIATE for this. Below is the code:

proc sort; 
by gender racecat; 
run;

PROC UNIVARIATE data=datasetn plot normal;
where age >= 20;
by gender and racecat;    
VAR waistcirc;    
freq wt8yr_ng; *This is the weighting variable;
FORMAT gender SEXFMT.  racecat RACEFMT. ;    
title "Distribution of waist circumference gender and race: NHANES 2011-2018";           
run;

I noticed that the output does not cover all combinations of gender and race categories. For this particular code, results were generated only for gender 1 (male) and race category 1 (Non-Hispanic White).

[Screenshot akbarlam_0-1638518887052.png: output from the same program, but with 'age categories' also included in the BY statement. As you can see, the program selects only one age category (20 to 39 years old), and no other category combinations appear after this first one.]

 

I have also noticed the same problem when I ran a simple PROC FREQ cross-tabulation with a BY statement: only the first category of the variable in the BY statement is used and the rest are ignored. Is there something I need to change in my settings? I'm very confused about why this is occurring.

 

Thank you in advance for your help!

ballardw

Show your log for the Proc Univariate.

I will bet a small stack of $$$ that when you read your LOG there is an ERROR about a variable named AND in your Proc Univariate code, unless you actually have a variable named AND and mean to use it on the BY statement. If you use AND on the BY statement in other procedures, you will likely get the same error.

394  proc sort data=sashelp.class out=work.class;
395     by sex age;
396  run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


397
398  proc univariate data=work.class;
399     by sex and age;
ERROR: Variable AND not found.
400     var height weight;
401  run;

If you actually have a variable named AND, put it on the BY statement only when you really mean to group by it.
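
Since the question also mentions PROC FREQ, here is a minimal sketch of the same BY syntax in that procedure. It uses SASHELP.CLASS rather than the NHANES data, and the choice of SEX as the BY variable and AGE as the table variable is only for illustration:

```sas
/* BY-group processing expects the data sorted by the BY variables. */
proc sort data=sashelp.class out=work.class;
   by sex;
run;

proc freq data=work.class;
   by sex;        /* one set of tables per value of SEX */
   tables age;    /* additional BY variables would be listed with spaces, e.g. "by sex age;" */
run;
```

With a valid BY statement, the procedure should produce a separate table for each value of the BY variable instead of stopping with an error.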

 

I would expect your BY statement to look like:

By gender racecat;

exactly the same as the BY statement in Proc Sort. To include multiple variables on a BY statement, list them separated by spaces; you do not use AND.
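
Putting that together, a corrected version of the posted program might look like the sketch below. The only change to the procedure itself is the BY statement. Naming the input and output data sets on PROC SORT is an added assumption (the posted sort has no DATA= option, so it sorts whichever data set was created last), and DATASETN_SORTED is a hypothetical name:

```sas
/* Sort by the same variables, in the same order, as the BY statement below. */
proc sort data=datasetn out=datasetn_sorted;   /* datasetn_sorted: hypothetical name */
   by gender racecat;
run;

proc univariate data=datasetn_sorted plot normal;
   where age >= 20;
   by gender racecat;     /* variables separated by spaces, no AND */
   var waistcirc;
   freq wt8yr_ng;         /* kept exactly as in the original post */
   format gender SEXFMT. racecat RACEFMT.;
   title "Distribution of waist circumference gender and race: NHANES 2011-2018";
run;
```

If the log is clean after this change, the output should contain one section per gender and race category combination present among respondents aged 20 and over.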

Also, your Univariate code does not include agecat anywhere. Showing an "example" that does not come from the code you posted is not helpful and can be extremely misleading.

 




 
