BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi All, 

I'm trying to estimate the proportion of obesity (0,1 dummy variable) among my study population by a variable for a duration category(durcat) and other covariates. The first row in the table is meant to be the overall proportion of obesity by durcat levels. However, outputs from proc tabulate and proc freq do NOT match. So puzzling.I'm only interested in data among people belong to (agecat=5).

 

Why would it happen? Please help if you see what I'm doing wrong here. 

 

SAS Output from:

proc freq data=a;
tables durcat/list;
where agecat=5;
run;

durcat Frequency Percent Cumulative
Frequency
Cumulative
Percent
1 160207 17.31 160207 17.31
2 86663 9.36 246870 26.67
3 73166 7.90 320036 34.57
4 279301 30.17 599337 64.74
5 326435 35.26 925772

100.00

 

 

Proc Tabulate output and proc freq output from below clodes:

proc tab vs proc freq.png

 

proc tabulate data=a order=internal;
var ob;
class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months 
      migrant_status hb_cat;
tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months 
        migrant_status hb_cat), 
	   (N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
where agecat=5; run; proc freq data=a; tables ob/list; where durcat=5 and agecat=5; run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

PROC TABULATE is automatically removing some of the observations.  More specifically, any time any of the CLASS variables has a missing value (whether or not it is used in your table) the observation gets removed from the calculations.  You can change that by adding the MISSING option on the PROC statement.

View solution in original post

5 REPLIES 5
Astounding
PROC Star

PROC TABULATE is automatically removing some of the observations.  More specifically, any time any of the CLASS variables has a missing value (whether or not it is used in your table) the observation gets removed from the calculations.  You can change that by adding the MISSING option on the PROC statement.

Cruise
Ammonite | Level 13

@Astounding thank you, 

Using missing function in proc tab solved ignoring missing option and improved the descriptive numbers for the other covariates, such as migrant-status for example. However, I still get different outputs from proc tabulate anf proc freq on my main variable obesity by durcat. Any idea or hints? Typo in the screenshot. I meant with missing not within missing. 

 

IMPROVED.png

proc tabulate data=a missing order=internal;
var ob;
class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat;
tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat),
(N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
format assist $assist. race1 race. birth_wt birth_wt. bf_months bf_months.;
where agecat=5 and geography not in ('99','nyc');
run;

proc freq data=a;
tables ob/list;
where durcat=5;
run;

Astounding
PROC Star

I'm not sure where the difference is coming from, but I know where to look.  Forget about the mean and look at N. PROC FREQ is processing a total of 326,435 observations, while PROC TABULATE is working with 372,667 observations.

Cruise
Ammonite | Level 13
Hmmm, good catch. Thanks
ballardw
Super User

@Cruise wrote:

Hi All, 

I'm trying to estimate the proportion of obesity (0,1 dummy variable) among my study population by a variable for a duration category(durcat) and other covariates. The first row in the table is meant to be the overall proportion of obesity by durcat levels. However, outputs from proc tabulate and proc freq do NOT match. So puzzling.I'm only interested in data among people belong to (agecat=5).

 

Why would it happen? Please help if you see what I'm doing wrong here. 

 

SAS Output from:

proc freq data=a;
tables durcat/list;
where agecat=5;
run;

durcat Frequency Percent Cumulative
Frequency
Cumulative
Percent
1 160207 17.31 160207 17.31
2 86663 9.36 246870 26.67
3 73166 7.90 320036 34.57
4 279301 30.17 599337 64.74
5 326435 35.26 925772

100.00

 

 

Proc Tabulate output and proc freq output from below clodes:

proc tab vs proc freq.png

 

proc tabulate data=a order=internal;
var ob;
class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months 
      migrant_status hb_cat;
tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months 
        migrant_status hb_cat), 
	   (N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
where agecat=5; run; proc freq data=a; tables ob/list; where durcat=5 and agecat=5; run;

 


Note that you said output from:

proc freq data=a;
tables durcat/list;
where agecat=5;
run;

And then posted code as:

proc freq data=a;
tables ob/list;
where durcat=5 and agecat=5;
run;


Any time you use a WHERE clause to reduce data it is a good idea to do so in the other procedure so they both start with the same base records.

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2245 views
  • 3 likes
  • 3 in conversation