Solved: Proc Tabulate doesn't match Proc Freq

Cruise · Posted 10-05-2018 07:46 AM

Hi All,

I'm trying to estimate the proportion of obesity (a 0/1 dummy variable) in my study population by duration category (durcat) and other covariates. The first row of the table is meant to be the overall proportion of obesity at each durcat level. However, the output from proc tabulate and proc freq does NOT match, which is puzzling. I'm only interested in people with agecat=5.

Why would this happen? Please help if you see what I'm doing wrong here.

SAS Output from:

proc freq data=a;
tables durcat/list;
where agecat=5;
run;

durcat	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1	160207	17.31	160207	17.31
2	86663	9.36	246870	26.67
3	73166	7.90	320036	34.57
4	279301	30.17	599337	64.74
5	326435	35.26	925772	100.00

Proc tabulate output and proc freq output from the code below:

[Screenshot: proc tab vs proc freq.png]

proc tabulate data=a order=internal;
var ob;
class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months 
      migrant_status hb_cat;
tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months 
        migrant_status hb_cat), 
	   (N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
where agecat=5;
run; 

proc freq data=a;
tables ob/list;
where durcat=5 and agecat=5;
run;

Astounding · Posted 10-05-2018 07:56 AM

PROC TABULATE is automatically removing some of the observations. More specifically, any time any of the CLASS variables has a missing value (whether or not it is used in your table) the observation gets removed from the calculations. You can change that by adding the MISSING option on the PROC statement.
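
To make that behavior concrete, here is a minimal sketch on made-up data. The dataset demo and the variables grp and extra are hypothetical names chosen for illustration; they are not from the original poster's data. The class variable extra never appears in the TABLE statement, yet its missing values still decide which rows are counted.

```sas
/* Hypothetical toy data: demo, grp and extra are illustrative names only */
data demo;
  input grp extra ob;
  datalines;
1 1 0
1 . 1
2 1 1
2 . 0
2 2 1
;
run;

/* Default behavior: rows with a missing value in ANY class variable
   are excluded, even though extra is not used in the TABLE statement */
proc tabulate data=demo;
  class grp extra;
  var ob;
  table grp, ob*(n mean);
run;

/* MISSING on the PROC statement keeps those rows and treats
   missing as a valid class level */
proc tabulate data=demo missing;
  class grp extra;
  var ob;
  table grp, ob*(n mean);
run;
```

On this toy data the N for each grp level should be smaller in the first step than in the second, because the two rows with a missing extra are only counted once MISSING is specified.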

Cruise · Posted 10-05-2018 08:12 AM

@Astounding thank you,

Adding the missing option to proc tabulate stopped it from dropping observations with missing class values and improved the descriptive numbers for the other covariates, such as migrant_status. However, I still get different output from proc tabulate and proc freq for my main variable, obesity by durcat. Any ideas or hints? There is a typo in the screenshot: I meant "with missing", not "within missing".

proc tabulate data=a missing order=internal;
var ob;
class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat;
tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat),
(N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
format assist $assist. race1 race. birth_wt birth_wt. bf_months bf_months.;
where agecat=5 and geography not in ('99','nyc');
run;

proc freq data=a;
tables ob/list;
where durcat=5;
run;

Astounding · Posted 10-05-2018 09:03 AM

I'm not sure where the difference is coming from, but I know where to look. Forget about the mean and look at N. PROC FREQ is processing a total of 326,435 observations, while PROC TABULATE is working with 372,667 observations.

Cruise · Posted 10-05-2018 09:13 AM

Hmmm, good catch. Thanks

ballardw · Posted 10-05-2018 11:17 AM

Note that you said output from:

proc freq data=a;
tables durcat/list;
where agecat=5;
run;

And then posted code as:

proc freq data=a;
tables ob/list;
where durcat=5 and agecat=5;
run;

Any time you use a WHERE clause to subset the data in one procedure, it is a good idea to use the same WHERE clause in the other procedure so that both start from the same base records.
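
One way to guarantee the same base records is to subset once and point both procedures at the result. The sketch below assumes the dataset a and the variables shown earlier in the thread; the dataset name base is hypothetical, and the particular filter is only an assumption about which records are wanted.

```sas
/* Sketch only: base is a hypothetical name, and the filter is an
   assumption about the intended study population */
data base;
  set a;
  where agecat=5 and geography not in ('99','nyc');
run;

/* Both procedures now read exactly the same records */
proc freq data=base;
  tables ob;
  where durcat=5;
run;

proc tabulate data=base missing;
  class durcat;
  var ob;
  table durcat, ob*(n mean*f=percent7.1);
run;
```

With this setup, the N reported for durcat=5 in proc tabulate can be compared directly with the total frequency from proc freq. If they still differ, the cause is something other than the filters.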
