Hi All,
I'm trying to estimate the proportion of obesity (0,1 dummy variable) among my study population by a variable for a duration category(durcat) and other covariates. The first row in the table is meant to be the overall proportion of obesity by durcat levels. However, outputs from proc tabulate and proc freq do NOT match. So puzzling.I'm only interested in data among people belong to (agecat=5).
Why would it happen? Please help if you see what I'm doing wrong here.
SAS Output from:
proc freq data=a;
tables durcat/list;
where agecat=5;
run;
durcat | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
1 | 160207 | 17.31 | 160207 | 17.31 |
2 | 86663 | 9.36 | 246870 | 26.67 |
3 | 73166 | 7.90 | 320036 | 34.57 |
4 | 279301 | 30.17 | 599337 | 64.74 |
5 | 326435 | 35.26 | 925772 |
100.00
|
Proc Tabulate output and proc freq output from below clodes:
proc tabulate data=a order=internal;
var ob;
class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat;
tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat),
(N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
where agecat=5;
run;
proc freq data=a;
tables ob/list;
where durcat=5 and agecat=5;
run;
PROC TABULATE is automatically removing some of the observations. More specifically, any time any of the CLASS variables has a missing value (whether or not it is used in your table) the observation gets removed from the calculations. You can change that by adding the MISSING option on the PROC statement.
PROC TABULATE is automatically removing some of the observations. More specifically, any time any of the CLASS variables has a missing value (whether or not it is used in your table) the observation gets removed from the calculations. You can change that by adding the MISSING option on the PROC statement.
@Astounding thank you,
Using missing function in proc tab solved ignoring missing option and improved the descriptive numbers for the other covariates, such as migrant-status for example. However, I still get different outputs from proc tabulate anf proc freq on my main variable obesity by durcat. Any idea or hints? Typo in the screenshot. I meant with missing not within missing.
proc tabulate data=a missing order=internal;
var ob;
class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat;
tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months
migrant_status hb_cat),
(N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
format assist $assist. race1 race. birth_wt birth_wt. bf_months bf_months.;
where agecat=5 and geography not in ('99','nyc');
run;
proc freq data=a;
tables ob/list;
where durcat=5;
run;
I'm not sure where the difference is coming from, but I know where to look. Forget about the mean and look at N. PROC FREQ is processing a total of 326,435 observations, while PROC TABULATE is working with 372,667 observations.
@Cruise wrote:
Hi All,
I'm trying to estimate the proportion of obesity (0,1 dummy variable) among my study population by a variable for a duration category(durcat) and other covariates. The first row in the table is meant to be the overall proportion of obesity by durcat levels. However, outputs from proc tabulate and proc freq do NOT match. So puzzling.I'm only interested in data among people belong to (agecat=5).
Why would it happen? Please help if you see what I'm doing wrong here.
SAS Output from:
proc freq data=a;
tables durcat/list;
where agecat=5;
run;
durcat Frequency Percent Cumulative
FrequencyCumulative
Percent1 160207 17.31 160207 17.31 2 86663 9.36 246870 26.67 3 73166 7.90 320036 34.57 4 279301 30.17 599337 64.74 5 326435 35.26 925772 100.00
Proc Tabulate output and proc freq output from below clodes:
proc tabulate data=a order=internal; var ob; class durcat agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months migrant_status hb_cat; tables (All agecat race1 assist fam_size hh_smoking area area1 birth_wt bf_ever bf_months migrant_status hb_cat), (N colpctn*f=5.1) ob*(durcat)*(mean*f=percent7.1)/nocellmerge printmiss;
where agecat=5; run; proc freq data=a; tables ob/list; where durcat=5 and agecat=5; run;
Note that you said output from:
proc freq data=a;
tables durcat/list;
where agecat=5;
run;
And then posted code as:
proc freq data=a; tables ob/list; where durcat=5 and agecat=5; run;
Any time you use a WHERE clause to reduce data it is a good idea to do so in the other procedure so they both start with the same base records.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.