Hi all,
I'm trying to generate a report of cross-tabulated frequencies. However, one of the categorical variables (language) has a ton of different options. I would like a way for the report to only show the top 4 languages and group the rest into an "other" category. How would I do this? Here's my code so far:
PROC REPORT data= nodup_list nowd missing;
column motherdominantlanguage have_phone, (N pctn) Total;
define motherdominantlanguage / group "Language" order=freq descending;
define have_phone / across order = external "Phone Status";
define n / format = 8. "N" ;
define pctn / 'Percent' format=percent7.1 ;
rbreak after / dol skip summarize ;
compute total;
Total = sum(_c2_,_c4_);
endcomp;
title2 "Mother's Language Frequency";
RUN;
And here's a sample of my current output:
I'd really like to only have 5 language rows: top 4 languages and an other category!
Thanks!
Do the crosstab frequencies first (PROC FREQ), writing two intermediate data sets (NEED1 for crosstabs, and NEED2 for oneway freqs of motherdominantlanguage). Then left join the top 4 from need2 with need1 to make NEED. NEED will have the 8 cells of interest plus a frequency variable COUNT.
proc freq nodup_list order=freq noprint ;
tables motherdominantlanguage*have_phone
/ out=need1 (drop=percent) sparse missing ;
tables motherdominantlanguage
/ out=need2 ;
run;
proc sql;
create table need as select n1.* from
need2 (obs=4) as n2
left join
need1 as n1
on n2.motherdominantlanguage=n1.motherdominantlanguage;
quit;
AT this point you can run your proc report against dataset NEED, but add a
FREQ COUNT;
statement.
Just as the "obs=4" data set name option for need2 allows joining only on the 4 most frequent languages, you can also use the "firstobs=5" data set name option to perform a join on the remaining languages. Create a table NEED3 from this join, summing up the value of count for each have_phone value ("group by have_phone"), and creating a lable for that collection of languages ("other"=motherdominantlanguage).
Then you can do a union of NEED3 with the join produced using "obs=4":
proc sql noprint;
create table need3 as
select 'Other' as motherdominantlanguage,n1.have_phone,sum(n1.count) as count from
need2 (firstobs=5) as n2 /* was "firstobs=2" - editted to "firstobs=5" */
left join
need1 as n1
on n2.motherdominantlanguage=n1.motherdominantlanguage
group by have_phone;
create table need as select n1.* from
need2 (obs=4) as n2
left join
need1 as n1
on n2.motherdominantlanguage=n1.motherdominantlanguage
outer union
select * from need3;
quit;
Have you considered a custom format?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.