Hi all,
I'm trying to generate a report of cross-tabulated frequencies. However, one of the categorical variables (language) has a ton of different options. I would like a way for the report to only show the top 4 languages and group the rest into an "other" category. How would I do this? Here's my code so far:
PROC REPORT data= nodup_list nowd missing;
column motherdominantlanguage have_phone, (N pctn) Total;
define motherdominantlanguage / group "Language" order=freq descending;
define have_phone / across order = external "Phone Status";
define n / format = 8. "N" ;
define pctn / 'Percent' format=percent7.1 ;
rbreak after / dol skip summarize ;
compute total;
Total = sum(_c2_,_c4_);
endcomp;
title2 "Mother's Language Frequency";
RUN;
And here's a sample of my current output:
I'd really like to only have 5 language rows: top 4 languages and an other category!
Thanks!
Do the crosstab frequencies first (PROC FREQ), writing two intermediate data sets (NEED1 for crosstabs, and NEED2 for oneway freqs of motherdominantlanguage). Then left join the top 4 from need2 with need1 to make NEED. NEED will have the 8 cells of interest plus a frequency variable COUNT.
proc freq nodup_list order=freq noprint ;
tables motherdominantlanguage*have_phone
/ out=need1 (drop=percent) sparse missing ;
tables motherdominantlanguage
/ out=need2 ;
run;
proc sql;
create table need as select n1.* from
need2 (obs=4) as n2
left join
need1 as n1
on n2.motherdominantlanguage=n1.motherdominantlanguage;
quit;
AT this point you can run your proc report against dataset NEED, but add a
FREQ COUNT;
statement.
Just as the "obs=4" data set name option for need2 allows joining only on the 4 most frequent languages, you can also use the "firstobs=5" data set name option to perform a join on the remaining languages. Create a table NEED3 from this join, summing up the value of count for each have_phone value ("group by have_phone"), and creating a lable for that collection of languages ("other"=motherdominantlanguage).
Then you can do a union of NEED3 with the join produced using "obs=4":
proc sql noprint;
create table need3 as
select 'Other' as motherdominantlanguage,n1.have_phone,sum(n1.count) as count from
need2 (firstobs=5) as n2 /* was "firstobs=2" - editted to "firstobs=5" */
left join
need1 as n1
on n2.motherdominantlanguage=n1.motherdominantlanguage
group by have_phone;
create table need as select n1.* from
need2 (obs=4) as n2
left join
need1 as n1
on n2.motherdominantlanguage=n1.motherdominantlanguage
outer union
select * from need3;
quit;
Have you considered a custom format?
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.