05-30-2017 05:05 PM
I'm trying to generate a report of cross-tabulated frequencies. However, one of the categorical variables (language) has a ton of different options. I would like a way for the report to only show the top 4 languages and group the rest into an "other" category. How would I do this? Here's my code so far:
PROC REPORT data= nodup_list nowd missing; column motherdominantlanguage have_phone, (N pctn) Total; define motherdominantlanguage / group "Language" order=freq descending; define have_phone / across order = external "Phone Status"; define n / format = 8. "N" ; define pctn / 'Percent' format=percent7.1 ; rbreak after / dol skip summarize ; compute total; Total = sum(_c2_,_c4_); endcomp; title2 "Mother's Language Frequency"; RUN;
And here's a sample of my current output:
I'd really like to only have 5 language rows: top 4 languages and an other category!
05-30-2017 05:44 PM
Do the crosstab frequencies first (PROC FREQ), writing two intermediate data sets (NEED1 for crosstabs, and NEED2 for oneway freqs of motherdominantlanguage). Then left join the top 4 from need2 with need1 to make NEED. NEED will have the 8 cells of interest plus a frequency variable COUNT.
proc freq nodup_list order=freq noprint ; tables motherdominantlanguage*have_phone / out=need1 (drop=percent) sparse missing ; tables motherdominantlanguage / out=need2 ; run; proc sql; create table need as select n1.* from need2 (obs=4) as n2 left join need1 as n1 on n2.motherdominantlanguage=n1.motherdominantlanguage; quit;
AT this point you can run your proc report against dataset NEED, but add a
05-31-2017 10:49 AM - edited 06-01-2017 11:50 AM
Just as the "obs=4" data set name option for need2 allows joining only on the 4 most frequent languages, you can also use the "firstobs=5" data set name option to perform a join on the remaining languages. Create a table NEED3 from this join, summing up the value of count for each have_phone value ("group by have_phone"), and creating a lable for that collection of languages ("other"=motherdominantlanguage).
Then you can do a union of NEED3 with the join produced using "obs=4":
proc sql noprint; create table need3 as select 'Other' as motherdominantlanguage,n1.have_phone,sum(n1.count) as count from need2 (firstobs=5) as n2 /* was "firstobs=2" - editted to "firstobs=5" */ left join need1 as n1 on n2.motherdominantlanguage=n1.motherdominantlanguage group by have_phone; create table need as select n1.* from need2 (obs=4) as n2 left join need1 as n1 on n2.motherdominantlanguage=n1.motherdominantlanguage outer union select * from need3; quit;