Help using Base SAS procedures

PROC REPORT only show top frequencies

Reply
Occasional Contributor
Posts: 7

PROC REPORT only show top frequencies

Hi all,

 

I'm trying to generate a report of cross-tabulated frequencies. However, one of the categorical variables (language) has a ton of different options. I would like a way for the report to only show the top 4 languages and group the rest into an "other" category. How would I do this? Here's my code so far:

 

 

 

PROC REPORT data= nodup_list nowd missing;
 	column motherdominantlanguage have_phone, (N pctn) Total;
	define motherdominantlanguage / group "Language" order=freq descending;
	define have_phone / across order = external "Phone Status"; 
	define n / format = 8. "N" ;
	define pctn / 'Percent' format=percent7.1 ;
	rbreak after / dol skip summarize ; 
	compute total;
 		Total = sum(_c2_,_c4_);
 	endcomp; 
	title2 "Mother's Language Frequency";
RUN;

 

And here's a sample of my current output: 

language.PNG

I'd really like to only have 5 language rows: top 4 languages and an other category!

 

Thanks!

Trusted Advisor
Posts: 1,018

Re: PROC REPORT only show top frequencies

Do the crosstab frequencies first (PROC FREQ), writing two intermediate data sets (NEED1 for crosstabs, and NEED2 for oneway freqs of motherdominantlanguage).  Then left join the top 4 from need2 with need1 to make NEED.  NEED will have the 8 cells of interest plus a frequency variable COUNT.

 

proc freq nodup_list order=freq noprint ;
  tables motherdominantlanguage*have_phone 
    / out=need1 (drop=percent) sparse missing ;
  tables motherdominantlanguage      
    / out=need2 ;
run;

proc sql;
  create table need as select n1.* from
    need2 (obs=4) as n2 
      left join 
    need1 as n1
      on n2.motherdominantlanguage=n1.motherdominantlanguage;
quit;

AT this point you can run your proc report against dataset NEED, but add a

 

 

   FREQ COUNT;

 

statement.

Occasional Contributor
Posts: 7

Re: PROC REPORT only show top frequencies

Thank you so much! That was definitely helpful. Is there a way to combine all the other languages into an "others" category? Thanks again!
Trusted Advisor
Posts: 1,018

Re: PROC REPORT only show top frequencies

[ Edited ]

Just as the "obs=4" data set name option for need2 allows joining only on the 4 most frequent languages, you can also use the "firstobs=5" data set name option to perform a join on the remaining languages.  Create a table NEED3 from this join, summing up the value of count for each have_phone value ("group by have_phone"), and creating a lable for that collection of languages ("other"=motherdominantlanguage). 

 

Then you can do a union of NEED3 with the join produced using "obs=4":

proc sql noprint;
  create table need3 as
    select 'Other' as motherdominantlanguage,n1.have_phone,sum(n1.count) as count from 
    need2 (firstobs=5) as n2    /* was "firstobs=2" - editted to "firstobs=5" */
      left join
    need1 as n1
      on n2.motherdominantlanguage=n1.motherdominantlanguage
    group by have_phone;

  create table need as select n1.* from
    need2 (obs=4) as n2 
      left join 
    need1 as n1
      on n2.motherdominantlanguage=n1.motherdominantlanguage
    outer union 
    select * from need3;


quit;
Occasional Contributor
Posts: 7

Re: PROC REPORT only show top frequencies

Thank you so much! It worked perfectly! I actually wound up just stacking the final two tables; made it a bit easier. Smiley Happy
Super User
Posts: 19,770

Re: PROC REPORT only show top frequencies

Have you considered a custom format?

Ask a Question
Discussion stats
  • 5 replies
  • 134 views
  • 0 likes
  • 3 in conversation