Solved: Re: Proc Freq-Conditionals?

Greek · Posted 09-12-2013 09:57 AM

Good morning,

I am trying to group "noisy" insignificant observations into one single variable so that I can focus on the observations that really matter. I have been looking to see if I can use conditionals with proc freq. The logic I am trying to implement is the following:

"if a host appears less than five times group into 'other'"

Basically regroup all hosts that appear less than five times into a single variable named "other"

I get the table listed below, but I would like to group yahoo and aol into "other"

data host_list;

input host $;

datalines;

amazon

google

aol

yahoo

run;

proc freq data=host_list order=freq;

run;

amazon	5	38.46	5	38.46
google	5	38.46	10	76.92
yahoo	2	15.38	12	92.31
aol	1	7.69	13	100.00

Can anybody help?

Thank you

data_null__ · Posted 09-12-2013 10:58 AM

Something data driven wouldn't you say.

data host_list;
   input host $;
   datalines; 
amazon
amazon
amazon
amazon
amazon
google
google
google
google
google
aol
yahoo
yahoo

   run; 

proc freq data=host_list order=freq;
   tables host / out=counts;
   run; 
proc print; 
   run; 
proc sql noprint; 
   select host into :other separated by ', ' from counts where count lt 5; 
   run; 
data fold;
   set counts;
   if count lt 5 then host = 'Other'; 
   run; 
proc freq data=fold order=data;
   weight count;
   tables host;
   Footnote "NOTE: Other includes: &other"; 
   run; 
footnote;

Message was edited by: data _null_ driveN

View solution in original post

Linlin · Posted 09-12-2013 10:36 AM

I don't think you could do it in proc freq, you could add

if host in ('yahoo','aol') then host='other' in your data step.

ballardw · Posted 09-12-2013 10:43 AM

You could create a custom format that just assigns the unwanted to that category.

Proc format;

value $oth (upcase)

'YAHOO','AOL'='other';

run;

/* the UPCASE is in case your actual data has different capitalization of the words, the option compares the all caps version to assign the 'other' text */

proc freq data=host_list order=freq;

format host $oth.;

run;

Astounding · Posted 09-12-2013 10:57 AM

Automating the solution is the hard part. Here's an approach ... I'm hoping someone else can come up with the code because I don't think I'll have the time to do it.

1. Run your PROC FREQ, but don't print the results. Instead, send the results to an output data set.

2. Subset the output data set, taking all those observations having COUNT < 5.

3. Prepare that subset to become a format: add FMTNAME and LABEL.

4. Create the format using the step 3 results as a CNTLIN= data set.

5. Re-run PROC FREQ, applying the format.

There will be smaller issues ... what will happen if all the original counts are 5 or more? And the order of the rows in the table might change. (It's even possible that the Other category will have the highest count and will print first.)

But at least there's an approach to work with.

Good luck.

data_null__ · Posted 09-12-2013 10:58 AM

Something data driven wouldn't you say.

data host_list;
   input host $;
   datalines; 
amazon
amazon
amazon
amazon
amazon
google
google
google
google
google
aol
yahoo
yahoo

   run; 

proc freq data=host_list order=freq;
   tables host / out=counts;
   run; 
proc print; 
   run; 
proc sql noprint; 
   select host into :other separated by ', ' from counts where count lt 5; 
   run; 
data fold;
   set counts;
   if count lt 5 then host = 'Other'; 
   run; 
proc freq data=fold order=data;
   weight count;
   tables host;
   Footnote "NOTE: Other includes: &other"; 
   run; 
footnote;

Message was edited by: data _null_ driveN

Proc Freq-Conditionals?

Re: Proc Freq-Conditionals?

Re: Proc Freq-Conditionals?

Re: Proc Freq-Conditionals?

Re: Proc Freq-Conditionals?

Re: Proc Freq-Conditionals?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away