Help using Base SAS procedures

Proc Freq-Conditionals?

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

Proc Freq-Conditionals?

Good morning,

I am trying to group "noisy" insignificant observations into one single variable so that I can focus on the observations that really matter. I have been looking to see if I can use conditionals with proc freq. The logic I am trying to implement is the following:

"if a host appears less than five times group into 'other'"

Basically regroup all hosts that appear less than five times into a single variable named "other"

I get the table listed below, but I would like to group yahoo and aol into "other"

data host_list;

input host $;

datalines;

amazon

amazon

amazon

amazon

amazon

google

google

google

google

google

aol

yahoo

yahoo

run;

proc freq data=host_list order=freq;

run;

amazon

5

  1. 38.46

5

  1. 38.46

google

5

  1. 38.46

10

  1. 76.92

yahoo

2

  1. 15.38

12

  1. 92.31

aol

1

  1. 7.69

13

  1. 100.00

Can anybody help?

Thank you


Accepted Solutions
Solution
‎09-12-2013 10:58 AM
Respected Advisor
Posts: 3,777

Re: Proc Freq-Conditionals?

Something data driven wouldn't you say.

data host_list;
   input host $;
   datalines;
amazon
amazon
amazon
amazon
amazon
google
google
google
google
google
aol
yahoo
yahoo

   run;

proc freq data=host_list order=freq;
   tables host / out=counts;
   run;
proc print;
  
run;
proc sql noprint;
  
select host into Smiley Surprisedther separated by ', ' from counts where count lt 5;
  
run;
data fold;
   set counts;
   if count lt 5 then host = 'Other';
  
run;
proc freq data=fold order=data;
   weight count;
   tables host;
   Footnote "NOTE: Other includes: &other";
  
run;
footnote;

Message was edited by: data _null_ driveN

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: Proc Freq-Conditionals?

I don't think you could do it in proc freq, you could add

if host in ('yahoo','aol') then host='other' in your data step.

Super User
Posts: 10,483

Re: Proc Freq-Conditionals?

You could create a custom format that just assigns the unwanted to that category.

Proc format;

value $oth (upcase)

'YAHOO','AOL'='other';

run;

/* the UPCASE is in case your actual data has different capitalization of the words, the option compares the all caps version to assign the 'other' text */

proc freq data=host_list order=freq;

format host $oth.;

run;

Super User
Posts: 5,081

Re: Proc Freq-Conditionals?

Automating the solution is the hard part.  Here's an approach ... I'm hoping someone else can come up with the code because I don't think I'll have the time to do it.

1. Run your PROC FREQ, but don't print the results.  Instead, send the results to an output data set.

2. Subset the output data set, taking all those observations having COUNT < 5.

3. Prepare that subset to become a format:  add FMTNAME and LABEL.

4. Create the format using the step 3 results as a CNTLIN= data set.

5. Re-run PROC FREQ, applying the format.

There will be smaller issues ... what will happen if all the original counts are 5 or more?  And the order of the rows  in the table might change.  (It's even possible that the Other category will have the highest count and will print first.)

But at least there's an approach to work with.

Good luck.

Solution
‎09-12-2013 10:58 AM
Respected Advisor
Posts: 3,777

Re: Proc Freq-Conditionals?

Something data driven wouldn't you say.

data host_list;
   input host $;
   datalines;
amazon
amazon
amazon
amazon
amazon
google
google
google
google
google
aol
yahoo
yahoo

   run;

proc freq data=host_list order=freq;
   tables host / out=counts;
   run;
proc print;
  
run;
proc sql noprint;
  
select host into Smiley Surprisedther separated by ', ' from counts where count lt 5;
  
run;
data fold;
   set counts;
   if count lt 5 then host = 'Other';
  
run;
proc freq data=fold order=data;
   weight count;
   tables host;
   Footnote "NOTE: Other includes: &other";
  
run;
footnote;

Message was edited by: data _null_ driveN

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 271 views
  • 9 likes
  • 5 in conversation