BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Greek
Obsidian | Level 7

Good morning,

I am trying to group "noisy" insignificant observations into one single variable so that I can focus on the observations that really matter. I have been looking to see if I can use conditionals with proc freq. The logic I am trying to implement is the following:

"if a host appears less than five times group into 'other'"

Basically regroup all hosts that appear less than five times into a single variable named "other"

I get the table listed below, but I would like to group yahoo and aol into "other"

data host_list;

input host $;

datalines;

amazon

amazon

amazon

amazon

amazon

google

google

google

google

google

aol

yahoo

yahoo

run;

proc freq data=host_list order=freq;

run;

amazon

5

  1. 38.46

5

  1. 38.46

google

5

  1. 38.46

10

  1. 76.92

yahoo

2

  1. 15.38

12

  1. 92.31

aol

1

  1. 7.69

13

  1. 100.00

Can anybody help?

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Something data driven wouldn't you say.

data host_list;
   input host $;
   datalines;
amazon
amazon
amazon
amazon
amazon
google
google
google
google
google
aol
yahoo
yahoo

   run;

proc freq data=host_list order=freq;
   tables host / out=counts;
   run;
proc print;
  
run;
proc sql noprint;
  
select host into :other separated by ', ' from counts where count lt 5;
  
run;
data fold;
   set counts;
   if count lt 5 then host = 'Other';
  
run;
proc freq data=fold order=data;
   weight count;
   tables host;
   Footnote "NOTE: Other includes: &other";
  
run;
footnote;

Message was edited by: data _null_ driveN

View solution in original post

4 REPLIES 4
Linlin
Lapis Lazuli | Level 10

I don't think you could do it in proc freq, you could add

if host in ('yahoo','aol') then host='other' in your data step.

ballardw
Super User

You could create a custom format that just assigns the unwanted to that category.

Proc format;

value $oth (upcase)

'YAHOO','AOL'='other';

run;

/* the UPCASE is in case your actual data has different capitalization of the words, the option compares the all caps version to assign the 'other' text */

proc freq data=host_list order=freq;

format host $oth.;

run;

Astounding
PROC Star

Automating the solution is the hard part.  Here's an approach ... I'm hoping someone else can come up with the code because I don't think I'll have the time to do it.

1. Run your PROC FREQ, but don't print the results.  Instead, send the results to an output data set.

2. Subset the output data set, taking all those observations having COUNT < 5.

3. Prepare that subset to become a format:  add FMTNAME and LABEL.

4. Create the format using the step 3 results as a CNTLIN= data set.

5. Re-run PROC FREQ, applying the format.

There will be smaller issues ... what will happen if all the original counts are 5 or more?  And the order of the rows  in the table might change.  (It's even possible that the Other category will have the highest count and will print first.)

But at least there's an approach to work with.

Good luck.

data_null__
Jade | Level 19

Something data driven wouldn't you say.

data host_list;
   input host $;
   datalines;
amazon
amazon
amazon
amazon
amazon
google
google
google
google
google
aol
yahoo
yahoo

   run;

proc freq data=host_list order=freq;
   tables host / out=counts;
   run;
proc print;
  
run;
proc sql noprint;
  
select host into :other separated by ', ' from counts where count lt 5;
  
run;
data fold;
   set counts;
   if count lt 5 then host = 'Other';
  
run;
proc freq data=fold order=data;
   weight count;
   tables host;
   Footnote "NOTE: Other includes: &other";
  
run;
footnote;

Message was edited by: data _null_ driveN

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 999 views
  • 9 likes
  • 5 in conversation