In a specific column I have many repeated informations like:
red
blue
yellow
white
red
blue
yellow
red
blue
yellow
Any thought how could proceed to rename a cell when it has a low frequency?
In the sample above I would like to rename white as other since it has frequency 1.
Thanks,
It helps if you post directly into the forum, not everyone can download files. You'll get answers faster.
*Sample data;
data Colours;
input colour$;
cards;
Blue
Yellow
Red
Yellow
Orange
Orange
Yellow
Orange
Blue
Gray
Orange
Yellow
White
Orange
Blue
White
White
Blue
Orange
White
Orange
Yellow
White
Blue
Orange
Yellow
Yellow
Orange
Orange
Yellow
;;;;
run;
*create table to create format;
proc freq data=Colours order=freq noprint;
table colour / out=format_base;
run;
*create rules for format;
data colour_fmt;
set format_base;
length start label $20.;
fmtname = 'colours_fmt';
type='C';
start=colour;
if count <5 then label='Other';
else label=colour;
run;
*create format;
proc format cntlin=colour_fmt;
run;
*Apply format;
proc freq data=colours order=freq;
table colour;
format colour $colours_fmt.;
run;
Try a FORMAT with Other category.
Anything not in the main list will fall into the OTHER category, similar to a default ELSE.
The spreadsheet has more then 100.000 lines and over 200 different variables. Any thought how could build the format code?
Your question is vague, so you’re going to get vague responses. If you have a fairly specific question with sample input/output data it’s easier to help you out.
You don’t have to type out a format manually, it can be data driven and you can use a CNTLIN dataset to specify the formats.
I put a sample in the attached .sas file.
I think is more clear. If not please let me know
It helps if you post directly into the forum, not everyone can download files. You'll get answers faster.
*Sample data;
data Colours;
input colour$;
cards;
Blue
Yellow
Red
Yellow
Orange
Orange
Yellow
Orange
Blue
Gray
Orange
Yellow
White
Orange
Blue
White
White
Blue
Orange
White
Orange
Yellow
White
Blue
Orange
Yellow
Yellow
Orange
Orange
Yellow
;;;;
run;
*create table to create format;
proc freq data=Colours order=freq noprint;
table colour / out=format_base;
run;
*create rules for format;
data colour_fmt;
set format_base;
length start label $20.;
fmtname = 'colours_fmt';
type='C';
start=colour;
if count <5 then label='Other';
else label=colour;
run;
*create format;
proc format cntlin=colour_fmt;
run;
*Apply format;
proc freq data=colours order=freq;
table colour;
format colour $colours_fmt.;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.