Hi I have a dataset consiting of large amount of variables (50+)
Some of those variables are text and some of them are numeric. I want to categorize every of those variables into 2 or 3 categories. So lets say the second column of my dataset which is age would be categorized into young (age<25) older (25<age<40) oldest (age>40).
Another variable which is gender would be categorized into male and female. And the third is number of cars (0=none, 1 car= 1, 2+ cars= multiple) and so on.
So the thing is every variable woud require writing different categorization criteria which will result in a quite long code if done manually.
I was thinking of creating a table where I would put a criteria how every variable should be split and then create conditions based on that. But I am not sure if it is the simplest approach and maybe you can give me some insight how to deal with this problem.
BTW im on SAS 9.4.
Take a look at formats. They can do a very nice job of recoding data.
Here's an example for Age, but you could use the same for all of the other variables.
Tom
data have;
input Age;
cards;
1
15
24
25
39
40
41
89
;
run;
proc format;
value Agef
low-24 = "Young"
25-39 = "Older"
40-high = "Oldest"
;
run;
data want;
set have;
length AgeReformat $20;
AgeReformat = put(Age, Agef.);
run;
And lets say I have a table where I keep conditions using which i have to cateogrize?
Like I have table in this format:
Category Condition
1 not missing(variable_name) and variable_name<=20
2 20<variable_name <=40
3 40<variable_name
Then based on the column condition I would categorize those values.
I agree with @TomKari but, depending upon what you want to do, you likely don't have to create new variables .. just create and apply the formats. e.g., using Tom's example:
data have; input Age; cards; 1 15 24 25 39 40 41 89 run; proc format; value Agef low-24 = "Young" 25-39 = "Older" 40-high = "Oldest" ; run; data want; set have; format Age Agef.; run; proc freq data=want; tables age; run;
Art, CEO, AnalystFinder.com
Yes, you can create formats from a table. See, e.g., www2.sas.com/proceedings/forum2007/068-2007.pdf
and then you can apply the same format(s) to as many variables as needed.
Art, CEO, AnalystFinder.com
You'll have to define the character categories yourself. Formats are a good option to do this as explained.
For numerics, this may be a good option to automate the intervals if you only want counts:
proc summary data=SASHELP.CLASS ;
var _numeric_ ;
output out=SUM min= p25= p50= p75= max= /autoname;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.