Hi I have a dataset consiting of large amount of variables (50+)
Some of those variables are text and some of them are numeric. I want to categorize every of those variables into 2 or 3 categories. So lets say the second column of my dataset which is age would be categorized into young (age<25) older (25<age<40) oldest (age>40).
Another variable which is gender would be categorized into male and female. And the third is number of cars (0=none, 1 car= 1, 2+ cars= multiple) and so on.
So the thing is every variable woud require writing different categorization criteria which will result in a quite long code if done manually.
I was thinking of creating a table where I would put a criteria how every variable should be split and then create conditions based on that. But I am not sure if it is the simplest approach and maybe you can give me some insight how to deal with this problem.
BTW im on SAS 9.4.
Take a look at formats. They can do a very nice job of recoding data.
Here's an example for Age, but you could use the same for all of the other variables.
Tom
data have;
input Age;
cards;
1
15
24
25
39
40
41
89
;
run;
proc format;
value Agef
low-24 = "Young"
25-39 = "Older"
40-high = "Oldest"
;
run;
data want;
set have;
length AgeReformat $20;
AgeReformat = put(Age, Agef.);
run;
And lets say I have a table where I keep conditions using which i have to cateogrize?
Like I have table in this format:
Category Condition
1 not missing(variable_name) and variable_name<=20
2 20<variable_name <=40
3 40<variable_name
Then based on the column condition I would categorize those values.
I agree with @TomKari but, depending upon what you want to do, you likely don't have to create new variables .. just create and apply the formats. e.g., using Tom's example:
data have; input Age; cards; 1 15 24 25 39 40 41 89 run; proc format; value Agef low-24 = "Young" 25-39 = "Older" 40-high = "Oldest" ; run; data want; set have; format Age Agef.; run; proc freq data=want; tables age; run;
Art, CEO, AnalystFinder.com
Yes, you can create formats from a table. See, e.g., www2.sas.com/proceedings/forum2007/068-2007.pdf
and then you can apply the same format(s) to as many variables as needed.
Art, CEO, AnalystFinder.com
You'll have to define the character categories yourself. Formats are a good option to do this as explained.
For numerics, this may be a good option to automate the intervals if you only want counts:
proc summary data=SASHELP.CLASS ;
var _numeric_ ;
output out=SUM min= p25= p50= p75= max= /autoname;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.