BookmarkSubscribeRSS Feed
pokepim
Calcite | Level 5

Hi I have a dataset consiting of large amount of variables (50+)

Some of those variables are text and some of them are numeric. I want to categorize every of those variables into 2 or 3 categories. So lets say the second column of my dataset which is age would be categorized into young (age<25) older (25<age<40) oldest (age>40).

Another variable which is gender would be categorized into male and female. And the third is number of cars (0=none, 1 car= 1, 2+ cars= multiple) and so on.

So the thing is every variable woud require writing different categorization criteria which will result in a quite long code if done manually.

I was thinking of creating a table where I would put a criteria how every variable should be split and then create conditions based on that. But I am not sure if it is the simplest approach and maybe you can give me some insight how to deal with this problem.

BTW im on SAS 9.4.

7 REPLIES 7
TomKari
Onyx | Level 15

Take a look at formats. They can do a very nice job of recoding data.

 

Here's an example for Age, but you could use the same for all of the other variables.

 

Tom

 

data have;
	input Age;
	cards;
1
15
24
25
39
40
41
89
;
run;

proc format;
	value Agef
		low-24 = "Young"
		25-39 = "Older"
		40-high = "Oldest"
	;
run;

data want;
	set have;
	length AgeReformat $20;
	AgeReformat = put(Age, Agef.);
run;
Reeza
Super User
Tom, I edited your post only to see if the text highlighting could be corrected. Apparently not 😞
pokepim
Calcite | Level 5

And lets say I have a table where I keep conditions using which i have to cateogrize?

Like I have table in this format:

Category Condition

1               not missing(variable_name) and variable_name<=20

2               20<variable_name <=40

3               40<variable_name

 

Then based on the column condition I would categorize those values.

art297
Opal | Level 21

I agree with @TomKari but, depending upon what you want to do, you likely don't have to create new variables .. just create and apply the formats. e.g., using Tom's example:

 

data have;
	input Age;
	cards;
1
15
24
25
39
40
41
89
run;

proc format;
	value Agef
		low-24 = "Young"
		25-39 = "Older"
		40-high = "Oldest"
	;
run;

data want;
	set have;
	format Age Agef.;
run;

proc freq data=want;
  tables age;
run;

Art, CEO, AnalystFinder.com

 

pokepim
Calcite | Level 5
Is it possible to parse conditions from another table and then use them to create format? As I have this kind of table created before and Im required to use it:

Category Condition
1 not missing(variable_name) and variable_name<=20
2 20<variable_name <=40
3 40<variable_name

Then based on the column condition I would categorize those values.
art297
Opal | Level 21

Yes, you can create formats from a table. See, e.g., www2.sas.com/proceedings/forum2007/068-2007.pdf

 

and then you can apply the same format(s) to as many variables as needed.

 

Art, CEO, AnalystFinder.com

 

ChrisNZ
Tourmaline | Level 20

You'll have to define the character categories yourself. Formats are a good option to do this as explained.

For numerics, this may be a good option to automate the intervals if you only want counts:


proc summary data=SASHELP.CLASS ; 
  var _numeric_  ;
  output out=SUM min= p25= p50= p75= max= /autoname;
run;

 

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2648 views
  • 2 likes
  • 5 in conversation