BookmarkSubscribeRSS Feed
pokepim
Calcite | Level 5

Hi I have a dataset consiting of large amount of variables (50+)

Some of those variables are text and some of them are numeric. I want to categorize every of those variables into 2 or 3 categories. So lets say the second column of my dataset which is age would be categorized into young (age<25) older (25<age<40) oldest (age>40).

Another variable which is gender would be categorized into male and female. And the third is number of cars (0=none, 1 car= 1, 2+ cars= multiple) and so on.

So the thing is every variable woud require writing different categorization criteria which will result in a quite long code if done manually.

I was thinking of creating a table where I would put a criteria how every variable should be split and then create conditions based on that. But I am not sure if it is the simplest approach and maybe you can give me some insight how to deal with this problem.

BTW im on SAS 9.4.

7 REPLIES 7
TomKari
Onyx | Level 15

Take a look at formats. They can do a very nice job of recoding data.

 

Here's an example for Age, but you could use the same for all of the other variables.

 

Tom

 

data have;
	input Age;
	cards;
1
15
24
25
39
40
41
89
;
run;

proc format;
	value Agef
		low-24 = "Young"
		25-39 = "Older"
		40-high = "Oldest"
	;
run;

data want;
	set have;
	length AgeReformat $20;
	AgeReformat = put(Age, Agef.);
run;
Reeza
Super User
Tom, I edited your post only to see if the text highlighting could be corrected. Apparently not 😞
pokepim
Calcite | Level 5

And lets say I have a table where I keep conditions using which i have to cateogrize?

Like I have table in this format:

Category Condition

1               not missing(variable_name) and variable_name<=20

2               20<variable_name <=40

3               40<variable_name

 

Then based on the column condition I would categorize those values.

art297
Opal | Level 21

I agree with @TomKari but, depending upon what you want to do, you likely don't have to create new variables .. just create and apply the formats. e.g., using Tom's example:

 

data have;
	input Age;
	cards;
1
15
24
25
39
40
41
89
run;

proc format;
	value Agef
		low-24 = "Young"
		25-39 = "Older"
		40-high = "Oldest"
	;
run;

data want;
	set have;
	format Age Agef.;
run;

proc freq data=want;
  tables age;
run;

Art, CEO, AnalystFinder.com

 

pokepim
Calcite | Level 5
Is it possible to parse conditions from another table and then use them to create format? As I have this kind of table created before and Im required to use it:

Category Condition
1 not missing(variable_name) and variable_name<=20
2 20<variable_name <=40
3 40<variable_name

Then based on the column condition I would categorize those values.
art297
Opal | Level 21

Yes, you can create formats from a table. See, e.g., www2.sas.com/proceedings/forum2007/068-2007.pdf

 

and then you can apply the same format(s) to as many variables as needed.

 

Art, CEO, AnalystFinder.com

 

ChrisNZ
Tourmaline | Level 20

You'll have to define the character categories yourself. Formats are a good option to do this as explained.

For numerics, this may be a good option to automate the intervals if you only want counts:


proc summary data=SASHELP.CLASS ; 
  var _numeric_  ;
  output out=SUM min= p25= p50= p75= max= /autoname;
run;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1812 views
  • 2 likes
  • 5 in conversation