Hi, I am trying to create an age group variable from an age variable, which is a character variable. This code is not producing anything. What is wrong with it?
data STIREPOT.STI2020_Age_Grp;
set STIREPOT.STI2020_D;
length age_grp $ 6;
IF Age='0' And age='4' THEN AGE_GRP='0-4';
IF Age='5' AND age='9' THEN AGE_GRP= '5-9';
IF Age='10' AND age='14' THEN AGE_GRP= '10-14';
IF Age='15' AND age='19' THEN AGE_GRP= '15-19';
IF Age='20' AND age='24' THEN AGE_GRP= '20-24';
IF Age='25' AND age='29' THEN AGE_GRP= '25-29';
IF Age='30' AND age='34' THEN AGE_GRP= '30-34';
IF Age='35' AND age='39' THEN AGE_GRP= '35-39';
IF Age='40' AND age='44' THEN AGE_GRP= '40-44';
IF Age='45' AND age='49' THEN AGE_GRP= '45-49';
IF Age='50' AND age='100' THEN AGE_GRP= '>=50';
run;
Hi.
Is Age a character or numeric variable? You don't want to use AND because Age can never be equal to two different values. If Age were numeric I would suggest something like:
If missing(Age) then Age_Grp = ' ';
else if Age le 4 then Age_Grp = '0 to 4';
else if Age le 9 then Age_Grp = '5 to 9';
I don't know whether you want lt or le; that is up to you.
IF Age='0' And age='4'
This is never true. Age cannot be both '0' and '4'.
Do yourself a favor: make age numeric, and then things are a lot easier:
data STIREPOT.STI2020_Age_Grp;
set STIREPOT.STI2020_D;
length age_grp $ 6;
age1=input(age,3.); /* this makes a variable AGE1 which is numeric */
if 0<=age1<=4 then age_grp='0-4'; /* compare the numeric AGE1, not the character AGE */
else if 5<=age1<=9 then ...
ADDING: if you create AGE_GRP as a character variable with values such as '0-4', the groups will not sort properly in most outputs, because SAS sorts character values alphabetically (for example, '10-14' sorts before '5-9'). If you want proper sorting, apply a custom format to the numeric AGE1 instead. Many SAS procedures can then keep the groups in numerical order, and the sorting will be correct.
Converting age to a numerical value and using a format could make things as simple as below.
proc format;
value age_grp(default=6)
0-4 = '0-4'
5-9 = '5-9'
....and so on ....
50-high = '>=50'
. = 'miss'
other = 'na'
;
run;
data stirepot.sti2020_age_grp;
set stirepot.sti2020_d;
age_grp=put(input(age,best32.),age_grp6.);
run;
I am of the same opinion as @Patrick. Whenever I see ranges being assigned to something continuous like age, I think Proc Format. See the code below. Proc Format has the advantage that if something comes along that is outside your normal range, e.g. Ron, who is 115 in my test data, you can still catch it by specifying HIGH as the upper end of the last range. You can also specify OTHER, which is a catch-all for anything that you didn't define. In my example, I have a negative age and an invalid (garbage data) age, both of which are handled well by the format.
Jim
DATA Test_Ages;
INPUT Name : $32. Age;
DATALINES;
Bob 5
Susan 10
Joyce 15
Michiko 20
Marjorie 25
Corie 30
Devendra 35
Sugantha 40
Venkata 45
DaiJun 50
Jasmin 55
Randell 60
Ron 115
@~a#!!Y2P yy2-
Future_Person -999
;
RUN;
PROC FORMAT;
VALUE Age_Grps
0 - 4 = '0-4'
5 - 9 = '5-9'
10 - 14 = '10-14'
15 - 19 = '15-19'
20 - 24 = '20-24'
25 - 29 = '25-29'
30 - 34 = '30-34'
35 - 39 = '35-39'
40 - 44 = '40-44'
45 - 49 = '45-49'
50 - HIGH = '>=50'
OTHER = 'Invalid'
;
RUN;
DATA Want;
SET Test_Ages;
Age_Grp = PUT(Age, Age_Grps.);
RUN;
Another vote for a Format tied to a numeric variable.
One of the advantages of formats is that you can have multiple formats and use the one that you want at the time a report or analysis procedure is run. I have age grouping formats for 5-year intervals, 10-year intervals and age groups related to various levels of target groups based on a service model.
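As a rough sketch of that idea, the step below defines two formats and picks one at the time PROC FREQ runs. The format names age5yr and age10yr, the work.ages dataset, and the abbreviated ranges are all invented here for illustration. The stored Age values never change; only the grouping shown in the report does.

```sas
/* Hypothetical sample data; names and values are illustrative only */
data work.ages;
  input age;
  datalines;
3
12
27
48
63
;
run;

proc format;
  /* 5-year groups (abbreviated for this sketch) */
  value age5yr
    0 - 4     = '0-4'
    5 - 9     = '5-9'
    10 - 14   = '10-14'
    15 - high = '>=15'
    other     = 'Invalid';
  /* 10-year groups (abbreviated for this sketch) */
  value age10yr
    0 - 9     = '0-9'
    10 - 19   = '10-19'
    20 - high = '>=20'
    other     = 'Invalid';
run;

/* Same numeric variable, grouped in 5-year bands */
proc freq data=work.ages;
  tables age;
  format age age5yr.;
run;

/* Same numeric variable, grouped in 10-year bands */
proc freq data=work.ages;
  tables age;
  format age age10yr.;
run;
```

Because Age stays numeric, PROC FREQ with its default ordering lists the groups by the underlying values rather than alphabetically by label.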
You can create different formats for character values as well. However, to be reliable you have to list every single value, because you will quickly find that with character values "11" normally comes before (i.e. is "less than") "2". Ranges of numeric values are much easier to specify and give the expected results.
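A minimal sketch of that character-versus-numeric difference, using literal values rather than any dataset from this thread:

```sas
data _null_;
  /* Character comparison goes left to right, so '1' versus '2' decides it */
  if '11' < '2' then put 'Character: "11" is less than "2"';
  /* Numeric comparison behaves as expected */
  if 11 > 2 then put 'Numeric: 11 is greater than 2';
run;
```

Both conditions are true, so both messages are written to the log.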
I might suggest "Learning SAS by Example, 2nd edition" or "Getting Started with SAS Programming Using SAS Studio in the Cloud", both by me (Ron Cody). Just go to support.sas.com/cody or enter "Ron Cody" in Amazon search. You will see examples similar to your question.