Read sample input of 5 rows and 3 columns (Name, Gender, Age) from datalines into a temporary SAS dataset, and create a new column, Age group, with the values >45, 18-44, 0-5, 5-11, 11-18 while reading the datalines into the dataset.
Derive the age group from the Age value. Below are example Age values and the corresponding age group values.
Age Agegroup
5 0-5
65 >45
90 >45
30 18-44
SOLUTION:
Surely your teacher and course material showed you how to do this.
What have you tried?
Please
I would use proc format to define a format, to avoid if-then-else, and then, after the input statement, apply it with the put function:
agegroup = put(Age, AgeGroupFmt.);
You have some nearly working pieces. Let's change a few things.
First, to use a format you have to first create it. So PROC FORMAT must be moved to before the DATA step.
Second, there is a syntax error in the PROC FORMAT. The way to get 45 and higher in the same group is to specify:
45 - high = '>45';
Third, when two ranges share an endpoint, the first range that lists the value gets it. So 5 will go into the "0-5" category, not into the "5-11" category.
Fourth, there is no need to create the variable A. You have AGE, and can use it:
aggrp=put(age, agegroup.);
You switched the PUT function to the INPUT function. Restore the PUT function.
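The third point, about shared endpoints, can be checked with a small sketch. The format name EDGE and the test values are made up for illustration; only the two ranges come from the thread. Running it and reading the log should show which label PUT returns for 5.

```sas
/* Hypothetical format EDGE: both ranges list 5 as an endpoint. */
proc format;
  value edge
    0 - 5  = '0-5'
    5 - 11 = '5-11';
run;

/* Write the label chosen for ages around the shared endpoint to the log. */
data _null_;
  do age = 4, 5, 6;
    grp = put(age, edge.);
    put age= grp=;
  end;
run;
```

If 5 should belong to the second range instead, the exclusion operator can be used on the first range (for example 0 -< 5).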
You're still going to get errors with the code like that. You need to move the aggrp assignment before the DATALINES statement, but I also think your ranges in PROC FORMAT are not good. What if someone has an age between 5 and 6, or between 11 and 12? All of those people are going to be left out with the way you've assigned the ranges. Please read the documentation.
https://documentation.sas.com/doc/en/vdmmlcdc/8.1/proc/n03qskwoints2an1ispy57plwrn9.htm
Also, it is ALWAYS a good idea to check your work. Please run the proc means that I use to check the derivation of aggrp.
proc format;
value agegroup
0-5='0-5'
6-11='5-11'
12-18='11-18'
19-44='18-44'
45-high='>45';
run;
data sample;
infile datalines;
input name $ gender $ age;
aggrp=put(age,agegroup.);
datalines;
Vinod M 10
Shalini F 18
Reena F 25
Rishi M 40
Sam M 55
;
proc print data=sample;
run;
proc means data=sample n nmiss min max;
var age;
class aggrp / missing;
run;
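One way to close the gaps mentioned above (ages such as 5.5 or 11.5) is to use the range exclusion operators <- and <-<, so that every value from 0 upward falls into exactly one range. This is only a sketch: the format name AGEGRP is hypothetical, and the treatment of the boundary values (5 goes to '0-5', 18 goes to '11-18', 45 goes to '>45') is an assumption based on the example in the question and the 45-high range suggested earlier.

```sas
proc format;
  value agegrp                /* hypothetical format name */
    0  -   5    = '0-5'       /* 0 through 5, both included */
    5  <-  11   = '5-11'      /* above 5, up to and including 11 */
    11 <-  18   = '11-18'     /* above 11, up to and including 18 */
    18 <-< 45   = '18-44'     /* above 18, below 45 */
    45 -   high = '>45';      /* 45 and higher, as suggested in the thread */
run;

data sample;
  input name $ gender $ age;
  aggrp = put(age, agegrp.);  /* assignment must come before DATALINES */
  datalines;
Vinod M 10
Shalini F 18
Reena F 25
Rishi M 40
Sam M 55
;

/* Check the derivation: min and max age within each group. */
proc means data=sample n nmiss min max;
  var age;
  class aggrp / missing;
run;
```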
@u58780790 wrote:
proc format;
value agegroup
0-5='0-5'
6-11='5-11'
12-18='11-18'
19-44='18-44'
45-high='>45';
run;
data sample;
input name $ gender $ age;
datalines;
Vinod M 10
Shalini F 18
Reena F 25
Rishi M 40
Sam M 55
;
aggrp=input(age,agegroup.);
run;
Is this right?
It doesn't give the output; it shows an error at the aggrp line.
The crucial point is the DATALINES statement, which is documented here; you will find the clue in there.
Those formats are not going to work the way you've written them. Please try my code.
proc format;
value agegroupf
1='0-5'
2='5-11'
3='11-18'
4='18-44'
5='>45';
run;
data task2;
set task;
format agegroup agegroupf.;
if age > .z then do;
if age lt 5 then agegroup=1;
else if 5 le age lt 11 then agegroup=2;
else if 11 le age lt 18 then agegroup=3;
else if 18 le age lt 45 then agegroup=4;
else if age ge 45 then agegroup=5;
end;
run;
title "Check agegroup derivation";
proc means data=task2 n nmiss min max;
var age;
class agegroup / missing;
run;
title;
Well, I put it there to only include non-missing age values.
I have shown a data step solution. If you don't want to do agegroup=1,2,3,4,5 and apply a format then just set agegroup = '0-5', '5-11' and so on and so forth.
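A minimal sketch of that alternative, assigning the character labels directly, might look like the following. The small TASK dataset is made up here so the step can run on its own (the ages are the example values from the question plus one missing value). It keeps the cutoffs from the data step above, under which an age of exactly 5 lands in '5-11' rather than '0-5'. The LENGTH statement is added so that the first assignment ('0-5') does not fix the variable at three characters and truncate the longer labels.

```sas
/* Illustrative input only: example ages from the question plus a missing value. */
data task;
  input age;
  datalines;
5
65
90
30
.
;

data task2;
  set task;
  length agegroup $ 5;                 /* long enough for '11-18' */
  if age > .z then do;                 /* skip missing ages, including special missings */
    if age lt 5 then agegroup = '0-5';
    else if age lt 11 then agegroup = '5-11';
    else if age lt 18 then agegroup = '11-18';
    else if age lt 45 then agegroup = '18-44';
    else agegroup = '>45';
  end;
run;

title "Check agegroup derivation";
proc means data=task2 n nmiss min max;
  var age;
  class agegroup / missing;
run;
title;
```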