Would you please look at what @Reeza just posted? A lot of people have given you solutions now.
There are good reasons to use a format directly instead of the "data step" solutions.
Reason #1: Using the format when printing or analyzing the variable means that you do not have to modify the data set.
Reason #2: Most reporting, analysis and graphing procedures/options will honor the groups created by the format.
Reason #3: You can change the format associated with a variable without re-creating the data set, using code such as Proc Datasets.
Reason #4: If you have multiple variables with the same (or sometimes just close enough) values, they can share the same format instead of requiring multiple additional variables.
Reason #5: If the format definition is changed (but the name stays the same), the new groups (or spelling changes) are applied without any work on the data set at all.
There are more reasons but that gets you started.
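Reasons #3 and #4 can be sketched together. This is a minimal example, assuming a hypothetical data set `people` with two hypothetical age variables (`age_at_entry`, `age_now`) and a hypothetical format `agegrp`. PROC DATASETS with a MODIFY statement attaches the format to the stored data set without re-creating it, and one FORMAT statement covers both variables.

```sas
/* Hypothetical data: two age variables that can share one format */
data people;
  input name $ age_at_entry age_now;
datalines;
Ann 12 19
Bob 30 47
;

/* Hypothetical grouping format */
proc format;
  value agegrp
    0-17    = 'Under 18'
    18-44   = '18-44'
    45-high = '45+';
run;

/* Reason #3: attach the format to the stored data set in place   */
/* Reason #4: one FORMAT statement covers both age variables      */
proc datasets library=work nolist;
  modify people;
    format age_at_entry age_now agegrp.;
run;
quit;

/* Both variables now print with the grouped labels */
proc print data=people;
run;
```

Switching to a different grouping later only needs another MODIFY step with a different format name; the data values themselves never change.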
The example below creates a data set to play with and two formats, then uses Proc Freq to count the members of the groups under each of the two formats.
data junk;
  do i=1 to 100;
    age = rand('integer', 1, 85);   /* random integer age from 1 to 85 */
    output;
  end;
run;

proc format;
  value agegrpa
    1-12    = 'Pre-teen'
    13-17   = 'Teen'
    18-24   = 'Young adult'
    25-45   = '25 to 45'
    46-high = '46+'
  ;
  value agegrpb
    1-9   = ' 1 to 9'
    10-19 = '10 to 19'
    20-29 = '20 to 29'
    30-39 = '30 to 39'
    40-49 = '40 to 49'
    50-59 = '50 to 59'
    60-69 = '60 to 69'
    70-79 = '70 to 79'
    80-89 = '80 to 89'
  ;
run;

/* Same variable, same data set: only the format differs */
proc freq data=junk;
  title "Using agegrpa";
  table age;
  format age agegrpa.;
run;

proc freq data=junk;
  title "Using agegrpb";
  table age;
  format age agegrpb.;
run;
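Reason #5 can be seen the same way. In this minimal sketch (the data set `ages` and format `agegrp` are hypothetical names), the same format name is defined twice with different groups, and the second report picks up the new groups with no change to the data set.

```sas
/* Hypothetical data set with a few ages */
data ages;
  do age = 5, 15, 30, 50;
    output;
  end;
run;

/* First definition of the format */
proc format;
  value agegrp
    0-17    = 'Minor'
    18-high = 'Adult';
run;

proc freq data=ages;
  title "First definition of agegrp";
  table age;
  format age agegrp.;
run;

/* Redefine the SAME format name with different groups */
proc format;
  value agegrp
    0-12    = 'Child'
    13-17   = 'Teen'
    18-high = 'Adult';
run;

/* The data set is untouched; rerunning the report uses the new groups */
proc freq data=ages;
  title "Second definition of agegrp";
  table age;
  format age agegrp.;
run;
```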
I've worked on projects that used as many as 12 different sets of age groups, because the data was used by multiple programs, each program targeted different age groups, and each wanted reports based on its own target groups.
One last response, modifying your latest program slightly:
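proc format;
  /* labels follow the group names requested in the question */
  value agegroup
    0-5     = '0-5'
    6-11    = '5-11'
    12-18   = '11-18'
    19-44   = '18-44'
    45-high = '>45';
run;

data sample;
  input name $ gender $ age;
  /* PUT applies the format and returns the label as a character value */
  aggrp = put(age, agegroup.);
datalines;
Vinod M 10
Shalini F 18
Reena F 25
Rishi M 40
Sam M 55
;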
Once a semicolon marks the end of the datalines, you no longer need a run statement.
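Tying this back to Reason #1 from earlier in the thread: the same format can group AGE at report time without creating AGGRP at all. A minimal sketch, repeating the `agegroup` format and `sample` data from the program above so the block runs on its own:

```sas
proc format;
  value agegroup
    0-5     = '0-5'
    6-11    = '5-11'
    12-18   = '11-18'
    19-44   = '18-44'
    45-high = '>45';
run;

data sample;
  input name $ gender $ age;
datalines;
Vinod M 10
Shalini F 18
Reena F 25
Rishi M 40
Sam M 55
;

/* No new variable: the format groups AGE only for this report */
proc print data=sample;
  format age agegroup.;
run;

/* PROC FREQ counts the rows in each formatted group */
proc freq data=sample;
  table age;
  format age agegroup.;
run;
```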
You missed a few concepts in your initial code. Here is an example:
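data task;
  input name $ gender $ age;
  a = age;
  /* declare the character length before the first assignment */
  length agegroup $10;
  /* character values must be quoted; ELSE IF stops at the first match */
  if 0<=a<=5 then agegroup="0-5";
  else if 5<a<=11 then agegroup="5-11";
  else if 11<a<=18 then agegroup="11-18";
  else if 18<a<=44 then agegroup="18-44";
  /* a>44 so that age 45 is not left without a group */
  else if a>44 then agegroup=">45";
  /* all logic goes BEFORE datalines; datalines is the last statement */
datalines;
Reena F 25
Shyam M 40
Deva M 53
John M 63
Mery F 9
;
run;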
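The same IF/ELSE IF logic can also be written as a SELECT group, which some people find easier to read when there are many ranges. This is an alternative technique, not what the reply above used. The boundaries follow the requested groups, with age 45 assigned to '>45' so no age is left without a group. The OTHERWISE branch matters: without it, SAS stops with an error when no WHEN condition matches (for example, a missing age).

```sas
data task;
  input name $ gender $ age;
  length agegroup $5;
  select;
    when (0 <= age <= 5)  agegroup = '0-5';
    when (5 < age <= 11)  agegroup = '5-11';
    when (11 < age <= 18) agegroup = '11-18';
    when (18 < age <= 44) agegroup = '18-44';
    when (age > 44)       agegroup = '>45';
    otherwise             agegroup = ' ';   /* missing or negative age */
  end;
datalines;
Reena F 25
Shyam M 40
Deva M 53
John M 63
Mery F 9
;
```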
@u58780790 wrote:
Read a sample input of 5 rows and 3 columns (Name, Gender, Age) using DATALINES into a temporary SAS data set, and create a new column, Age group, with values (>45, 18-44, 0-5, 5-11, 11-18) while reading the datalines into the SAS data set.
Based on the Age column, create a new column Age group.
Derive the age group from the age value. Below are examples of age values and the corresponding age group values.
Age   Agegroup
5     0-5
65    >45
90    >45
30    18-44
My attempted solution:
data task;
input name $ gender $ age;
datalines;
Reena F 25
Shyam M 40
Deva M 53
John M 63
Mery F 9
;
a=age;
if 0<a<=5 then agegroup=0-5;
if 5<a<=11 then agegroup=5-11;
if 11<a<=18 then agegroup =11-18;
if 18<a<=44 then agegroup=18-44;
if a>45 then agegroup=>45;
run;
I have tried it this way, but it's not working. Please help me out.
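Two problems in that attempt can be shown in isolation. First, an unquoted value such as 0-5 is arithmetic, not text, so SAS computes a number. Second, the assignments come after the line that ends the datalines, so they are no longer part of the DATA step; DATALINES has to be the last statement in the step. A minimal demonstration of the first point, using hypothetical variable names `grp_num` and `grp_text`:

```sas
data _null_;
  length grp_text $5;
  grp_num  = 0-5;     /* unquoted: evaluated as subtraction, so grp_num is -5 */
  grp_text = '0-5';   /* quoted: stored as the character value 0-5 */
  put grp_num= grp_text=;
run;
```

The log line from the PUT statement should read `grp_num=-5 grp_text=0-5`, which is why quoting the group labels is required.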