Hello,
I have this dataset
data have;
input ID Case_Dx
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;
I want to add a column to the dataset that groups the 'Case_dx' column into group A, B,C.
The groups are defined as follows
GroupA= Anything that starts with 'S720', 'S721' or 'S722' (up to 8 characters)
GroupB= Anything that starts with 'S525' or 'S526' (up to 8 characters)
GroupC= Anything that starts with '805', '806', 'S220', 'S320' or 'S221' (up to 8 characters)
I usually use this
proc format;
value $ Casetype
'S720', 'S721', 'S722' = 'A'
'S525', 'S526' = 'B'
'805', '806', 'S220', 'S320', 'S221' = 'C;
run;
data have;
set want;
Type= put(Case_DX, Casetype.);
RUN;
But in this case it doesn't work because of the approximate matches.
How can I go about this?
Thanks
You can use SUBSTR function:
data want;
set have;
if substr(Case_Dx,1,4) in ("S720","S721","S722") then Type="A";
else
if substr(Case_Dx,1,4) in ("S525","S526") then Type="B";
else
if substr(Case_Dx,1,3) in ("805","806") or substr(Case_Dx,1,4) in ("S220","S320","S221") then Type="C";
run;
Something like below should work.
type= put(upcase(substr(case_dx,1,4)), $casetype.);
Hello @Ad30,
You can use the IN operator with the colon modifier (see Character Comparisons):
data have;
input ID Case_Dx $;
cards;
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;
data want;
set have;
if Case_Dx in: ('S720' 'S721' 'S722') then Type = 'A';
else if Case_Dx in: ('S525' 'S526') then Type = 'B';
else if Case_Dx in: ('805' '806' 'S220' 'S320' 'S221') then Type = 'C';
run;
If variable Case_Dx has a defined length of 8 characters (as is the case in the code above), the condition "up to 8 characters" is automatically satisfied. Otherwise insert
if length(Case_Dx)<=8 then
before the first IF statement in order to exclude strings like "S72012345" (9 characters) from the categorization.
data have; infile cards expandtabs; input ID Case_Dx :$40.; cards; 1 S72080 2 812 3 S72100 4 813.2 5 820.2 6 808.4 7 805.6 8 S5251 9 S220 10 S320 11 806 12 S5262 ; data want; set have; if prxmatch('/^(S720|S721|S722)/',strip(Case_Dx)) then group='A'; if prxmatch('/^(S525|S526)/',strip(Case_Dx)) then group='B'; if prxmatch('/^(805|806|S320|S221)/',strip(Case_Dx)) then group='C'; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.