Hello,
I have this dataset
data have;
input ID Case_Dx
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;
I want to add a column to the dataset that groups the 'Case_dx' column into group A, B,C.
The groups are defined as follows
GroupA= Anything that starts with 'S720', 'S721' or 'S722' (up to 8 characters)
GroupB= Anything that starts with 'S525' or 'S526' (up to 8 characters)
GroupC= Anything that starts with '805', '806', 'S220', 'S320' or 'S221' (up to 8 characters)
I usually use this
proc format;
value $ Casetype
'S720', 'S721', 'S722' = 'A'
'S525', 'S526' = 'B'
'805', '806', 'S220', 'S320', 'S221' = 'C;
run;
data have;
set want;
Type= put(Case_DX, Casetype.);
RUN;But in this case it doesn't work because of the approximate matches.
How can I go about this?
Thanks
You can use SUBSTR function:
data want;
set have;
if substr(Case_Dx,1,4) in ("S720","S721","S722") then Type="A";
else
if substr(Case_Dx,1,4) in ("S525","S526") then Type="B";
else
if substr(Case_Dx,1,3) in ("805","806") or substr(Case_Dx,1,4) in ("S220","S320","S221") then Type="C";
run;
Something like below should work.
type= put(upcase(substr(case_dx,1,4)), $casetype.);
Hello @Ad30,
You can use the IN operator with the colon modifier (see Character Comparisons):
data have;
input ID Case_Dx $;
cards;
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;
data want;
set have;
if Case_Dx in: ('S720' 'S721' 'S722') then Type = 'A';
else if Case_Dx in: ('S525' 'S526') then Type = 'B';
else if Case_Dx in: ('805' '806' 'S220' 'S320' 'S221') then Type = 'C';
run;
If variable Case_Dx has a defined length of 8 characters (as is the case in the code above), the condition "up to 8 characters" is automatically satisfied. Otherwise insert
if length(Case_Dx)<=8 then
before the first IF statement in order to exclude strings like "S72012345" (9 characters) from the categorization.
data have;
infile cards expandtabs;
input ID Case_Dx :$40.;
cards;
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;
data want;
set have;
if prxmatch('/^(S720|S721|S722)/',strip(Case_Dx)) then group='A';
if prxmatch('/^(S525|S526)/',strip(Case_Dx)) then group='B';
if prxmatch('/^(805|806|S320|S221)/',strip(Case_Dx)) then group='C';
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.