Ad30
Calcite | Level 5
 

Hello,

I have this dataset:

data have;
input ID Case_Dx $;
cards;
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;

 

I want to add a column to the dataset that assigns each 'Case_Dx' value to group A, B, or C.

The groups are defined as follows:
Group A = Anything that starts with 'S720', 'S721' or 'S722' (up to 8 characters)
Group B = Anything that starts with 'S525' or 'S526' (up to 8 characters)
Group C = Anything that starts with '805', '806', 'S220', 'S320' or 'S221' (up to 8 characters)

I usually use this:

proc format;
    value $Casetype
        'S720', 'S721', 'S722' = 'A'
        'S525', 'S526' = 'B'
        '805', '806', 'S220', 'S320', 'S221' = 'C';
run;

data want;
set have;
Type = put(Case_Dx, $Casetype.);
run;

But in this case it doesn't work, because the format only matches exact values and I need to match on the first few characters.
How can I go about this?
Thanks

 

4 REPLIES
V_Altomonte
SAS Employee

You can use the SUBSTR function:

 

data want;
set have;
if substr(Case_Dx,1,4) in ("S720","S721","S722") then Type="A";
else
if substr(Case_Dx,1,4) in ("S525","S526") then Type="B";
else
if substr(Case_Dx,1,3) in ("805","806") or substr(Case_Dx,1,4) in ("S220","S320","S221") then Type="C";
run;

Patrick
Opal | Level 21

Something like below should work.

  type= put(upcase(substr(case_dx,1,4)), $casetype.);
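
One thing to watch with a 4-character lookup: '805' and '806' are 3-character prefixes, so a value like 805.6 yields '805.' and does not match. Also, without an OTHER clause the format returns the unmatched value itself. A rough sketch of one way around both, assuming Case_Dx is a character variable of length 8 and adding an OTHER clause that is not in the original format:

```sas
proc format;
    value $casetype
        'S720', 'S721', 'S722' = 'A'
        'S525', 'S526' = 'B'
        '805', '806', 'S220', 'S320', 'S221' = 'C'
        other = ' ';   /* assumption: unmatched codes map to blank */
run;

data want;
    set have;
    length Type $1;
    /* first try the 4-character prefix */
    Type = put(upcase(substr(Case_Dx, 1, 4)), $casetype.);
    /* fall back to the 3-character prefix for codes such as 805.6 */
    if Type = ' ' then Type = put(substr(Case_Dx, 1, 3), $casetype.);
run;
```

Codes that match neither prefix length (812, 813.2 and so on) are left with a blank Type.
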
FreelanceReinh
Jade | Level 19

Hello @Ad30,

 

You can use the IN operator with the colon modifier (see Character Comparisons):

data have;
input ID Case_Dx $;
cards;
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;

data want;
set have;
if Case_Dx in: ('S720' 'S721' 'S722') then Type = 'A';
else if Case_Dx in: ('S525' 'S526') then Type = 'B';
else if Case_Dx in: ('805' '806' 'S220' 'S320' 'S221') then Type = 'C';
run;

If variable Case_Dx has a defined length of 8 characters (as is the case in the code above), the condition "up to 8 characters" is automatically satisfied. Otherwise insert

if length(Case_Dx)<=8 then

before the first IF statement in order to exclude strings like "S72012345" (9 characters) from the categorization.
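
Put together, that could look like the sketch below. It assumes Case_Dx may be defined longer than 8 characters; the DO block and the LENGTH statement for Type are only there for readability and are not required.

```sas
data want;
    set have;
    length Type $1;
    /* only categorize codes of at most 8 characters */
    if length(Case_Dx) <= 8 then do;
        if Case_Dx in: ('S720' 'S721' 'S722') then Type = 'A';
        else if Case_Dx in: ('S525' 'S526') then Type = 'B';
        else if Case_Dx in: ('805' '806' 'S220' 'S320' 'S221') then Type = 'C';
    end;
run;
```

Values longer than 8 characters, and values that match none of the prefixes, keep a blank Type.
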

Ksharp
Super User

data have;
infile cards expandtabs;
input ID  Case_Dx :$40.;
cards;
1 S72080 
2 812 
3 S72100 
4 813.2 
5 820.2 
6 808.4 
7 805.6
8 S5251 
9 S220 
10 S320
11 806
12 S5262 
;

data want;
 set have;
 if  prxmatch('/^(S720|S721|S722)/',strip(Case_Dx)) then group='A';
 if  prxmatch('/^(S525|S526)/',strip(Case_Dx)) then group='B';
 if  prxmatch('/^(805|806|S220|S320|S221)/',strip(Case_Dx)) then group='C';
run;

