Re: SAS assign variables to a category based on approximate matches of...

Ad30 · Posted 06-24-2022 01:30 AM

Hello,

I have this dataset

data have;
input ID  Case_Dx
1	S72080	
2	812	
3	S72100	
4	813.2	
5	820.2	
6	808.4	
7	805.6
8	S5251	
9	S220	
10	S320
11	806
12	S5262	
;

I want to add a column to the dataset that groups the 'Case_dx' column into group A, B,C.

The groups are defined as follows
GroupA= Anything that starts with 'S720', 'S721' or 'S722' (up to 8 characters)
GroupB= Anything that starts with 'S525' or 'S526' (up to 8 characters)
GroupC= Anything that starts with '805', '806', 'S220', 'S320' or 'S221' (up to 8 characters)

I usually use this

proc format;
    value $ Casetype
        'S720', 'S721', 'S722' =  'A'
		'S525', 'S526'  =  'B'
        '805', '806', 'S220', 'S320', 'S221'  =  'C;
	run;

data have;
set want;
Type= put(Case_DX, Casetype.);
RUN;

But in this case it doesn't work because of the approximate matches.
How can I go about this?
Thanks

V_Altomonte · Posted 06-24-2022 05:05 AM

You can use SUBSTR function:

data want;
set have;
if substr(Case_Dx,1,4) in ("S720","S721","S722") then Type="A";
else
if substr(Case_Dx,1,4) in ("S525","S526") then Type="B";
else
if substr(Case_Dx,1,3) in ("805","806") or substr(Case_Dx,1,4) in ("S220","S320","S221") then Type="C";
run;

Patrick · Posted 06-24-2022 05:15 AM

Something like below should work.

  type= put(upcase(substr(case_dx,1,4)), $casetype.);

FreelanceReinh · Posted 06-24-2022 05:22 AM

Hello @Ad30,

You can use the IN operator with the colon modifier (see Character Comparisons):

data have;
input ID Case_Dx $;
cards;
1 S72080
2 812
3 S72100
4 813.2
5 820.2
6 808.4
7 805.6
8 S5251
9 S220
10 S320
11 806
12 S5262
;

data want;
set have;
if Case_Dx in: ('S720' 'S721' 'S722') then Type = 'A';
else if Case_Dx in: ('S525' 'S526') then Type = 'B';
else if Case_Dx in: ('805' '806' 'S220' 'S320' 'S221') then Type = 'C';
run;

If variable Case_Dx has a defined length of 8 characters (as is the case in the code above), the condition "up to 8 characters" is automatically satisfied. Otherwise insert

if length(Case_Dx)<=8 then

before the first IF statement in order to exclude strings like "S72012345" (9 characters) from the categorization.

Ksharp · Posted 06-24-2022 07:19 AM

data have;
infile cards expandtabs;
input ID  Case_Dx :$40.;
cards;
1 S72080 
2 812 
3 S72100 
4 813.2 
5 820.2 
6 808.4 
7 805.6
8 S5251 
9 S220 
10 S320
11 806
12 S5262 
;

data want;
 set have;
 if  prxmatch('/^(S720|S721|S722)/',strip(Case_Dx)) then group='A';
 if  prxmatch('/^(S525|S526)/',strip(Case_Dx)) then group='B';
 if  prxmatch('/^(805|806|S320|S221)/',strip(Case_Dx)) then group='C';
run;

SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

Re: SAS assign variables to a category based on approximate matches of strings

Registration is open