I would like to create an indicator at the row level for if a series of codes appears in a range of columns.
The logic should be:
If any of columns code_0-code_n have values starting with '345', '817' or 'I37' indicator=1, else indicator=0.
It's important that it is "starts with" rather than contains, because I do not want codes like "G345" to be counted.
Here is a small version of what the data looks like--in reality it is up to 24 codes.
data WORK.IMPORT; infile datalines dsd truncover; input person:$4. code_0:$4. code_1:$4. code_2:$4. code_3:$4. code_4:$4. code_5:$4.; datalines; Jane 345 I371 693 1125 G260 2461 Jake G345 8170 565 8172 8701 2717 Joan 890 8172 653 C980 3652 I370 Mark 410 149 K628 3450 229 S280 Mary 2797 D140 P635 1025 445 O980 ;;;;
I'm relatively new to SAS--I tried with this array, but it seemed to only count if the last code began with those strings:
array icd code_0-code_5;
do i = 1 to dim(icd);
if icd[i]=:'345' or icd[i]=:'817' or icd[i]=:'I37' the indicator=1;
else indicator=0;
end;
Thanks in advance for your help.
You don't want the ELSE in that type of search. If you want values of one and zero instead of one and missing then set the value to zero before the do loop. Also you can stop when you find one.
array icd code_0-code_5;
indicator=0;
do i = 1 to dim(icd) until (indication);
if icd[i]=:'345' or icd[i]=:'817' or icd[i]=:'I37' then indicator=1;
end;
You can also use the special format of the IN function to search all of the array variables at once.
array icd code_0-code_5;
indicator=('345' in: icd) or ('817' in: icd) or ('I37' in: icd);
Obs person code_0 code_1 code_2 code_3 code_4 code_5 indicator 1 Jane 345 I371 693 1125 G260 2461 1 2 Jake G345 8170 565 8172 8701 2717 1 3 Joan 890 8172 653 C980 3652 I370 1 4 Mark 410 149 K628 3450 229 S280 1 5 Mary 2797 D140 P635 1025 445 O980 0
data WORK.IMPORT;
infile datalines ;
input person:$4. code_0:$4. code_1:$4. code_2:$4. code_3:$4. code_4:$4. code_5:$4.;
datalines;
Jane 345 I371 693 1125 G260 2461
Jake G345 8170 565 8172 8701 2717
Joan 890 8172 653 C980 3652 I370
Mark 410 149 K628 3450 229 S280
Mary 2797 D140 P635 1025 445 O980
;;;;
data want;
set import;
array icd code_0-code_5;
indicator=0;
do over icd;
if icd in : ('345' '817' 'I37') then indicator =1;
end;
run;
Like this?
INDICATOR=prxmatch('/:(345|817|I37)/', ':'||catx(':',of CODE_0-CODE_5))>0;
You don't want the ELSE in that type of search. If you want values of one and zero instead of one and missing then set the value to zero before the do loop. Also you can stop when you find one.
array icd code_0-code_5;
indicator=0;
do i = 1 to dim(icd) until (indication);
if icd[i]=:'345' or icd[i]=:'817' or icd[i]=:'I37' then indicator=1;
end;
You can also use the special format of the IN function to search all of the array variables at once.
array icd code_0-code_5;
indicator=('345' in: icd) or ('817' in: icd) or ('I37' in: icd);
Obs person code_0 code_1 code_2 code_3 code_4 code_5 indicator 1 Jane 345 I371 693 1125 G260 2461 1 2 Jake G345 8170 565 8172 8701 2717 1 3 Joan 890 8172 653 C980 3652 I370 1 4 Mark 410 149 K628 3450 229 S280 1 5 Mary 2797 D140 P635 1025 445 O980 0
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.