Solved: Re: Search for codes with arrray - excluding cases

Chris_LK_87 · Posted 06-25-2020 09:42 AM

Hello!

At the moment I am working with a very wide dataset, that contains more than 500 columns (not exactly following Maxim 19).

In the dataset each row represents an individual and all their healthdata. dcode1, dcode2...dcode500.

Here is my dataset (simplified)

data have;
input
id$ dcode1$ dcode2$ dcode3$;
cards;
1 J01AA02 J01XX05 J01CA01
2 J01XX05 A10AB01 C03CA01
3 J01AA02 A10AB01 A10AB01
;
run;

I want to count how many individuals that have a diagnosis that starts with 'J01', if they have a diagnosis that starts with 'J01' they will be assigned a 1 in the new variable J01_flag, else will get 0. But if they have the diagnosis 'J01XX05' I want to exclude them from the count in J01_flag.

I want do a similar thing for every diagnosis that starts with 'C03'.

Data want;

id dcode1 dcode2 dcode3 J01_flag C03_flag

1 J01AA02 J01XX05 J01CA01 1 0

2 J01XX05 A10AB01 C03CA01 0 1

3 J01AA02 A10AB01 C03CA01 1 1

Is this possible to do with an array or any other function?

Astounding · Posted 06-25-2020 08:55 PM

If variety is the spice of life ...

data want;
set have;
array code(500) $ dcode1-dcode500;
J01_flag=0;
C03_flag=0;
do i = 1 to 500 until (J01_flag=1 and C03_flag=1);
   if code(i)=: 'J01' and code(i) ne 'J01XX05' then J01_flag=1;
   if code(i)=: 'C03' and code(i) ne 'C03XX05' then C03_flag=1;
end;
drop i;
run;

There are small parts of the logic that you may need to tweak, since the question left them to the imagination. If one observation contains both a diagnosis of J01XX05 and another diagnosis of J01CA01, it seems that that should count as J01_flag=1. But perhaps not ... easy enough to change if needed.

Also, the C03 "do not use" value seems like it should be C03XX05 but perhaps that is not correct as well.

View solution in original post

Jagadishkatam · Posted 06-25-2020 10:04 AM

please try below code

data want;
set have;
array code(3) $ dcode1 dcode2 dcode3;
do i = 1 to 3;
if prxmatch('m/^J01/oi',code(i)) and code(i) ne 'J01XX05' then J01_flag=1;
if prxmatch('m/^C03/oi',code(i)) and code(i) ne 'J01XX05' then C03_flag=1;
end;
drop i;
run;

Thanks,
Jag

AhmedAl_Attar · Posted 06-25-2020 10:19 AM

Here is my approach - avoiding looping

data have(drop=str);
input id$ dcode1 $ dcode2 $ dcode3 $;
array codes (*) $ dcode1-dcode3;
length str $32767;
str=cats(of codes[*]);
if (index(str,'J01XX05') gt 0) then
do;
	J01_flag = 0;
	C03_flag = 0;
end;
else
do;
	J01_flag = count(str,'J01')**0;
	C03_flag = count(str,'C03')**0;
end;
cards;
1 J01AA02 J01XX05 J01CA01
2  J01XX05 A10AB01 C03CA01
3  J01AA02 A10AB01 C03CA01
;
run;

Kurt_Bremser · Posted 06-25-2020 10:32 AM

Your code will produce false positives if the strings J01 or C03 appear within a dcode.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 06-25-2020 10:30 AM

According to your description, ID 1 should have j01_flag as zero, as it has dcode2 as "J01XX05"; anyway, this is my code suggestion:

data have;
input 
id$ dcode1$ dcode2$ dcode3$;
cards;
1 J01AA02 J01XX05 J01CA01
2 J01XX05 A10AB01 C03CA01
3 J01AA02 A10AB01 A10AB01
;

proc transpose data=have out=long (drop=_name_ rename=(col1=dcode));
by id;
var dcode:;
run;

data want (drop=dcode dc);
merge
  long
  long (in=no rename=(dcode=dc) where=(dc="J01XX05"))
;
by id;
retain
  j01_flag
  c03_flag
;
if first.id
then do;
  j01_flag = 0;
  c03_flag = 0;
end;
if index(dcode,'J01') = 1 then j01_flag = 1;
if index(dcode,'C03') = 1 then c03_flag = 1;
if last.id;
if no then j01_flag = 0;
run;

You can merge the result back to dataset have.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 06-25-2020 08:55 PM

If variety is the spice of life ...

data want;
set have;
array code(500) $ dcode1-dcode500;
J01_flag=0;
C03_flag=0;
do i = 1 to 500 until (J01_flag=1 and C03_flag=1);
   if code(i)=: 'J01' and code(i) ne 'J01XX05' then J01_flag=1;
   if code(i)=: 'C03' and code(i) ne 'C03XX05' then C03_flag=1;
end;
drop i;
run;

There are small parts of the logic that you may need to tweak, since the question left them to the imagination. If one observation contains both a diagnosis of J01XX05 and another diagnosis of J01CA01, it seems that that should count as J01_flag=1. But perhaps not ... easy enough to change if needed.

Also, the C03 "do not use" value seems like it should be C03XX05 but perhaps that is not correct as well.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!