BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Chris_LK_87
Quartz | Level 8

Hello!

 

At the moment I am working with a very wide dataset, that contains more than 500 columns (not exactly following Maxim 19).

 

In the dataset each row represents an individual and all their healthdata.  dcode1, dcode2...dcode500. 

 

Here is my dataset  (simplified)

 

data have;
input
id$ dcode1$ dcode2$ dcode3$;
cards;
1 J01AA02 J01XX05 J01CA01
2  J01XX05 A10AB01 C03CA01
3  J01AA02 A10AB01 A10AB01
;
run;

 

 

I want to count how many individuals that have a diagnosis that starts with 'J01', if they have a diagnosis that starts with 'J01' they will be assigned a 1 in the new variable J01_flag, else will get 0. But if they have the diagnosis 'J01XX05' I want to exclude them from the count in J01_flag.

I want do a similar thing for every diagnosis that starts with 'C03'.

 

Data want; 

id dcode1 dcode2  dcode3  J01_flag C03_flag

1 J01AA02 J01XX05 J01CA01  1  0  

2 J01XX05 A10AB01 C03CA01 0 1

3 J01AA02 A10AB01 C03CA01 1 1

 

 

Is this possible to do with an array or any other function?

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

If variety is the spice of life ...

data want;
set have;
array code(500) $ dcode1-dcode500;
J01_flag=0;
C03_flag=0;
do i = 1 to 500 until (J01_flag=1 and C03_flag=1);
   if code(i)=: 'J01' and code(i) ne 'J01XX05' then J01_flag=1;
   if code(i)=: 'C03' and code(i) ne 'C03XX05' then C03_flag=1;
end;
drop i;
run;

There are small parts of the logic that you may need to tweak, since the question left them to the imagination.  If one observation contains both a diagnosis of J01XX05 and another diagnosis of J01CA01, it seems that that should count as J01_flag=1.  But perhaps not ... easy enough to change if needed.

 

Also, the C03 "do not use" value seems like it should be C03XX05 but perhaps that is not correct as well.

View solution in original post

5 REPLIES 5
Jagadishkatam
Amethyst | Level 16

please try below code

 

data want;
set have;
array code(3) $ dcode1 dcode2 dcode3;
do i = 1 to 3;
if prxmatch('m/^J01/oi',code(i)) and code(i) ne 'J01XX05' then J01_flag=1;
if prxmatch('m/^C03/oi',code(i)) and code(i) ne 'J01XX05' then C03_flag=1;
end;
drop i;
run;
Thanks,
Jag
AhmedAl_Attar
Rhodochrosite | Level 12

Here is my approach - avoiding looping

 

data have(drop=str);
input id$ dcode1 $ dcode2 $ dcode3 $;
array codes (*) $ dcode1-dcode3;
length str $32767;
str=cats(of codes[*]);
if (index(str,'J01XX05') gt 0) then
do;
	J01_flag = 0;
	C03_flag = 0;
end;
else
do;
	J01_flag = count(str,'J01')**0;
	C03_flag = count(str,'C03')**0;
end;
cards;
1 J01AA02 J01XX05 J01CA01
2  J01XX05 A10AB01 C03CA01
3  J01AA02 A10AB01 C03CA01
;
run;
Kurt_Bremser
Super User

According to your description, ID 1 should have j01_flag as zero, as it has dcode2 as "J01XX05"; anyway, this is my code suggestion:

data have;
input 
id$ dcode1$ dcode2$ dcode3$;
cards;
1 J01AA02 J01XX05 J01CA01
2 J01XX05 A10AB01 C03CA01
3 J01AA02 A10AB01 A10AB01
;

proc transpose data=have out=long (drop=_name_ rename=(col1=dcode));
by id;
var dcode:;
run;

data want (drop=dcode dc);
merge
  long
  long (in=no rename=(dcode=dc) where=(dc="J01XX05"))
;
by id;
retain
  j01_flag
  c03_flag
;
if first.id
then do;
  j01_flag = 0;
  c03_flag = 0;
end;
if index(dcode,'J01') = 1 then j01_flag = 1;
if index(dcode,'C03') = 1 then c03_flag = 1;
if last.id;
if no then j01_flag = 0;
run;

You can merge the result back to dataset have.

Astounding
PROC Star

If variety is the spice of life ...

data want;
set have;
array code(500) $ dcode1-dcode500;
J01_flag=0;
C03_flag=0;
do i = 1 to 500 until (J01_flag=1 and C03_flag=1);
   if code(i)=: 'J01' and code(i) ne 'J01XX05' then J01_flag=1;
   if code(i)=: 'C03' and code(i) ne 'C03XX05' then C03_flag=1;
end;
drop i;
run;

There are small parts of the logic that you may need to tweak, since the question left them to the imagination.  If one observation contains both a diagnosis of J01XX05 and another diagnosis of J01CA01, it seems that that should count as J01_flag=1.  But perhaps not ... easy enough to change if needed.

 

Also, the C03 "do not use" value seems like it should be C03XX05 but perhaps that is not correct as well.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 534 views
  • 4 likes
  • 5 in conversation