DATA Step, Macro, Functions and more

using array and substr

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 110
Accepted Solution

using array and substr

data have;
   input patientid    dx1 dx2 dx3 dx4   ;
datalines;1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 ;
run;

Hi, I have the following database;  I want to create the following code: if substr for any of the four variables (dx1-dx4) starts with 250** (first three codes 250) then db=1, else db=0; output would be:

1               250 223 224 444    5/5/2009  1
1               555 666 120 2501    5/6/2008 1
2               120  666  .  .     1/2/2007  0
2               120  666  .  .     1/1/2007  0
3               250   .   .  .     2/2/2004  1
3               240 2502   .  .     3/3/2004  1
3               2503  .    .  .     1/1/2004  1

Accepted Solutions
Solution
‎07-07-2016 12:45 PM
Trusted Advisor
Posts: 1,204

Re: using array and substr

Hi,

 

SUBSTRN can be helpful in this situation:

 

data have;
input patientid dx1 dx2 dx3 dx4 date :mmddyy10.;
format date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 250 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;

 

data want(drop=i);
set have;
array x{*} dx:;
db=0;
do i=1 to dim(x);
if SUBSTRN(x{i},1,3) = '250' then do;
     db=1;
     leave;
     end;
end;
run;

View solution in original post


All Replies
Contributor
Posts: 39

Re: using array and substr

[ Edited ]

Rather than using character functions , can do it with a a numeric expression

 

data want;
set have;
array dx dx:;
db=0;
do over dx;
if int(dx/(10**(int(log10(dx)-2 ))))=250 then do; db=1;leave;end;
end;
run;

 

Alternatively , use character funcions

 

if substr(left(put(dx,best32.)),1,3)='250' then do; db=1;leave;end;

Frequent Contributor
Posts: 110

Re: using array and substr

Thank you for posting the code. Using the first one, let's assume I want to capture the first 4 numbers instead of the first 3 from the left  (2501 instead of only 250) in that case do I need to change anything in your code?

Thank you

Super User
Posts: 9,681

Re: using array and substr

It would be a lot easy , if you read them as Character.

 

data have;
   input patientid    (dx1 dx2 dx3 dx4)  ($);
datalines;
1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 
;
run;

data want;
 set have;
 array x{*} $ dx:;
 flag=0;
 do i=1 to dim(x);
  if x{i} =: '250' /*Change it 2501*/ then do;flag=1;leave;end;
 end;
 drop i;
run;
PROC Star
Posts: 1,561

Re: using array and substr

For legibility and speed I'd write this as

 


data HAVE;
   input PATIENTID (DX1 - DX4)  ($);
datalines;
1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 
;
run;

data WANT;
 set HAVE;
 FLAG=( DX1=:'250' | DX2=:'250' | DX3=:'250' | DX4=:'250' ); 
run;

 

If the variables have to be numeric then

 



data HAVE;
   input PATIENTID DX1 - DX4  ;
datalines;
1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 
;
run;

data WANT;
 set HAVE;
 FLAG=( cat(DX1)=:'250' | cat(DX2)=:'250' | cat(DX3)=:'250' | cat(DX4)=:'250' ); 
run;
PROC Star
Posts: 1,561

Re: using array and substr

[ Edited ]

This will also avoid writing a loop and deals with both numbers and strings.

 


data WANT;
  set HAVE;
  FLAG=prxmatch('/\b250/',catx(' ', of DX1 - DX4 )) > 0; *find a word starting with 250;
run;

 

Contributor
Posts: 39

Re: using array and substr

Hi @lillymaginta.

With regards to your question about adapting the numeric expression to read '2501' , the answer is to use one of the the character options. The numeric solution was just interesting to me for another problem , but now I've learned about diagnostic codes from this thread I can see it wasn't suitable here.

 

 

Super User
Posts: 17,828

Re: using array and substr

I'm general, I highly recommend storing diagnosis codes as characters rather than numbers. You'll never do math with them. 

Additionally most coding systems have letters included, ICD9 and ICD10. 

Solution
‎07-07-2016 12:45 PM
Trusted Advisor
Posts: 1,204

Re: using array and substr

Hi,

 

SUBSTRN can be helpful in this situation:

 

data have;
input patientid dx1 dx2 dx3 dx4 date :mmddyy10.;
format date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 250 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;

 

data want(drop=i);
set have;
array x{*} dx:;
db=0;
do i=1 to dim(x);
if SUBSTRN(x{i},1,3) = '250' then do;
     db=1;
     leave;
     end;
end;
run;

Frequent Contributor
Posts: 110

Re: using array and substr

Thank you all for the replies! all codes are helpful.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 706 views
  • 12 likes
  • 6 in conversation