BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lillymaginta
Lapis Lazuli | Level 10
data have;
   input patientid    dx1 dx2 dx3 dx4   ;
datalines;1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 ;
run;

Hi, I have the following database;  I want to create the following code: if substr for any of the four variables (dx1-dx4) starts with 250** (first three codes 250) then db=1, else db=0; output would be:

1               250 223 224 444    5/5/2009  1
1               555 666 120 2501    5/6/2008 1
2               120  666  .  .     1/2/2007  0
2               120  666  .  .     1/1/2007  0
3               250   .   .  .     2/2/2004  1
3               240 2502   .  .     3/3/2004  1
3               2503  .    .  .     1/1/2004  1
1 ACCEPTED SOLUTION

Accepted Solutions
stat_sas
Ammonite | Level 13

Hi,

 

SUBSTRN can be helpful in this situation:

 

data have;
input patientid dx1 dx2 dx3 dx4 date :mmddyy10.;
format date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 250 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;

 

data want(drop=i);
set have;
array x{*} dx:;
db=0;
do i=1 to dim(x);
if SUBSTRN(x{i},1,3) = '250' then do;
     db=1;
     leave;
     end;
end;
run;

View solution in original post

9 REPLIES 9
JohnHoughton
Quartz | Level 8

Rather than using character functions , can do it with a a numeric expression

 

data want;
set have;
array dx dx:;
db=0;
do over dx;
if int(dx/(10**(int(log10(dx)-2 ))))=250 then do; db=1;leave;end;
end;
run;

 

Alternatively , use character funcions

 

if substr(left(put(dx,best32.)),1,3)='250' then do; db=1;leave;end;

lillymaginta
Lapis Lazuli | Level 10

Thank you for posting the code. Using the first one, let's assume I want to capture the first 4 numbers instead of the first 3 from the left  (2501 instead of only 250) in that case do I need to change anything in your code?

Thank you

Ksharp
Super User

It would be a lot easy , if you read them as Character.

 

data have;
   input patientid    (dx1 dx2 dx3 dx4)  ($);
datalines;
1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 
;
run;

data want;
 set have;
 array x{*} $ dx:;
 flag=0;
 do i=1 to dim(x);
  if x{i} =: '250' /*Change it 2501*/ then do;flag=1;leave;end;
 end;
 drop i;
run;
ChrisNZ
Tourmaline | Level 20

For legibility and speed I'd write this as

 


data HAVE;
   input PATIENTID (DX1 - DX4)  ($);
datalines;
1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 
;
run;

data WANT;
 set HAVE;
 FLAG=( DX1=:'250' | DX2=:'250' | DX3=:'250' | DX4=:'250' ); 
run;

 

If the variables have to be numeric then

 



data HAVE;
   input PATIENTID DX1 - DX4  ;
datalines;
1               250 223 224 444    5/5/2009 
1               555 666 120 2501    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 2502   .  .     3/3/2004 
3               2503  .    .  .     1/1/2004 
;
run;

data WANT;
 set HAVE;
 FLAG=( cat(DX1)=:'250' | cat(DX2)=:'250' | cat(DX3)=:'250' | cat(DX4)=:'250' ); 
run;
ChrisNZ
Tourmaline | Level 20

This will also avoid writing a loop and deals with both numbers and strings.

 


data WANT;
  set HAVE;
  FLAG=prxmatch('/\b250/',catx(' ', of DX1 - DX4 )) > 0; *find a word starting with 250;
run;

 

JohnHoughton
Quartz | Level 8

Hi @lillymaginta.

With regards to your question about adapting the numeric expression to read '2501' , the answer is to use one of the the character options. The numeric solution was just interesting to me for another problem , but now I've learned about diagnostic codes from this thread I can see it wasn't suitable here.

 

 

Reeza
Super User

I'm general, I highly recommend storing diagnosis codes as characters rather than numbers. You'll never do math with them. 

Additionally most coding systems have letters included, ICD9 and ICD10. 

stat_sas
Ammonite | Level 13

Hi,

 

SUBSTRN can be helpful in this situation:

 

data have;
input patientid dx1 dx2 dx3 dx4 date :mmddyy10.;
format date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 250 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;

 

data want(drop=i);
set have;
array x{*} dx:;
db=0;
do i=1 to dim(x);
if SUBSTRN(x{i},1,3) = '250' then do;
     db=1;
     leave;
     end;
end;
run;

lillymaginta
Lapis Lazuli | Level 10
Thank you all for the replies! all codes are helpful.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 3655 views
  • 12 likes
  • 6 in conversation