data have;
input patientid dx1 dx2 dx3 dx4 ;
datalines;1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 240 2502 . . 3/3/2004
3 2503 . . . 1/1/2004 ;
run;
Hi, I have the following database; I want to create the following code: if substr for any of the four variables (dx1-dx4) starts with 250** (first three codes 250) then db=1, else db=0; output would be:
1 250 223 224 444 5/5/2009 1
1 555 666 120 2501 5/6/2008 1
2 120 666 . . 1/2/2007 0
2 120 666 . . 1/1/2007 0
3 250 . . . 2/2/2004 1
3 240 2502 . . 3/3/2004 1
3 2503 . . . 1/1/2004 1
Hi,
SUBSTRN can be helpful in this situation:
data have;
input patientid dx1 dx2 dx3 dx4 date :mmddyy10.;
format date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 250 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;
data want(drop=i);
set have;
array x{*} dx:;
db=0;
do i=1 to dim(x);
if SUBSTRN(x{i},1,3) = '250' then do;
db=1;
leave;
end;
end;
run;
Rather than using character functions , can do it with a a numeric expression
data want;
set have;
array dx dx:;
db=0;
do over dx;
if int(dx/(10**(int(log10(dx)-2 ))))=250 then do; db=1;leave;end;
end;
run;
Alternatively , use character funcions
if substr(left(put(dx,best32.)),1,3)='250' then do; db=1;leave;end;
Thank you for posting the code. Using the first one, let's assume I want to capture the first 4 numbers instead of the first 3 from the left (2501 instead of only 250) in that case do I need to change anything in your code?
Thank you
It would be a lot easy , if you read them as Character.
data have;
input patientid (dx1 dx2 dx3 dx4) ($);
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 240 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;
run;
data want;
set have;
array x{*} $ dx:;
flag=0;
do i=1 to dim(x);
if x{i} =: '250' /*Change it 2501*/ then do;flag=1;leave;end;
end;
drop i;
run;
For legibility and speed I'd write this as
data HAVE;
input PATIENTID (DX1 - DX4) ($);
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 240 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;
run;
data WANT;
set HAVE;
FLAG=( DX1=:'250' | DX2=:'250' | DX3=:'250' | DX4=:'250' );
run;
If the variables have to be numeric then
data HAVE;
input PATIENTID DX1 - DX4 ;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 240 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;
run;
data WANT;
set HAVE;
FLAG=( cat(DX1)=:'250' | cat(DX2)=:'250' | cat(DX3)=:'250' | cat(DX4)=:'250' );
run;
This will also avoid writing a loop and deals with both numbers and strings.
data WANT;
set HAVE;
FLAG=prxmatch('/\b250/',catx(' ', of DX1 - DX4 )) > 0; *find a word starting with 250;
run;
Hi @lillymaginta.
With regards to your question about adapting the numeric expression to read '2501' , the answer is to use one of the the character options. The numeric solution was just interesting to me for another problem , but now I've learned about diagnostic codes from this thread I can see it wasn't suitable here.
I'm general, I highly recommend storing diagnosis codes as characters rather than numbers. You'll never do math with them.
Additionally most coding systems have letters included, ICD9 and ICD10.
Hi,
SUBSTRN can be helpful in this situation:
data have;
input patientid dx1 dx2 dx3 dx4 date :mmddyy10.;
format date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 2501 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 250 2502 . . 3/3/2004
3 2503 . . . 1/1/2004
;
data want(drop=i);
set have;
array x{*} dx:;
db=0;
do i=1 to dim(x);
if SUBSTRN(x{i},1,3) = '250' then do;
db=1;
leave;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.