Hello,
Like this?
data HAVE;
X1='3310'; X2='29040'; X3='V313'; X4='40590'; output;
X1='3310'; X2='29040'; X3='V313'; X4='40690'; output;
run;
data WANT;
set HAVE;
array X $ X1-X4;
HYPERTENSION=0;
do I=1 to 4;
if 40100<=input(X[I],?? 32.)<=40590 then HYPERTENSION =1;
end;
put _N_= HYPERTENSION=;
run;
I learned from @Reeza in another post that diagnostic codes are best treated as character.
I looked at the ICD-9 codes for hypertension and, if I read them right, they can be 4-character as well as 5-character codes.
e.g.
4010 Malignant essential hypertension
4011 Benign essential hypertension
4019 Unspecified essential hypertension
40200 Malignant hypertensive heart disease without heart failure
40201 Malignant hypertensive heart disease with heart failure
40210 Benign hypertensive heart disease without heart failure
It also looks like a simpler common denominator for the hypertension codes is that they all begin with the same two characters, '40'.
So you could use the approach from @ChrisNZ's post here, which also does away with the need for an array:
data WANT;
set HAVE;
HYPERA=prxmatch('/\b40/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 40;
run;
If you wanted, you could require the first 3 characters to be '401', '402', '403', '404' or '405' by adding [1-5] to the pattern:
data WANT;
set HAVE;
HYPERA=prxmatch('/\b40[1-5]/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 401 to 405;
run;
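A minimal, untested sketch of the same idea on made-up data, in case it helps to try the pattern out. The sample values and the use of only DX1-DX3 are assumptions for illustration. Because CATX joins the codes with single blanks, each code starts either at the beginning of the string or right after a blank, so the pattern below anchors on that instead of \b. The two behave the same for plain alphanumeric codes; they only differ if a code contains punctuation such as a decimal point, where \b would also see a word start after the point.

```sas
/* Hypothetical sample data: DX1-DX3 stand in for DX1-DX19 */
data HAVE;
  length DX1-DX3 $6;
  DX1='25040'; DX2='V313';  DX3='4019';  output;  /* 4019 is a hypertension code */
  DX1='25040'; DX2='V4010'; DX3='34010'; output;  /* no code starts with 401-405 */
run;

data WANT;
  set HAVE;
  /* a code starts at the beginning of the joined string or after a blank */
  HYPERA = prxmatch('/(^| )40[1-5]/', catx(' ', of DX1-DX3)) > 0;
  put _N_= HYPERA=;
run;
```

The PUT statement writes the flag for each row to the log, so the two sample rows can be checked by eye.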
If you can safely convert the codes to numbers and use a numeric check, look into removing the letters with COMPRESS (the "kd" modifiers keep only the digits).
To also convert the result to a number, nest it inside INPUT like this:
Num_DX=input(compress(DX,,"kd"),best.);
Here's my untested attempt:
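data want; set have;
ARRAY diagn {15} dx1-dx15; * dimension must match the 15 variables;
HYPER='0'; * set once before the loop so a match is not overwritten;
DO i = 1 TO 15;
num_dx = input(compress(diagn{i},,"kd"),best.); * keep digits only and convert;
IF num_dx >= 40100 and num_dx < 40590 then HYPER='1';
END;
run;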
It's not clear whether you want to assign HYPER a value of 0/1 or the diagnosis code that indicates hypertension. Going with the 0/1 result:
array dx {15};
hyper=0;
do _n_=1 to 15 until (hyper=1);
if ('40100' <=: dx{_n_} <=: '40490') then hyper=1;
* optionally: first_hypertension_code = dx{_n_};
end;
This will make the comparison based on the first five characters of dx1-dx15. If you have longer values in the data, such as 402999, these will also be treated as a match.
There are plenty of alternatives, if you have a file that contains a list of the hypertension diagnosis codes.
Continuing to plagiarise @ChrisNZ's code, as in the hypertension example:
data WANT;
set HAVE;
DIABET=prxmatch('/\b250/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 250;
run;
It depends on what might be in the data. For example, could 250 appear on its own, or would a decimal point follow it: 250.?
Is 2501 a possible code? Would it indicate diabetes, or would it indicate some other diagnosis?
Would 250.93 be in the data, or could the decimal point be missing: 25093?
It is possible that this would be the right comparison, but it depends on what might be in your data:
if ('250' <=: dx{_n_} <=: '250.93') then diabet=1;
If the data values are inconsistent and diagnoses overlap, a parsing solution (as has been suggested already) would be more accurate.
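One possible shape for such a parsing step, as an untested sketch. It assumes that the only inconsistency is an optional decimal point and that every code beginning with 250 indicates diabetes; both assumptions need checking against the real data. The sample values, DX1-DX3, and the helper variable CODE are made up for illustration.

```sas
/* Hypothetical sample data with inconsistent formatting */
data HAVE;
  length DX1-DX3 $6;
  DX1='250.93'; DX2='4019';  DX3='V313';  output;  /* with decimal point */
  DX1='25093';  DX2='40200'; DX3=' ';     output;  /* without decimal point */
  DX1='2501';   DX2='4250';  DX3='V2501'; output;  /* 2501 matches only under the assumption above */
run;

data WANT;
  set HAVE;
  array DX {*} DX1-DX3;
  DIABET = 0;
  do I = 1 to dim(DX) until (DIABET = 1);
    CODE = compress(DX[I], '.');        /* '250.93' becomes '25093' */
    if CODE =: '250' then DIABET = 1;   /* =: compares only the first 3 characters */
  end;
  drop I CODE;
  put _N_= DIABET=;
run;
```

Normalising each code first keeps the comparison itself simple, and the same step could strip other separators if they turn up in the data.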