Re: Array with a Character Variable (Using a Do-Until Loop)

ginak · Posted 07-07-2016 10:38 PM

Hello,

I have a question. I have 15 character variables, DX1 through DX15, which look at ICD9 medical codes. So different numbers can be possible entries. Possible answers could be, say, 3310, 29040, V313, E234, etc.

For each person, I want to know if they have hypertension. Hypertension is between 401.00 through 405.90, but this is coded as 40100 through 40490. They could have hypertension listed in any of the DX1 through DX15 codes. It doesn't matter if they have, say, 40100 in DX1 and then their DX7 variable is 40490. They'll still have HYPERTENSION = 1.

This is the type of array I would want to use. So suppose for Person #1 I want to determine if they have hypertension. Which could be any of the numbers found between 40100 through 40490.

ARRAY diagn {15} dx1-dx15;
DO i = 1 TO 15 UNTIL (diagn{i} IN ('40100' THROUGH '40490'));
HYPER= diagn{i};
IF HYPER NOT IN ('40100' THROUGH '40490') THEN HYPER= '0';
END;

IF HYPER= '0' THEN HYPERA= 0;
IF HYPERNE '0' THEN HYPERA= 1;

Here's the problem... this is a character variable, so how do I indicate the 40100 through 40490? Do I have to find every single diagnosis code and list it out? This is too difficult.. like 40200, 40201, 40211, 40290, .., 40519 etc. I was thinking to just convert this to a numeric variable but the problem is that the DX variables could also equal say, E5001 or something that starts with an E or a V. Any suggestions?

Thank you :') I have attached the .rtf output file of the proc freq which looks at one of the 15 diagnosis variables, DX1. You can see how it's coded weird. Basically I think the last two digits on the right signify what comes after the decimal point. So the hypertension variable that I'm looking for has 5 digits as well, signifying somethign like "401.00"

I hope this isn't confusing, but please let me know if it is and I can clarify! Thanks 🙂

ChrisNZ · Posted 07-07-2016 11:15 PM

Like this?


data HAVE;
X1='3310'; X2='29040'; X3='V313'; X4='40590';  output;
X1='3310'; X2='29040'; X3='V313'; X4='40690';  output;
run;
data WANT;
set HAVE;
array  X $ X1-X4;
HYPERTENSION=0;
do I=1 to 4;
if 40100<=input(X[I],?? 32.)<=40590 then HYPERTENSION =1;
end;
put _N_= HYPERTENSION=;
run;

High-Performance SAS Coding - Third Edition

JohnHoughton · Posted 07-08-2016 08:55 AM

I learned from @Reeza in another post that diagnostic codes are best treated as character.

I looked at the ICD-9 codes for hypertension and if I read them right then it looks like they can be 4-character codes as well as 5-character codes.

eg

4010  Malignant essential hypertension
4011  Benign essential hypertension
4019  Unspecified essential hypertension
40200 Malignant hypertensive heart disease without heart failure
40201 Malignant hypertensive heart disease with heart failure
40210 Benign hypertensive heart disease without heart failure

It also looks a simpler common denominator for hypertension codes is that they all begin with the same two characters , '40'

So you could look to @ChrisNZ post here, which also does away with the need to use an array

data WANT;
  set HAVE;
  HYPERA=prxmatch('/\b40/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 40;
run;

You could if you wanted to specify that the first 3 characters had to be '401','402','403',404' or '405' by adding [1-5] in the pattern

data WANT;
  set HAVE;
  HYPERA=prxmatch('/\b40[1-5]/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 40;
run;

ginak · Posted 07-29-2016 03:33 PM

Thank you! I think so; but what would go in place of the "??" sorry, I didn't understand haha.

JBerry · Posted 07-07-2016 11:17 PM

If you can safely convert the codes to a number and use a numeric check, try looking into removing the letters with COMPRESS.

To also make it a number, nest it like this:

Num_DX=input(compress(DX,,"kd"),best.);

Here's my untested attempt:

data want; set have;

ARRAY diagn {5} dx1-dx15; 
	DO i = 1 TO 15;
		num_dx = input(compress(diagn{i},,"kd"),best.);
		HYPER='0';
		IF num_dx >= 40100 and num_dx < 40590 then HYPER='1';
	END;

run;

Astounding · Posted 07-08-2016 08:55 AM

It's not clear if you want to assign HYPER a value of 0/1, vs. the diagnosis code that indicates hypertension. Going with the 0/1 result:

array dx {15};

hyper=0;

do _n_=1 to 15 until (hyper=1);

if ('40100' <=: dx{_n_} <=: '40490') then hyper=1;

* optionally: first_hypertension_code = dx{_n_};

end;

This will make the comparison based on the first five characters of dx1-dx15. If you have longer values in the data, such as 402999, these will also be treated as a match.

There are plenty of alternatives, if you have a file that contains a list of the hypertension diagnosis codes.

ginak · Posted 07-29-2016 03:31 PM

Thank you very much! Yes this would work. The only thing that I wonder is, say, for Diabetes, you can have either '250' or '250.0' through '250.93' In that case, how would you specify? For this line:

if ('2500' <=: dx{_n_} <=: '25093') then diabet=1;

JohnHoughton · Posted 07-29-2016 04:10 PM

Continuing to plagiarise @ChrisNZ code as for the hypertension example

data WANT;
  set HAVE;
  DIABET=prxmatch('/\b250/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 250;
run;

Astounding · Posted 08-01-2016 09:38 AM

It depends on what might be in the data. For example, could 250 appear, or would there be a decimal point following: 250.

Is 2501 a possible code? Would it indicate Diabetes, or would it indicate some other diagnosis?

Would 250.93 be in the data, or could the decimal point be missing: 25093

It is possible that this would be the right comparison, but it depends on what might be in your data:

if ('250' <=: dx{_n_} <=: '250.93') then diabet=1;

If the data values are inconsistent and diagnoses overlap, a parsing solution (as has been suggested already) would be more accurate.

SAS Innovate 2025: Register Today!

SAS Training: Just a Click Away