BookmarkSubscribeRSS Feed
ginak
Quartz | Level 8

Hello,

 
I have a question. I have 15 character variables, DX1 through DX15, which look at ICD9 medical codes. So different numbers can be possible entries. Possible answers could be, say, 3310, 29040, V313, E234, etc. 
 
For each person, I want to know if they have hypertension. Hypertension is between 401.00 through 405.90, but this is coded as 40100 through 40490. They could have hypertension listed in any of the DX1 through DX15 codes. It doesn't matter if they have, say, 40100 in DX1 and then their DX7 variable is 40490. They'll still have HYPERTENSION = 1.
 
This is the type of array I would want to use. So suppose for Person #1 I want to determine if they have hypertension. Which could be any of the numbers found between 40100 through 40490. 
 
ARRAY diagn {15} dx1-dx15;
DO i = 1 TO 15 UNTIL (diagn{i} IN ('40100' THROUGH '40490'));
HYPER= diagn{i};
IF HYPER NOT IN ('40100' THROUGH '40490') THEN HYPER= '0';
END;
 
IF HYPER= '0' THEN HYPERA= 0;
IF HYPERNE '0' THEN HYPERA= 1;
 
Here's the problem... this is a character variable, so how do I indicate the 40100 through 40490? Do I have to find every single diagnosis code and list it out? This is too difficult.. like 40200, 40201, 40211, 40290, .., 40519 etc. I was thinking to just convert this to a numeric variable but the problem is that the DX variables could also equal say, E5001 or something that starts with an E or a V. Any suggestions?
 
Thank you :') I have attached the .rtf output file of the proc freq which looks at one of the 15 diagnosis variables, DX1. You can see how it's coded weird. Basically I think the last two digits on the right signify what comes after the decimal point. So the hypertension variable that I'm looking for has 5 digits as well, signifying somethign like "401.00"
 
I hope this isn't confusing, but please let me know if it is and I can clarify! Thanks 🙂
 
8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

Like this?

 

 

 

 


data HAVE;
X1='3310'; X2='29040'; X3='V313'; X4='40590';  output;
X1='3310'; X2='29040'; X3='V313'; X4='40690';  output;
run;
data WANT;
set HAVE;
array  X $ X1-X4;
HYPERTENSION=0;
do I=1 to 4;
if 40100<=input(X[I],?? 32.)<=40590 then HYPERTENSION =1;
end;
put _N_= HYPERTENSION=;
run;
JohnHoughton
Quartz | Level 8

I learned from @Reeza in another post that diagnostic codes are best treated as character.

I looked at the ICD-9 codes for hypertension and if I read them right then it looks like they can be 4-character codes as well as 5-character codes.

eg 

4010  Malignant essential hypertension
4011  Benign essential hypertension
4019  Unspecified essential hypertension
40200 Malignant hypertensive heart disease without heart failure
40201 Malignant hypertensive heart disease with heart failure
40210 Benign hypertensive heart disease without heart failure

It also looks a simpler common denominator for hypertension codes is that they all begin with the same two characters , '40'

 

So you could look to @ChrisNZ post here, which also does away with the need to use an array

 

data WANT;
  set HAVE;
  HYPERA=prxmatch('/\b40/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 40;
run;

 You could if you wanted to specify that the first 3 characters had to be '401','402','403',404' or '405' by adding [1-5] in the pattern

data WANT;
  set HAVE;
  HYPERA=prxmatch('/\b40[1-5]/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 40;
run;
ginak
Quartz | Level 8
Thank you! I think so; but what would go in place of the "??" sorry, I didn't understand haha.
JBerry
Quartz | Level 8

If you can safely convert the codes to a number and use a numeric check, try looking into removing the letters with COMPRESS.

 

To also make it a number, nest it like this:

 

Num_DX=input(compress(DX,,"kd"),best.);

 

Here's my untested attempt:

data want; set have;

ARRAY diagn {5} dx1-dx15; 
	DO i = 1 TO 15;
		num_dx = input(compress(diagn{i},,"kd"),best.);
		HYPER='0';
		IF num_dx >= 40100 and num_dx < 40590 then HYPER='1';
	END;

run;
Astounding
PROC Star

It's not clear if you want to assign HYPER a value of 0/1, vs. the diagnosis code that indicates hypertension.  Going with the 0/1 result:

 

array dx {15};

hyper=0;

do _n_=1 to 15 until (hyper=1);

   if ('40100' <=: dx{_n_} <=: '40490') then hyper=1;

  * optionally:  first_hypertension_code = dx{_n_};

end;

 

This will make the comparison based on the first five characters of dx1-dx15.  If you have longer values in the data, such as 402999, these will also be treated as a match.

 

There are plenty of alternatives, if you have a file that contains a list of the hypertension diagnosis codes.

ginak
Quartz | Level 8
Thank you very much! Yes this would work. The only thing that I wonder is, say, for Diabetes, you can have either '250' or '250.0' through '250.93' In that case, how would you specify? For this line:

if ('2500' <=: dx{_n_} <=: '25093') then diabet=1;
JohnHoughton
Quartz | Level 8

Continuing to plagiarise @ChrisNZ code as for the hypertension example

 

data WANT;
  set HAVE;
  DIABET=prxmatch('/\b250/',catx(' ', of DX1 - DX19 )) > 0; *find a word starting with 250;
run;
Astounding
PROC Star

It depends on what might be in the data.  For example, could 250 appear, or would there be a decimal point following:  250.

 

Is 2501 a possible code?  Would it indicate Diabetes, or would it indicate some other diagnosis?

 

Would 250.93 be in the data, or could the decimal point be missing:  25093

 

It is possible that this would be the right comparison, but it depends on what might be in your data:

 

if ('250' <=: dx{_n_} <=: '250.93') then diabet=1;

 

If the data values are inconsistent and diagnoses overlap, a parsing solution (as has been suggested already) would be more accurate.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2620 views
  • 2 likes
  • 5 in conversation