BookmarkSubscribeRSS Feed
Wolverine
Pyrite | Level 9

I'm working with medical claims data.  For each healthcare encounter, up to 25 different diagnosis codes can be present.  There are about 30 different health conditions that I need to flag where ever they show up within the data (it doesn't matter which of the 25 diagnosis variables it matches).  Each health condition has several (usually 3-7) different diagnosis codes.  The list of health condition codes were given to me in an Excel file with a separate tab for each health condition of interest.  I then imported each of those tabs as a separate SAS file.

 

In the example file below, I only added 5 diagnosis variables and 2 health conditions for the sake of clarity.  The initial data file doesn't have the flag variables, and the output file should have the flag variables.

 

HEALTHCARE_ENCOUNTER_ID icd_dx1 icd_dx2 icd_dx3 icd_dx4 icd_dx5 CANCER_FLAG PREGNANCY_FLAG
12345 9993 2255 3366 4477 5588 1 0
23456 2358 5899 8881 7777 8850 0 1
34567 8546 5468 4811 8883 9991 1 1
45678 8326 . . . . 0 0

 

CODE DESCRIPTION
9991 Cervical cancer
9992 Pelvic cancer
9993 Breast cancer

 

CODE DESCRIPTION
8881 Normal pregnancy
8882 Atopic pregnancy
8883 Preclampsia

 

I haven't gotten very far with my syntax.  What I have so far attempts to compare the range of diagnosis codes against the list of codes for each condition.  This is within a macro (where &sheet. represents the condition name), so it will create a separate file for each condition, and then I will have to merge all the files together (along with all healthcare episodes that didn't match on any of the conditions).  It's clunky and I'm not sure if a range will work in a Proc SQL step:

 

PROC SQL;
Create table temp.flag_&sheet.
as Select *
From temp.healthcare_encounters
WHERE icd_dx1-25 in (select HEDIS_DX_code from temp.Hyst_Codes_&sheet.);
QUIT;

2 REPLIES 2
Wolverine
Pyrite | Level 9

In the syntax above, "HEDIS_DX_code" should have been changed to "Code".

ballardw
Super User

Hint: Provide some example data in the form of a data step that behaves like yours as far as mixes of values and all the rules about what is kept. The data step so we don't have to guess about variable types.

Then work through that (should be a small enough set to do manually) and show use the desired result.

 

Note: multiple similar value like your DX variables and applying similar rules to each variable typically points to an ARRAY based solution which means Data step as SQL doesn't provide any short cut ways to handle list of variables.

 

Turning the Excel lists into Formats would be one way approach something like this.

 

data have;
  input HEALTHCARE_ENCOUNTER_ID $	icd_dx1 	icd_dx2 	icd_dx3 	icd_dx4 	icd_dx5;
datalines;
12345 	9993 	2255 	3366 	4477 	5588 
23456 	2358 	5899 	8881 	7777 	8850 
34567 	8546 	5468 	4811 	8883 	9991 
45678 	8326 	. 	. 	. 	. 	0 	0
;

proc format library=work;
   value iscancer
   9991, 9992, 9993='Is Cancer'
   other= 'Not Cancer'
   ;
   value ispregnant
   8881, 8882,8883 = 'Is Pregnant'
   other= 'Not pregnant'
   ;
run;

data want;
   set have;
   array dx (*) Icd_dx: ;
   do i=1 to dim(dx);
      IsCancer   = max(IsCancer, put(dx[i],iscancer.)='Is Cancer');
      IsPregnant = max(IsPregnant, put(dx[i],IsPregnant.)='Is Pregnant');
   end;
   drop i;
run;

Datasets can create formats if the list of values and/or number of lists is "large" and don't want to type out all the formats.

The biggest trick involved once you have the formats is the Max() statement. SAS will return 1 for true or 0 for false for the "put(variable,format.)="some text") result. The max means that largest value created gets kept as the variables get parsed.

Note that the only thing need to process more variables is to have them on the ARRAY statement. So if you have a hundred variables named with the Icd_dx convention they all get processed with the Icd_dx: list. All variable starting with Icd_dx would be included. Or if you want a subset of them Icd_dx1 - Icd_dx30 for example. These variable lists coupled with the Array are why I would not attempt to use SQL.

 

If you place the format into a permanent library, possibly where you are working with this data and that that library to the FMTSEARCH path then the formats are permanent and you don't need to run the code every time to use them.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1365 views
  • 0 likes
  • 2 in conversation