Re: Issue with Determining Frequencies using Two-Dimensional Arrays

laksenov · Posted 01-16-2018 10:55 AM

Hi all,

I'm new to SAS University Edition and I would greatly appreciate your help. I'm having difficulty finding a solution to a problem I have with two dimensional arrays. I am trying to determine how many times particular medications were given in the Emergency Department (ED) and whether they were given in the ED, given at ED discharge, or both.

The dataset I am working with has information regarding up to 8 medications given during a patient visit to the ED (med1-med8). Each medication is identified by multiple 5-digit codes. For instance, the medication tramadol can be coded as 95050, 96041, 00319, or 08329. This is coded as a character variable. Furthermore, variables gpmed1-gpmed8 describe if the corresponding medication in med1-med8 was given in the ED (1), as a prescription at discharge (2), both (3) or blank/NA (-7,-9,8,9). So for instance, if med1 = 95050 and gamed = 2, then Tramadol was given at ED discharge for that particular visit.

This is the SAS code I used to determine frequency for a particular medication, Tramadol in this example:

data example.working;
	set example.working;
	array meds(8) med1-med8;
	array gpmeds(8) gpmed1-gpmed8;
	do i = 1 to 8;
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) = 1 then Tramadol = 1; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) = 2 then Tramadol = 2; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) = 3 then Tramadol = 3; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) in (-7,-9,8,9) then Tramadol = 4; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) in ( . ) then Tramadol = 5; 
        end;	
run;

proc freq data=example.working; 
	tables Tramadol;
run;

This gives me the frequencies of Tramadol being given in the ED, at discharge, or both.

The issue that I found (after looking at the dataset manually) is that sometimes a medication is coded with a different 5 digit ID code for the same patient visit (or observation). For example, one instance of the medication (med1) would indicate that was given in the ED (gpmed1 = 1) and another instance of the medication (med5) using a different ID code would indicate it was given at discharge (gpmed5=2). The code above would then register Tramadol as if it was given only at discharge rather than given at both. So, some medications are coded twice with different IDs for the same observation when they really should be coded once with gpmed=3, indicating it was given both in the ED and at discharge.

My question is how could I modify my code so it finds and takes into account the same medications that were coded with different IDs for the same observation and truly indicate if it was given in the ED, at discharge, or both.

Thank you so much in advance for your help.

Reeza · Posted 01-16-2018 11:18 AM

It would probably help if you post some sample data and example output you want to go alone with this code.

@laksenov wrote:

Hi all,

I'm new to SAS University Edition and I would greatly appreciate your help. I'm having difficulty finding a solution to a problem I have with two dimensional arrays. I am trying to determine how many times particular medications were given in the Emergency Department (ED) and whether they were given in the ED, given at ED discharge, or both.

The dataset I am working with has information regarding up to 8 medications given during a patient visit to the ED (med1-med8). Each medication is identified by multiple 5-digit codes. For instance, the medication tramadol can be coded as 95050, 96041, 00319, or 08329. This is coded as a character variable. Furthermore, variables gpmed1-gpmed8 describe if the corresponding medication in med1-med8 was given in the ED (1), as a prescription at discharge (2), both (3) or blank/NA (-7,-9,8,9). So for instance, if med1 = 95050 and gamed = 2, then Tramadol was given at ED discharge for that particular visit.

This is the SAS code I used to determine frequency for a particular medication, Tramadol in this example:
data example.working;
	set example.working;
	array meds(8) med1-med8;
	array gpmeds(8) gpmed1-gpmed8;
	do i = 1 to 8;
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) = 1 then Tramadol = 1; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) = 2 then Tramadol = 2; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) = 3 then Tramadol = 3; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) in (-7,-9,8,9) then Tramadol = 4; 
	if meds(i) in ('95050','96041','00001','03319','08329') 
		and gpmeds(i) in ( . ) then Tramadol = 5; 
        end;	
run;

proc freq data=example.working; 
	tables Tramadol;
run;
This gives me the frequencies of Tramadol being given in the ED, at discharge, or both.

The issue that I found (after looking at the dataset manually) is that sometimes a medication is coded with a different 5 digit ID code for the same patient visit (or observation). For example, one instance of the medication (med1) would indicate that was given in the ED (gpmed1 = 1) and another instance of the medication (med5) using a different ID code would indicate it was given at discharge (gpmed5=2). The code above would then register Tramadol as if it was given only at discharge rather than given at both. So, some medications are coded twice with different IDs for the same observation when they really should be coded once with gpmed=3, indicating it was given both in the ED and at discharge.

My question is how could I modify my code so it finds and takes into account the same medications that were coded with different IDs for the same observation and truly indicate if it was given in the ED, at discharge, or both.

Thank you so much in advance for your help.

laksenov · Posted 01-16-2018 11:42 AM

Of course. Here is an example dataset for the drug Meperidine and the code that goes alone with it.

data example.exampleworking;
	set example.example;
	array meds(8) med1-med8;
	array gpmeds(8) gpmed1-gpmed8;
	do i = 1 to 8;	
	if meds(i) in ('96045','18760','18985','00200','08785','10130') 
		and gpmeds(i) = 1 then Meperidine = 1; 
	if meds(i) in ('96045','18760','18985','00200','08785','10130') 
		and gpmeds(i) = 2 then Meperidine = 2; 
	if meds(i) in ('96045','18760','18985','00200','08785','10130') 
		and gpmeds(i) = 3 then Meperidine = 3; 
	if meds(i) in ('96045','18760','18985','00200','08785','10130') 
		and gpmeds(i) in (-7,-9,8,9) then Meperidine = 4; 
	if meds(i) in ('96045','18760','18985','00200','08785','10130') 
		and gpmeds(i) in ( . ) then Meperidine = 5; 
	end; 
run; 

proc freq data=example.exampleworking; 
	tables Meperidine/nocum nopercent norow; 
run;

There are a total of 110 rows where Meperidine was given. However, of those 110 rows, 2 of them have Meperidine coded twice with a different ID number. With the current code, those two visits indicate that the Meperidine was given at discharge rather than both. So a single visit had 2x Meperidine, one for ED only and the other for ED discharge.

Thank you!

ballardw · Posted 01-16-2018 12:21 PM

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

If we have to convert your text file to a SAS data set we may make choices that result in variables of different types or characteristics. The macro means that we have a subset of your data and since it is text you should be easily able to replace any sensitive information with dummy values.

Note that you should only supply variables actually related to this question.

Instead of multiple if/then statements when the IF is the same you might examine this code for how to use SELECT.

Also you may want to examine more possible results of your Meperidine variable which would perhaps identify your cases with two or more codes:

data example.exampleworking;
	set example.example;
	array meds(8) med1-med8;
	array gpmeds(8) gpmed1-gpmed8;
   array m   (8)      Meperidine1-Meperidine8;
	do i = 1 to 8;	
	if meds(i) in ('96045','18760','18985','00200','08785','10130') then select gpmeds(i);
		When (1)  m(i) = 1; 
		When (2  m(i) = 2; 
		When (3  m(i) = 3; 
		When (-7,-9,8,9)  m(i) = 4; 
		When ( . )  m(i) = 5; 
      otherwise;
	end; 
run;

Or since you don't indicate how to differentiate between ED and ED discharge you may want to provide additional descriptions of your need.

laksenov · Posted 01-16-2018 05:07 PM

Thank you for your response and showing me how to properly post my data in a forum.

From your advice, I created a simplified dataset that can better illustrate what I am having trouble with:

data WORK.SASEXAMPLE;
  infile datalines dsd truncover;
  input OBS:4. MED1:$5. MED2:$5. MED3:$5. GPMED1:$5. GPMED2:$5. GPMED3:$5.;
datalines4;
1,'100','200','10',1,2,3
2,'100','10','10',1,1,3
3,'10','100','10',1,3,2
4,'10','200','10',1,1,1
;;;;

For this example, the medication of interest, MedicationX is coded either as '100' or '200' in MED1-MED3. ('10' codes for other medication I am not interested in). From this, I can tell if the medication was given in the ED, at discharge, or both by the corresponding value of GPMED1-GPMED3, where 1=ED, 2=Discharge (Rx), and 3=Both. So for the second row, the medication '100' in MED1 was given in the ED (GPMED1=1).

What I want to do is find out the frequencies of Medication X given in the ED, at Discharge (Rx) or both. I ran the following code:

*GPMED1-3:
	1=Med given in ED
	2=Med given as prescription at discharge (Rx)
	3=Med biven both in ED and Rx;

data WORK.SASEXAMPLE;
	length MedicationX $ 6;
	set WORK.SASEXAMPLE;
	array meds(3) med1-med3;
	array gpmeds(3) gpmed1-gpmed3;
	do i = 1 to 3;	
	if meds(i) in ('100','200')
		and gpmeds(i) = 1 then MedicationX = 'ED'; 
	if meds(i) in ('100','200')
		and gpmeds(i) = 2 then MedicationX = 'Rx'; 
	if meds(i) in ('100','200')
		and gpmeds(i) = 3 then MedicationX = 'Both'; 
	end; 
run; 

proc freq data=WORK.SASEXAMPLE; 
	tables MedicationX/nocum nopercent norow; 
run;

Which results in the following frequencies:

Both: 1

ED: 2

Rx: 1

The Issue: My code above does not accurately capture what is happening in observation #1, where Medication X is actually given both in ED and at discharge (Rx). The code counts the medication as given at discharge. The true frequencies are:

Both: 2

ED: 2

RX:0

Medication X is identified in MED1 as '100' and in MED2 as '200' and GPMED1 = 1 (ED) and GPMED2 = 2 (Rx). So we can see that Medication X was given both in the ED and at Discharge (Rx), however it is not coded as such and instead coded as given at discharge. This is an issue with the dataset I am working with. It ideally should be identified once in MED1-MED3 and coded as being given for both (=3) in GPMED1-GPMED3

My Question: How can I accurately represent the frequencies of Medication X and when it was given (ED, Rx, or Both) when I have an issue as I described above?

Thank you for your help in advance.

ballardw · Posted 01-17-2018 02:02 PM

@laksenov wrote:

My Question: How can I accurately represent the frequencies of Medication X and when it was given (ED, Rx, or Both) when I have an issue as I described above?

Thank you for your help in advance.

I really hope you do not have very many multiple coded values to consider as one as these approaches get very fragile quickly.

For your example data given this works:

data WORK.SASEXAMPLE2;
   length MedicationX $ 6;
	set WORK.SASEXAMPLE;
	array meds(3) med1-med3;
	array gpmeds(3) gpmed1-gpmed3;
   /* this searches for two different values of gpmed associated with
      drug 100 and 200
   */
   gpmed_100 = whichc('100',of meds(*));
   gpmed_200 = whichc('200',of meds(*));
   if gpmed_100 > 0 and gpmed_200>0 and gpmeds(gpmed_100) ne gpmeds(gpmed_200)
      then MedicationX='Both';
   else 	do i = 1 to 3;	
   	if meds(i) in ('100','200')
   		and gpmeds(i) = 1 then MedicationX = 'ED'; 
   	if meds(i) in ('100','200')
   		and gpmeds(i) = 2 then MedicationX = 'Rx'; 
   	if meds(i) in ('100','200')
   		and gpmeds(i) = 3 then MedicationX = 'Both'; 
	end; 
run;
  
proc freq data=WORK.SASEXAMPLE2; 
	tables MedicationX/nocum nopercent norow; 
run;

Large economy sized hint: when working on testing recoding of variables it can be a very poor idea to use the same data set name for input and output. If you change variables that exist on the first go around subsequent tests may be using incorrect variables. Maybe not an issue with this project yet but habitual use will lead you down a rabbit hole eventually trying to determine "where did that value come from".

The change I made is to find the values of gpmed associated with the meds value. The WHICHC function searches multiple values, in this case the array, and returns the position (1 to n) if found or 0 if not found. There is also a corresponding WHICHN available.

Assuming that the ONLY case where both med 100 and 200 occur and have different gpmed values means one was administered at admission and the other at release then we can assign a "Both" value. This will work if one has gpmed combinations of 1 and 2, 1 and 3, and 2 and 3.

This where my comment about multiple valued. If you actually have another drug, say 300 then you have 3 gpmed values to search for and have to determine which combinations get which coding (3 pairs to compare by the way, 4 drugs =6 pairs to compare IF actual order of appearance in the data isn't important). If you have more possible codes you are getting into some possibly moderately serious combination and permutations to consider.

And if you want a printed table that shows 0 for Rx in the above situation here is one way (others are possible of course)

data work.classes;
   input MedicationX $6.;
datalines;
ED
Rx
Both
;
run;

proc tabulate data=WORK.SASEXAMPLE2
     classdata=work.classes;
     class MedicationX;
     table MedicationX,
           n
           /misstext='0'
     ;
run;

Issue with Determining Frequencies using Two-Dimensional Arrays