Solved: Re: How to select non-prescribed drug use (insurance claims data), by ...

Student77 · Posted 01-31-2022 02:24 AM

Hello,

I am stuck trying to write the correct code for a project where my sample is people who have tested positive for both drug_A and drug_B. (So all testA=1 and all testB=1 in the dataset). In a perfect world, this means that all patients should also have a prescription for both drug A and drug B. However there's bound to be some misuse of such drugs, meaning among these people positive-testers, some may not have had a corresponding prescription for the correct window of time.

In these insurance claims data, important variables include:

fst_date (date of the drug test)

fill_date (if a prescription was filled)

indicator for drugA

indicator for drugB,

and my calculated date, "end_date". this is the day an Rx was filled + (days_supply) - 1

Drug A and Drug B may remain in the body for a number of days, so let's say a person took their last prescribed pill, then tested positive a week later, but no longer have an active Rx--it would appear that they're misusing. For that reason, I included a 7 and 30-day buffer for the time each drug may stay in the system in the code below.

  if drugA=1 then do;
	if (end_dt + 7) >= fst_dt > fill_dt then a_mis=0; *no drugA misuse;
                        if (end_dt + 7) < fst_dt then a_mis=1; *drugA misuse, because we expect it to be out of the system by then;
     end;

	if drugB=1 then do; 
     if (end_dt + 30) >= fst_dt > fill_dt then b_mis=0; * no drugB misuse;
                        if (end_dt + 30) < fst_dt then b_mis=1; *misuse of drugB; 
      end;

I thought this would cover all cases but I had a lot that were missing - maybe for those who have a fill_date long before an fst_dt.? I'm not sure what else the above code misses. That's one issue.

Where I am having most trouble is managing the data structure and counting the misuse correctly. see data below. There are multiple records for a person, because they have multiple fill dates for the same drug, (which may differ by dosages/days supply etc-but that's fine). All I need in the end is a final dataset with one line (per patient, per fst_dt) indicating whether they misuse drugA or not, and another variable indicating whether they misuse drugB or not. Just those 4 variables but I do not know how to choose whether a patient had a corresponding Rx (by looking in previous records within the same person, for any acceptable date range).

data enrolled;
input patid fst_dt :YYMMDD10. fill_dt :YYMMDD10. rfl_nbr drugA drugB end_dt :YYMMDD10. days_sup;
format fst_dt YYMMDD10. fill_dt YYMMDD10. end_dt YYMMDD10.;

cards;
12345	2018-04-11	2018-09-17	0	1	.	2018-10-01	15
12345	2018-11-13	2018-10-01	0	1	.	2018-10-15	15
12345	2018-11-13	2019-10-12	0	1	.	2019-11-10	30
12345	2018-11-13	2019-11-12	0	1	.	2019-12-11	30
12345	2018-11-13	2019-12-12	0	1	.	2020-01-10	30
12345	2018-11-13	2018-07-05	5	.	1	2018-08-03	30
12345	2018-11-13	2018-08-10	6	.	1	2018-09-08	30
12345	2018-11-13	2018-11-07	1	.	1	2018-12-06	30
66783	2018-07-17	2017-10-21	0	1	.	2017-11-19	30
66783	2018-10-01	2017-12-19	0	1	.	2018-01-17	30
66783	2019-02-15	2018-01-19	0	1	.	2018-02-17	30
66783	2019-09-11	2018-02-17	0	1	.	2018-03-18	30
66783	2019-09-11	2018-03-19	0	1	.	2018-04-17	30
66783	2019-09-11	2018-04-17	0	1	.	2018-05-16	30
66783	2019-09-11	2018-05-17	0	1	.	2018-06-15	30
66783	2019-09-11	2018-06-18	0	1	.	2018-07-17	30
66783	2019-09-11	2018-07-17	0	1	.	2018-08-15	30
66783	2019-09-11	2018-07-31	0	1	.	2018-08-29	30
66783	2019-09-11	2018-08-16	0	1	.	2018-09-14	30
66783	2019-09-11	2018-09-04	0	1	.	2018-10-03	30
66783	2019-09-11	2018-09-14	0	1	.	2018-10-13	30
66783	2019-09-11	2018-10-03	1	1	.	2018-11-01	30
66783	2019-09-11	2018-10-13	0	1	.	2018-11-11	30
66783	2019-09-11	2018-10-15	0	1	.	2018-10-17	3
66783	2019-09-11	2018-10-31	2	1	.	2018-11-29	30
66783	2019-09-11	2018-11-12	0	1	.	2018-12-11	30
66783	2019-09-11	2018-12-04	0	1	.	2019-01-02	30
66783	2019-09-11	2018-12-12	0	1	.	2019-01-10	30
66783	2019-09-11	2019-01-03	1	1	.	2019-02-01	30
66783	2019-09-11	2019-01-10	0	1	.	2019-02-08	30
66783	2019-09-11	2019-02-02	2	1	.	2019-03-03	30
66783	2019-09-11	2019-02-26	0	1	.	2019-02-28	3
66783	2019-09-11	2019-03-03	0	1	.	2019-04-01	30
66783	2019-09-11	2019-03-31	0	1	.	2019-04-29	30
66783	2019-09-11	2019-04-30	0	1	.	2019-05-29	30
66783	2019-09-11	2019-05-31	0	1	.	2019-06-29	30
66783	2019-09-11	2019-07-01	1	1	.	2019-07-30	30
66783	2019-09-11	2019-07-31	2	1	.	2019-08-29	30
66783	2019-09-11	2019-08-30	0	1	.	2019-09-28	30
66783	2019-09-11	2019-11-10	0	1	.	2019-12-09	30
66783	2019-09-11	2019-12-10	0	1	.	2020-01-08	30
66783	2019-09-11	2019-12-31	0	1	.	2020-01-02	3
66783	2019-09-11	2018-07-06	0	.	1	2018-08-04	30
66783	2019-09-11	2018-08-04	1	.	1	2018-09-02	30
66783	2019-09-11	2018-11-01	0	.	1	2018-11-05	5
66783	2019-09-11	2018-12-05	1	.	1	2019-01-03	30
66783	2019-09-11	2019-03-01	0	.	1	2019-03-04	4
66783	2019-09-11	2019-03-05	0	.	1	2019-03-14	10
66783	2019-09-11	2019-05-23	0	.	1	2019-05-23	1
54632	2017-03-28	2019-12-13	0	1	.	2019-12-19	7

;

run;

Here is an example of what I'm trying to do with no success. I hope this makes sense, it is difficult to explain.

Thank you

Student77 · Posted 01-31-2022 05:22 PM

GOT IT!


proc sql;
		create table misuse as
		select patid, fst_dt,  min(a_mis) as a_mis2, min(b_mis) as b_mis2  
		      from udt.enrolled
		      group by patid, fst_dt;
		quit; run;

View solution in original post

blueskyxyz · Posted 01-31-2022 03:48 AM

Sorry, I don't totally understand your question,
so I guess that you want to get unique mis-use records per patid per fst_dt?
if not, could you mind to explain it clearly, filter condition or what you want?

/*get unique mis-use records per patid per fst_dt*/
data a;
    set test1;
    where a_mis=1;
    keep patid fst_dt a_mis;
    proc sort nodupkey; by patid fst_dt ;
run;

data b;
    set test1;
    where b_mis=1;
    keep patid fst_dt b_mis;
    proc sort nodupkey; by patid fst_dt ;
run;

blueskyxyz · Posted 01-31-2022 03:51 AM

/*reupload the code*/

data test1;
	set enrolled;
	if drugA=1 then
		do;
			if (end_dt + 7) >= fst_dt > fill_dt then
				a_mis=0;
			*no drugA misuse;
			if (end_dt + 7) < fst_dt then
				a_mis=1;
			*drugA misuse, because we expect it to be out of the system by then;
		end;

	if drugB=1 then
		do;
			if (end_dt + 30) >= fst_dt > fill_dt then
				b_mis=0;
			* no drugB misuse;
			if (end_dt + 30) < fst_dt then
				b_mis=1;
			*misuse of drugB;
		end;
run;

/*get unique mis-use record per patid per fst_dt*/
data a;
    set test1;
    where a_mis=1;
    keep patid fst_dt a_mis;
    proc sort nodupkey; by patid fst_dt ;
run;

data b;
    set test1;
    where b_mis=1;
    keep patid fst_dt b_mis;
    proc sort nodupkey; by patid fst_dt ;
run;

Student77 · Posted 01-31-2022 10:00 AM

yes that is right - I want to assume misuse unless they've got an Rx in the correct date range to include the drug test date (fst-dt). I am testing this code now still, my SAS is so slow

s_lassen · Posted 01-31-2022 04:06 AM

I would reorganize the data a bit, to have something like this:

data enrolled;                                                       
input patid fst_dt :YYMMDD10. fill_dt :YYMMDD10. rfl_nbr drug :$1.   
      end_dt :YYMMDD10. days_sup;                                    
format fst_dt YYMMDD10. fill_dt YYMMDD10. end_dt YYMMDD10.;          
                                                                     
cards;                                                               
12345 2018-04-11 2018-09-17 0 a 2018-10-01 15                        
12345 2018-11-13 2018-10-01 0 a 2018-10-15 15                        
12345 2018-11-13 2019-10-12 0 a 2019-11-10 30                        
12345 2018-11-13 2019-11-12 0 a 2019-12-11 30                        
12345 2018-11-13 2019-12-12 0 a 2020-01-10 30                        
12345 2018-11-13 2018-07-05 5 b 2018-08-03 30                        
12345 2018-11-13 2018-08-10 6 b 2018-09-08 30                        
12345 2018-11-13 2018-11-07 1 b 2018-12-06 30                        
66783 2018-07-17 2017-10-21 0 a 2017-11-19 30                        
66783 2018-10-01 2017-12-19 0 a 2018-01-17 30                        
66783 2019-02-15 2018-01-19 0 a 2018-02-17 30                        
66783 2019-09-11 2018-02-17 0 a 2018-03-18 30                        
66783 2019-09-11 2018-03-19 0 a 2018-04-17 30                        
66783 2019-09-11 2018-04-17 0 a 2018-05-16 30                        
66783 2019-09-11 2018-05-17 0 a 2018-06-15 30                        
66783 2019-09-11 2018-06-18 0 a 2018-07-17 30                        
66783 2019-09-11 2018-07-17 0 a 2018-08-15 30                        
66783 2019-09-11 2018-07-31 0 a 2018-08-29 30                        
66783 2019-09-11 2018-08-16 0 a 2018-09-14 30                        
66783 2019-09-11 2018-09-04 0 a 2018-10-03 30                        
66783 2019-09-11 2018-09-14 0 a 2018-10-13 30                        
66783 2019-09-11 2018-10-03 1 a 2018-11-01 30
66783 2019-09-11 2018-10-13 0 a 2018-11-11 30   
66783 2019-09-11 2018-10-15 0 a 2018-10-17 3    
66783 2019-09-11 2018-10-31 2 a 2018-11-29 30   
66783 2019-09-11 2018-11-12 0 a 2018-12-11 30   
66783 2019-09-11 2018-12-04 0 a 2019-01-02 30   
66783 2019-09-11 2018-12-12 0 a 2019-01-10 30   
66783 2019-09-11 2019-01-03 1 a 2019-02-01 30   
66783 2019-09-11 2019-01-10 0 a 2019-02-08 30   
66783 2019-09-11 2019-02-02 2 a 2019-03-03 30   
66783 2019-09-11 2019-02-26 0 a 2019-02-28 3    
66783 2019-09-11 2019-03-03 0 a 2019-04-01 30   
66783 2019-09-11 2019-03-31 0 a 2019-04-29 30   
66783 2019-09-11 2019-04-30 0 a 2019-05-29 30   
66783 2019-09-11 2019-05-31 0 a 2019-06-29 30   
66783 2019-09-11 2019-07-01 1 a 2019-07-30 30   
66783 2019-09-11 2019-07-31 2 a 2019-08-29 30   
66783 2019-09-11 2019-08-30 0 a 2019-09-28 30   
66783 2019-09-11 2019-11-10 0 a 2019-12-09 30   
66783 2019-09-11 2019-12-10 0 a 2020-01-08 30   
66783 2019-09-11 2019-12-31 0 a 2020-01-02 3    
66783 2019-09-11 2018-07-06 0 b 2018-08-04 30   
66783 2019-09-11 2018-08-04 1 b 2018-09-02 30   
66783 2019-09-11 2018-11-01 0 b 2018-11-05 5    
66783 2019-09-11 2018-12-05 1 b 2019-01-03 30   
66783 2019-09-11 2019-03-01 0 b 2019-03-04 4    
66783 2019-09-11 2019-03-05 0 b 2019-03-14 10   
66783 2019-09-11 2019-05-23 0 b 2019-05-23 1
54632 2017-03-28 2019-12-13 0 a 2019-12-19 7  
;run;

That is, replace the two markers for drug A and drug B with a DRUG variable.

The rest should not be so hard, then:

proc sort data=enrolled;                     
  by patid drug fst_dt;                      
run;                                         
data want;                                   
  misuse=1;                                  
  do until(last.fst_dt);                     
    set enrolled;                            
    by patid drug fst_dt;                    
    if (end_dt + 7) >= fst_dt >=fill_dt then 
      misuse=0;                              
    end;                                     
run;

The idea is simply to assume misuse, until we find a prescription that covers the test date. Changing the markers to a DRUG variable also has the advantage that you do not have to write in more code and more variables when you get more drugs (e.g. drug C or D, or even E, though I do not think you can get Ecstasy on prescription at the moment).

Just one question: what does the RFL_NBR variable do? To me it sounds suspiciously like a refill-number, meaning that the patient may be able to get drugs more times on the same prescription. In that case, the END_DATE should (I assume) be calculated not as FILL_DATE+DAYS_SUP, but as FILL_DATE+(1+RFL_NBR)*DAYS_SUP.

Student77 · Posted 01-31-2022 09:59 AM

yes rfl_nbr is refill number! I have put it to the side for now, but thought if I was to consider the refills, then I could add it to my end-date calculation is that right? so I could do end_date= fill_date+(days_sup*rfl_br)-1

but it get's tricky because maybe a person would not get the refills back to back, there could be variable time in between someone filling an Rx. However with these drugs, it's assumed they're needed daily, round the clock.

The only issue is that one drug has end-date +7 but the other has end_date +30 - is there a way to code it that way after combining drug a and b into one drug var?

Thanks for the help

Student77 · Posted 01-31-2022 04:11 PM

Ok I tried both codes and I still have problems. I did make progress though, and now just need to select from the multiple records. I will try my best to explain:

I need to be able to differentiate a_misuse from b_misuse. then in the final consolidated dataset (that I need help getting), I'll create a general misuse variable, where if both drugs are misused then misuse=1, if A is misused only, then=2, if B misused only then misuse=3 and if neither misused then misuse=0.

But let me backtrack one moment:

I now defined the end date as suggested, giving the widest window:

end_dt= fill_dt + (1+RFL_NBR)*DAYS_SUP;

then, I created a general "end" variable that differs by drug, where:

for drugA=1 end=end_dt+7 *remains in system 7 days max;

for drugB=1 end=end_dt+30; *remains in the system longer;

In the data, there are 3 possibilities for each drug, and one was being missed in previous code:

----(fill_date)-------[fst_dt]------(end)--- = no misuse, as fst_dt is in the "compliance range"

----(fill_date)------(end)------[fst_dt]---- = misuse

but there is also:

[fst_dt]-------(fill_dt)----(end) = uncertain - drug test was done after Rx window. They may have had it before data is able to capture? so I can't assume misuse or not. so misuse (a_mis or b_mis will = . in these cases).

So using some of both your codes, what I did was get per record, whether a_mis=0, 1, or . and whether b_mis= 0, 1, . Is there now a way I can select (maybe a proc sql select (min) ?) the lowest value for a_mis, and lowest for b_mis, BY patient and fst_dt? Meaning if someone has records for a_miss=0, 1, and . for a given fst_dt, I want to select the 0 first. if no 0 then select 1. and if they're all missing then keep as missing.

This has to be done for each, a_mis and b_mis.

Here is the current data structure:

data enrolled;
input patid fst_dt :YYMMDD10. fill_dt :YYMMDD10. drugA drugB end :YYMMDD10. a_mis b_mis;
format fst_dt YYMMDD10. fill_dt YYMMDD10. end YYMMDD10.;

cards;
1234	2018-04-11	2018-09-17	1	.	2018-10-12	.	.
1234	2018-11-13	2018-10-01	1	.	2018-10-26	1	.
1234	2018-11-13	2019-10-12	1	.	2019-11-21	.	.
1234	2018-11-13	2019-11-12	1	.	2019-12-22	.	.
1234	2018-11-13	2019-12-12	1	.	2020-01-21	.	.
1234	2018-11-13	2018-07-05	.	1	2019-01-31	.	0
1234	2018-11-13	2018-08-10	.	1	2019-04-07	.	0
1234	2018-11-13	2018-11-07	.	1	2019-02-05	.	0
5678	2018-07-17	2017-10-21	1	.	2017-11-30	1	.
5678	2018-10-01	2017-12-19	1	.	2018-01-28	1	.
5678	2019-02-15	2018-01-19	1	.	2018-02-28	1	.
5678	2019-09-11	2018-09-14	1	.	2018-10-24	1	.
5678	2019-09-11	2019-05-31	1	.	2019-07-10	1	.
5678	2019-09-11	2019-07-01	1	.	2019-09-09	1	.
5678	2019-09-11	2019-07-31	1	.	2019-11-08	0	.
5678	2019-09-11	2019-08-30	1	.	2019-10-09	0	.
5678	2019-09-11	2019-11-10	1	.	2019-12-20	.	.
5678	2019-09-11	2019-12-10	1	.	2020-01-19	.	.
5678	2019-09-11	2019-12-31	1	.	2020-01-13	.	.
5678	2019-09-11	2018-07-06	.	1	2018-09-04	.	1
5678	2019-09-11	2018-08-04	.	1	2018-11-02	.	1
5678	2019-09-11	2018-11-01	.	1	2018-12-06	.	1
;


run;

Thanks so much, I hope this is the last step!

Student77 · Posted 01-31-2022 05:22 PM

GOT IT!


proc sql;
		create table misuse as
		select patid, fst_dt,  min(a_mis) as a_mis2, min(b_mis) as b_mis2  
		      from udt.enrolled
		      group by patid, fst_dt;
		quit; run;

How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Re: How to select non-prescribed drug use (insurance claims data), by patient and fst-date

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!