BookmarkSubscribeRSS Feed
sarahallvik
Calcite | Level 5

Hello!

I am working with a prescription data file, where each row contains a unique patient identifier (individual_id), prescriber identifier (prescriber_num), drug information, and date (date_dispensed). I need to identify patients who receive prescriptions from four or more prescribers in any given 365 day period. My thought was to look back 365 days from every prescription, and count the number of prescribers in that time period, flagging patients that meet the four unqiue prescriber threshold and the date of the prescription at the start of the look-back. Any ideas on how to do this? I've been playing around with a data step with a hash object and proc sql, but to no avail.

data mine;
  set pdmp;
  	format h_date_seen mmddyy10.;
  by individual_id;
  length h_date_seen 8 ;
  if _n_ = 1 then do;
    declare hash Prescriber();
    rc = prescriber.definekey('h_prescriber_num');
    rc = prescriber.definedata('h_prescriber_num','h_date_seen');
    prescriber.definedone();
    dcl hiter i('prescriber');
  end;

  if first.individual_id then prescriber.clear();

     h_prescriber_num = prescriber_num;
     if prescriber.find() then do;
       h_date_seen = date_dispensed;
       prescriber.add();
     end;
     else do;
       h_date_seen = date_dispensed;
       prescriber.replace();
     end;
     presc_Ascend = 0;
     rc = i.first();
     do while(i.next() = 0);
       if prescriber_num ne h_prescriber_num and Date_Dispensed - h_date_seen le 365 then presc_Ascend = 1;  
     end;
run;

proc print data=mine (obs=20); run;

proc sql;
  create table final as
  select *,max(presc_cnt_pre,0) as presc_Percent_pre
  from (select prescriber_NUM, 
               count(distinct case presc_Ascend when 1 then individual_id else '' end) as presc_Cnt_pre,
          from mine
         group by prescriber_NUM);
quit;

I'm working in SAS 9.3, W32_7PRO platform on Windows V6.1 operating system.

 

 

Thank you!!

Sara

8 REPLIES 8
Haikuo
Onyx | Level 15
Would be nice if you can provide a set of sample on both incoming and outcome data.
sarahallvik
Calcite | Level 5

The input data looks like this:

data sample;
	input individual_id prescriber_num date_dispensed;
datalines;
1 101 1/1/2012 
1 564 5/1/2012 
1 88 6/15/2012 
1 101 10/4/2012 
1 3351 4/1/2013 
2 1199 5/1/2012 
2 1199 6/1/2012 
2 1199 7/1/2012 
2 1811 8/1/2012 
3 646 4/1/2012 
3 646 5/1/2012 
3 7752 6/1/2012
3 505 7/1/2012 
3 505 8/1/2012 
3 646 4/1/2013 
3 505 5/1/2013 
3 871 5/15/2013 
;
run;

 

 

I want the output data to look like this, where "prov_cnt_back" is the number of unique prescribers in the 365 days prior to the date_dispensed.

data out;
	input individual_id prescriber_num date_dispensed prov_cnt_back;
datalines;
1 101 1/1/2012 1 
1 564 5/1/2012 2 
1 88 6/15/2012 3 
1 101 10/4/2012 3 
1 3351 4/1/2013 4 
2 1199 5/1/2012 1 
2 1199 6/1/2012 1 
2 1199 7/1/2012 1 
2 1811 8/1/2012 2 
3 646 4/1/2012 1 
3 646 5/1/2012 1 
3 7752 6/1/2012 2 
3 505 7/1/2012 3 
3 505 8/1/2012 3 
3 646 4/1/2013 3 
3 505 5/1/2013 3 
3 871 5/15/2013 4
;
run;

 

 

Thank you!! Sara

Haikuo
Onyx | Level 15

Here is a 'quick and dirty' SQL approach, it may have performance issue when dealing with huge table. Data step can also achieve the same result, but code could get complex.

data sample;
	input individual_id prescriber_num date_dispensed :mmddyy10.;
	format date_dispensed mmddyy10.;
datalines;
1 101 1/1/2012 
1 564 5/1/2012 
1 88 6/15/2012 
1 101 10/4/2012 
1 3351 4/1/2013 
2 1199 5/1/2012 
2 1199 6/1/2012 
2 1199 7/1/2012 
2 1811 8/1/2012 
3 646 4/1/2012 
3 646 5/1/2012 
3 7752 6/1/2012
3 505 7/1/2012 
3 505 8/1/2012 
3 646 4/1/2013 
3 505 5/1/2013 
3 871 5/15/2013 
;
run;

proc sql;
create table want as
select *, (select count(distinct prescriber_num) from sample 
               where individual_id=a.individual_id
                 and date_dispensed between a.date_dispensed and intnx('year',a.date_dispensed,-1,'s')
		   ) as prov_cnt_back

from sample a
;
quit;
sarahallvik
Calcite | Level 5

Thank you! But with 15 million records to process, the code could not complete the task when left to run for 18 hours. I think a data step is going to be more appropriate... I've been playing around with lag statements, which could work, but is there a way to look back at an unlimited number of previous records for each individual_id?

data want;
	set sample;
	by individual_id;
		lagdr1 = lag1(prescriber_num);
		lagdr2 = lag2(prescriber_num);
		lagdr3 = lag3(prescriber_num);
		lagdr4 = lag4(prescriber_num);
		lagdr5 = lag5(prescriber_num);
		lagdr6 = lag6(prescriber_num);
		lagdr7 = lag7(prescriber_num);
		lagdr8 = lag8(prescriber_num);
		lagdr9 = lag9(prescriber_num);

	if prescriber_num ne lagdr1 and 
		prescriber_num ne lagdr2 and
		prescriber_num ne lagdr3 and
		prescriber_num ne lagdr4 and
		prescriber_num ne lagdr5 and
		prescriber_num ne lagdr6 and
		prescriber_num ne lagdr7 and
		prescriber_num ne lagdr8 and
		prescriber_num ne lagdr9 then uniquedr = 1;
								else uniquedr = 0;
run;

Haikuo
Onyx | Level 15

Sarah,

Here are some data step code. Please note:

1. if you have 18M records, and data came as being presented, you only need to upload 3 variables to Hash table, which is roughly 3*8*18million/1024/1024 = 412M bytes, so theoretically if you have Giga + byte memory (which is common), you should be able to fit the whole thing into Hash table. If not, we will have to break it up by ID, and it will take 2XPass, meaning it will take at leaset double amount of time to finish. 

2. The code can still be tweaked to be more efficient, in theory, such as using Hiter plus SETCUR method, however, I am NOT sure how much leverage you can benefit from it.

 

data want;
	if _n_=1 then
		do;
			if 0 then
				set sample;
			dcl hash id(dataset:'sample (keep=individual_id prescriber_num date_dispensed rename=(prescriber_num=_pre date_dispensed=_date))', multidata : 'y');
			id.definekey('individual_id');
			id.definedata('individual_id', '_pre', '_date');
			id.definedone();
			dcl hash pre();
			pre.definekey('_pre');
			pre.definedone();
			call missing(_pre,_date);
		end;

	set sample;
	rc=id.find();
        _beg=intnx('year',date_dispensed,-1,'s');
	do rc=0 by 0 while (rc=0);
		if _beg <=  _date <= date_dispensed then
			rc=pre.add();
		rc=id.find_next();
	end;

	prov_cnt_back=pre.num_items;
	rc=pre.clear();
	drop rc _:;
run;

Update: To call INTNX() only once per obs instead of up to 365 times, maybe saving your some time. 

azfvftrbhtbetbab
Calcite | Level 5

data sample;

input individual_id prescriber_num date_dispensed mmddyy10.;

format date_dispensed date9.;

datalines;

1 101 1/1/2012

1 564 5/1/2012

1 88 6/15/2012

1 101 10/4/2012

1 3351 4/1/2013

2 1199 5/1/2012

2 1199 6/1/2012

2 1199 7/1/2012

2 1811 8/1/2012

3 646 4/1/2012

3 646 5/1/2012

3 7752 6/1/2012

3 505 7/1/2012

3 505 8/1/2012

3 646 4/1/2013

3 505 5/1/2013

3 871 5/15/2013

;

run;

proc sql;

create table work.out as

select s1.*, Count(s2.individual_id) + 1 as prov_cnt_back

from sample s1

left join sample s2 on

s1.individual_id = s2.individual_id

and s1.prescriber_num ~= s2.prescriber_num

and s1.date_dispensed - s2.date_dispensed between 0 and 365

left join sample s3 on

s2.individual_id = s3.individual_id

and s2.prescriber_num = s3.prescriber_num

and s1.date_dispensed - s3.date_dispensed between 0 and 365

and s3.date_dispensed < s2.date_dispensed

where s2.individual_id is null or (s2.individual_id is not null and s3.individual_id is null)

GROUP BY s1.individual_id, s1.prescriber_num, s1.date_dispensed;

run;

 

This code should work, although you may run into performance issues as you did with the other proc sql code.

 

How fast are you expecting this to be?

s_lassen
Meteorite | Level 14

Sara,

if your data are sorted by id and date, this should work:

data mine;
  set pdmp;
  by individual_id;
  format h_date_seen first_date mmddyy10.;
  if first.individual_id then do;
    if _n_ = 1 then do;
      declare hash Prescriber();
      rc = prescriber.definekey('h_prescriber_num');
      rc = prescriber.definedata('h_prescriber_num','h_date_seen');
      prescriber.definedone();
      dcl hiter i('prescriber');
    end;
    else
      prescriber.clear();
  end;
  h_prescriber_num = prescriber_num;
  if prescriber.find() then do;
    h_date_seen = date_dispensed;
    prescriber.add();
  end;
  else do;
    h_date_seen = date_dispensed;
    prescriber.replace();
  end;
  if prescriber.num_items>=1 then do;
    first_date=intnx('YEAR',date_dispensed,-1,'SAME');
    rc = i.first();
    do until(i.next());
      if h_date_seen<first_date then
        prescriber.remove();
    end;
  end;
  prov_cnt_back=prescriber.num_items;
drop first_date rc h_:; run;

I am not sure how it will stack against your 25 million records, but give it a try!

 

Jim_G
Pyrite | Level 9

Sara

 

Here is an approach.

Sort by patient ID, script_dt

Set up an array of prescribers (50) for an Id   and hold start script date

 

On first.ID clear the prescriber array, start date

With each obs check the new script_date with the start script date

   If < 365 check to see if prescriber in array

   If new prescriber add prescriber to array and add 1 to tot_scripts

   If tot_scrits gt 4 then set flag

   If last.ID the output..

 

 

To do a rolling 365 over several years, you might want to create a 2 dimensional array and keep the script dates and prescribers for each patient.

 

Jim

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1772 views
  • 0 likes
  • 5 in conversation