Hello!
I am working with a prescription data file, where each row contains a unique patient identifier (individual_id), prescriber identifier (prescriber_num), drug information, and date (date_dispensed). I need to identify patients who receive prescriptions from four or more prescribers in any given 365 day period. My thought was to look back 365 days from every prescription, and count the number of prescribers in that time period, flagging patients that meet the four unqiue prescriber threshold and the date of the prescription at the start of the look-back. Any ideas on how to do this? I've been playing around with a data step with a hash object and proc sql, but to no avail.
data mine;
set pdmp;
format h_date_seen mmddyy10.;
by individual_id;
length h_date_seen 8 ;
if _n_ = 1 then do;
declare hash Prescriber();
rc = prescriber.definekey('h_prescriber_num');
rc = prescriber.definedata('h_prescriber_num','h_date_seen');
prescriber.definedone();
dcl hiter i('prescriber');
end;
if first.individual_id then prescriber.clear();
h_prescriber_num = prescriber_num;
if prescriber.find() then do;
h_date_seen = date_dispensed;
prescriber.add();
end;
else do;
h_date_seen = date_dispensed;
prescriber.replace();
end;
presc_Ascend = 0;
rc = i.first();
do while(i.next() = 0);
if prescriber_num ne h_prescriber_num and Date_Dispensed - h_date_seen le 365 then presc_Ascend = 1;
end;
run;
proc print data=mine (obs=20); run;
proc sql;
create table final as
select *,max(presc_cnt_pre,0) as presc_Percent_pre
from (select prescriber_NUM,
count(distinct case presc_Ascend when 1 then individual_id else '' end) as presc_Cnt_pre,
from mine
group by prescriber_NUM);
quit;
I'm working in SAS 9.3, W32_7PRO platform on Windows V6.1 operating system.
Thank you!!
Sara
The input data looks like this:
data sample;
input individual_id prescriber_num date_dispensed;
datalines;
1 101 1/1/2012
1 564 5/1/2012
1 88 6/15/2012
1 101 10/4/2012
1 3351 4/1/2013
2 1199 5/1/2012
2 1199 6/1/2012
2 1199 7/1/2012
2 1811 8/1/2012
3 646 4/1/2012
3 646 5/1/2012
3 7752 6/1/2012
3 505 7/1/2012
3 505 8/1/2012
3 646 4/1/2013
3 505 5/1/2013
3 871 5/15/2013
;
run;
I want the output data to look like this, where "prov_cnt_back" is the number of unique prescribers in the 365 days prior to the date_dispensed.
data out;
input individual_id prescriber_num date_dispensed prov_cnt_back;
datalines;
1 101 1/1/2012 1
1 564 5/1/2012 2
1 88 6/15/2012 3
1 101 10/4/2012 3
1 3351 4/1/2013 4
2 1199 5/1/2012 1
2 1199 6/1/2012 1
2 1199 7/1/2012 1
2 1811 8/1/2012 2
3 646 4/1/2012 1
3 646 5/1/2012 1
3 7752 6/1/2012 2
3 505 7/1/2012 3
3 505 8/1/2012 3
3 646 4/1/2013 3
3 505 5/1/2013 3
3 871 5/15/2013 4
;
run;
Thank you!! Sara
Here is a 'quick and dirty' SQL approach, it may have performance issue when dealing with huge table. Data step can also achieve the same result, but code could get complex.
data sample;
input individual_id prescriber_num date_dispensed :mmddyy10.;
format date_dispensed mmddyy10.;
datalines;
1 101 1/1/2012
1 564 5/1/2012
1 88 6/15/2012
1 101 10/4/2012
1 3351 4/1/2013
2 1199 5/1/2012
2 1199 6/1/2012
2 1199 7/1/2012
2 1811 8/1/2012
3 646 4/1/2012
3 646 5/1/2012
3 7752 6/1/2012
3 505 7/1/2012
3 505 8/1/2012
3 646 4/1/2013
3 505 5/1/2013
3 871 5/15/2013
;
run;
proc sql;
create table want as
select *, (select count(distinct prescriber_num) from sample
where individual_id=a.individual_id
and date_dispensed between a.date_dispensed and intnx('year',a.date_dispensed,-1,'s')
) as prov_cnt_back
from sample a
;
quit;
Thank you! But with 15 million records to process, the code could not complete the task when left to run for 18 hours. I think a data step is going to be more appropriate... I've been playing around with lag statements, which could work, but is there a way to look back at an unlimited number of previous records for each individual_id?
data want;
set sample;
by individual_id;
lagdr1 = lag1(prescriber_num);
lagdr2 = lag2(prescriber_num);
lagdr3 = lag3(prescriber_num);
lagdr4 = lag4(prescriber_num);
lagdr5 = lag5(prescriber_num);
lagdr6 = lag6(prescriber_num);
lagdr7 = lag7(prescriber_num);
lagdr8 = lag8(prescriber_num);
lagdr9 = lag9(prescriber_num);
if prescriber_num ne lagdr1 and
prescriber_num ne lagdr2 and
prescriber_num ne lagdr3 and
prescriber_num ne lagdr4 and
prescriber_num ne lagdr5 and
prescriber_num ne lagdr6 and
prescriber_num ne lagdr7 and
prescriber_num ne lagdr8 and
prescriber_num ne lagdr9 then uniquedr = 1;
else uniquedr = 0;
run;
Sarah,
Here are some data step code. Please note:
1. if you have 18M records, and data came as being presented, you only need to upload 3 variables to Hash table, which is roughly 3*8*18million/1024/1024 = 412M bytes, so theoretically if you have Giga + byte memory (which is common), you should be able to fit the whole thing into Hash table. If not, we will have to break it up by ID, and it will take 2XPass, meaning it will take at leaset double amount of time to finish.
2. The code can still be tweaked to be more efficient, in theory, such as using Hiter plus SETCUR method, however, I am NOT sure how much leverage you can benefit from it.
data want;
if _n_=1 then
do;
if 0 then
set sample;
dcl hash id(dataset:'sample (keep=individual_id prescriber_num date_dispensed rename=(prescriber_num=_pre date_dispensed=_date))', multidata : 'y');
id.definekey('individual_id');
id.definedata('individual_id', '_pre', '_date');
id.definedone();
dcl hash pre();
pre.definekey('_pre');
pre.definedone();
call missing(_pre,_date);
end;
set sample;
rc=id.find();
_beg=intnx('year',date_dispensed,-1,'s');
do rc=0 by 0 while (rc=0);
if _beg <= _date <= date_dispensed then
rc=pre.add();
rc=id.find_next();
end;
prov_cnt_back=pre.num_items;
rc=pre.clear();
drop rc _:;
run;
Update: To call INTNX() only once per obs instead of up to 365 times, maybe saving your some time.
data sample;
input individual_id prescriber_num date_dispensed mmddyy10.;
format date_dispensed date9.;
datalines;
1 101 1/1/2012
1 564 5/1/2012
1 88 6/15/2012
1 101 10/4/2012
1 3351 4/1/2013
2 1199 5/1/2012
2 1199 6/1/2012
2 1199 7/1/2012
2 1811 8/1/2012
3 646 4/1/2012
3 646 5/1/2012
3 7752 6/1/2012
3 505 7/1/2012
3 505 8/1/2012
3 646 4/1/2013
3 505 5/1/2013
3 871 5/15/2013
;
run;
proc sql;
create table work.out as
select s1.*, Count(s2.individual_id) + 1 as prov_cnt_back
from sample s1
left join sample s2 on
s1.individual_id = s2.individual_id
and s1.prescriber_num ~= s2.prescriber_num
and s1.date_dispensed - s2.date_dispensed between 0 and 365
left join sample s3 on
s2.individual_id = s3.individual_id
and s2.prescriber_num = s3.prescriber_num
and s1.date_dispensed - s3.date_dispensed between 0 and 365
and s3.date_dispensed < s2.date_dispensed
where s2.individual_id is null or (s2.individual_id is not null and s3.individual_id is null)
GROUP BY s1.individual_id, s1.prescriber_num, s1.date_dispensed;
run;
This code should work, although you may run into performance issues as you did with the other proc sql code.
How fast are you expecting this to be?
Sara,
if your data are sorted by id and date, this should work:
data mine;
set pdmp;
by individual_id;
format h_date_seen first_date mmddyy10.;
if first.individual_id then do;
if _n_ = 1 then do;
declare hash Prescriber();
rc = prescriber.definekey('h_prescriber_num');
rc = prescriber.definedata('h_prescriber_num','h_date_seen');
prescriber.definedone();
dcl hiter i('prescriber');
end;
else
prescriber.clear();
end;
h_prescriber_num = prescriber_num;
if prescriber.find() then do;
h_date_seen = date_dispensed;
prescriber.add();
end;
else do;
h_date_seen = date_dispensed;
prescriber.replace();
end;
if prescriber.num_items>=1 then do;
first_date=intnx('YEAR',date_dispensed,-1,'SAME');
rc = i.first();
do until(i.next());
if h_date_seen<first_date then
prescriber.remove();
end;
end;
prov_cnt_back=prescriber.num_items;
drop first_date rc h_:;
run;
I am not sure how it will stack against your 25 million records, but give it a try!
Sara
Here is an approach.
Sort by patient ID, script_dt
Set up an array of prescribers (50) for an Id and hold start script date
On first.ID clear the prescriber array, start date
With each obs check the new script_date with the start script date
If < 365 check to see if prescriber in array
If new prescriber add prescriber to array and add 1 to tot_scripts
If tot_scrits gt 4 then set flag
If last.ID the output..
To do a rolling 365 over several years, you might want to create a 2 dimensional array and keep the script dates and prescribers for each patient.
Jim
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.