Solved: Re: Concomitant medication use

m_o · Posted 04-26-2017 01:38 PM

Hello,

I am trying to define concomitant use of drugs.

I have a great solution to my problem with a SAS data step at the link below:

https://communities.sas.com/t5/Base-SAS-Programming/Concomitant-drug-medication-use/m-p/351710#M8188...

However, the only issue is that I have a very large dataset (~600K rows) and those codes require a ton of memory and my computer crashes. I wonder if anybody could suggest an alternative, perhaps a proc sql solution, that would not require as much memory and computational capacity.

Thank you very much! Any help would be greatly appreciated!

Katie

thomp7050 · Posted 05-01-2017 08:17 AM

This is not exactly elegant but it works:

data have;  
input ID DRUG $ START_DT MMDDYY10. DAYS_SUPP  END_DT MMDDYY10. ;
FORMAT START_DT END_DT MMDDYY10.;
cards; 
1	A	2/17/10	30	3/19/10
1	B	5/6/09	30	6/5/09
1	C	7/9/11	60	9/7/11
1	E	3/1/10	90	5/30/10
1	B	1/1/09	90	4/1/09
1	D	2/1/09	30	3/3/09
1	C	5/6/12	90	8/4/12
2	B	4/1/12	60	5/31/12
2	A	7/1/10	30	7/31/10
2	C	8/3/10	90	11/1/10
2	D	11/1/13	90	1/30/14
2	E	12/5/13	90	3/5/14
2	A	2/1/11	90	5/2/11
;
RUN;

PROC SQL;
CREATE TABLE HAVE AS
SELECT MONOTONIC() AS ROWID, * FROM HAVE;
QUIT;

%let fromDate = 01Jan2000;
%let toDate = 31DEC2016;
data alldates;
  length key 8; 
  do date = "&fromDate"d to "&toDate"d; 
    key + 1; 
    RSA_WorkdayInd = ( 2 <= weekday(date) <= 6);
    WeekendInd = (not RSA_WorkdayInd);
    Calendar_Week_Number = week(date, "V");
    DayOfWeek = put(date, downame3.);
    Fin_Year = year(intnx("year.7", date, 0));
    output; 
  end; 

  format 
    date date9.
  ;
run; 

PROC SQL;
CREATE TABLE OVERLAP AS
SELECT HAVE.*, HAVE2.ROWID AS ROWID2, HAVE2.ID AS ID2, HAVE2.START_DT AS START2, HAVE2.DAYS_SUPP AS DAYS2, HAVE2.END_DT AS END2 
, (SELECT COUNT(*) FROM alldates WHERE DATE >= HAVE.START_DT AND DATE >= HAVE2.START_DT AND DATE <= HAVE.END_DT AND DATE <= HAVE2.END_DT) AS OVERLAP
FROM HAVE, HAVE AS HAVE2
WHERE HAVE.ROWID NE HAVE2.ROWID
AND HAVE.ID = HAVE2.ID
AND (HAVE.START_DT <= HAVE2.END_DT) and (HAVE2.START_DT <= HAVE.END_DT);
QUIT;

View solution in original post

Reeza · Posted 04-26-2017 01:45 PM

600K rows should not be causing issues.

Can you identify which step causes the issues?

m_o · Posted 04-26-2017 05:03 PM

The first step. Thank you.

PGStats · Posted 04-26-2017 11:57 PM

The first step can be slightly optimized as:

data days(sortedby=id);
length drug $1;
set have;
do day = start_dt to end_dt;
    output;
    end;
format day yymmdd10.;
keep id drug day;
run;

The sortedby= dataset option might improve the sort performance of the following step.

Droping the useless variable dum saves space. (It was a remnant of a previous version of the code)

Defining the length of drug as 1 character will also save space.

See if you can get the first step and sort to work. If so, we will try to optimize the rest.

PG

m_o · Posted 05-01-2017 01:13 AM

Thank you very much for your response. I tried as you suggested but my SAS stopped working after processing about a billion rows. I might go the proc sql route instead. Thanks again!

Astounding · Posted 04-26-2017 02:01 PM

Regarding your link ...

The first step is a good idea at least in theory. But it does expand the data set to perhaps 40M rows. Is that where the process breaks down?

The second step is also a good idea (sorting the 40M rows). Again, is this the source of the problem?

I'm hoping the problems lie in the DATA step that follows sorting. Questions for that:

Are the drug names really A, B, C, D, E? (If not, how many characters are required for each name?)
Do you know the maximum number of drugs per patient? (It might be 5, but I just wanted to check.)
If there is a combination of AB for 10 days, then CD for the next 20 days, does that still count as a block of 30 days with 2+ drugs? (If not, a little more explanation would be in order.)

m_o · Posted 04-26-2017 04:27 PM

Thank you for your reply. Yes, the process breaks down at the first step due to the large number of rows it tries to generate. Regarding your second point, if a patient has been on combination AB for 10 days and combination CD for 20 days, then I would disregard those combinations because in both cases, the patient has been on a unique drug combination for <30 days. However, if a patient has been on AB 10 days but ABC for >=30 days, then I will count ABC.

I am not sure about the maximum number of overlaps. Probably about 5-6. I will shorten the names of the drugs to single letters or single digits in place of the full drug name. Thank you very much!

Reeza · Posted 04-26-2017 06:50 PM

I still don't understand how it would crash. 40 million rows shouldn't be too much, it's annoying and large - takes about <20 mins to process analytically our sites on a standard desktop. If its failing at the sort, you could change the sortsize option to help with that issue.

thomp7050 · Posted 04-26-2017 02:49 PM

For your consideration, proc sql code that displays number of days overlap among single-subject drug periods that overlap:

data have;  
input ID DRUG $ START_DT MMDDYY10. DAYS_SUPP  END_DT MMDDYY10. ;
FORMAT START_DT END_DT MMDDYY10.;
cards; 
1	A	2/17/10	30	3/19/10
1	B	5/6/09	30	6/5/09
1	C	7/9/11	60	9/7/11
1	E	3/1/10	90	5/30/10
1	B	1/1/09	90	4/1/09
1	D	2/1/09	30	3/3/09
1	C	5/6/12	90	8/4/12
2	B	4/1/12	60	5/31/12
2	A	7/1/10	30	7/31/10
2	C	8/3/10	90	11/1/10
2	D	11/1/13	90	1/30/14
2	E	12/5/13	90	3/5/14
2	A	2/1/11	90	5/2/11
;
RUN;

PROC SQL;
CREATE TABLE HAVE AS
SELECT MONOTONIC() AS ROWID, * FROM HAVE;
QUIT;

PROC SQL;
CREATE TABLE OVERLAP AS
SELECT HAVE.*, HAVE2.ROWID AS ROWID2, HAVE2.ID AS ID2, HAVE2.START_DT AS START2, HAVE2.DAYS_SUPP AS DAYS2, HAVE2.END_DT AS END2 
, CASE WHEN HAVE.START_DT > HAVE2.START_DT THEN HAVE2.END_DT - HAVE.START_DT  ELSE  HAVE.END_DT -  HAVE2.START_DT END AS OVERLAP
FROM HAVE, HAVE AS HAVE2
WHERE HAVE.ROWID NE HAVE2.ROWID
AND HAVE.ID = HAVE2.ID
AND (HAVE.START_DT <= HAVE2.END_DT) and (HAVE2.START_DT <= HAVE.END_DT);
QUIT;

m_o · Posted 04-26-2017 06:48 PM

Thank you for your response. This is very helpful. I noticed an error in your output in that, in rows 3 and 4 of the output, the actual overlap duration between the 2 drugs are between 2/1/2009 and 3/3/2009, which is 30 days, not 59 days. I was wondering how I could use CASE WHEN to take care of the other two scenarios where duration of drugA can be completely within duration of drugB and vice versa.

thomp7050 · Posted 05-01-2017 08:17 AM

This is not exactly elegant but it works:

data have;  
input ID DRUG $ START_DT MMDDYY10. DAYS_SUPP  END_DT MMDDYY10. ;
FORMAT START_DT END_DT MMDDYY10.;
cards; 
1	A	2/17/10	30	3/19/10
1	B	5/6/09	30	6/5/09
1	C	7/9/11	60	9/7/11
1	E	3/1/10	90	5/30/10
1	B	1/1/09	90	4/1/09
1	D	2/1/09	30	3/3/09
1	C	5/6/12	90	8/4/12
2	B	4/1/12	60	5/31/12
2	A	7/1/10	30	7/31/10
2	C	8/3/10	90	11/1/10
2	D	11/1/13	90	1/30/14
2	E	12/5/13	90	3/5/14
2	A	2/1/11	90	5/2/11
;
RUN;

PROC SQL;
CREATE TABLE HAVE AS
SELECT MONOTONIC() AS ROWID, * FROM HAVE;
QUIT;

%let fromDate = 01Jan2000;
%let toDate = 31DEC2016;
data alldates;
  length key 8; 
  do date = "&fromDate"d to "&toDate"d; 
    key + 1; 
    RSA_WorkdayInd = ( 2 <= weekday(date) <= 6);
    WeekendInd = (not RSA_WorkdayInd);
    Calendar_Week_Number = week(date, "V");
    DayOfWeek = put(date, downame3.);
    Fin_Year = year(intnx("year.7", date, 0));
    output; 
  end; 

  format 
    date date9.
  ;
run; 

PROC SQL;
CREATE TABLE OVERLAP AS
SELECT HAVE.*, HAVE2.ROWID AS ROWID2, HAVE2.ID AS ID2, HAVE2.START_DT AS START2, HAVE2.DAYS_SUPP AS DAYS2, HAVE2.END_DT AS END2 
, (SELECT COUNT(*) FROM alldates WHERE DATE >= HAVE.START_DT AND DATE >= HAVE2.START_DT AND DATE <= HAVE.END_DT AND DATE <= HAVE2.END_DT) AS OVERLAP
FROM HAVE, HAVE AS HAVE2
WHERE HAVE.ROWID NE HAVE2.ROWID
AND HAVE.ID = HAVE2.ID
AND (HAVE.START_DT <= HAVE2.END_DT) and (HAVE2.START_DT <= HAVE.END_DT);
QUIT;

m_o · Posted 05-01-2017 11:01 AM

Thank you! This works! I will try to modify the sql code to also identify three or more overlaps, but this is wonderful. Thank you very much!! 🙂

m_o · Posted 05-01-2017 06:17 PM

Thank you very much for your help once again. I was wondering if there is a way to modify the sql code so that it outputs 2 additional columns containing the 2 dates that were used to compute that difference/overlapping days. I can do it for two combinations in a simple separate data step, but I found that it gets quite hairy when considering 3-5 combinations. Thank you very much!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away