05-01-2017 06:34 PM
I was wondering if I could get your help with modifying the PROC SQL code, marked as solved here:
so that it outputs 2 additional columns containing the 2 dates that were used to compute the variable 'overlap' days. I can do it for two drug combinations in a simple separate data step, but I found that the code would get quite hairy when comparing across start and end dates for 3-5 different drugs. Any help would be much appreciated! Thank you very much!
05-01-2017 08:04 PM
Aren't those the dates already included in your current code? I had to adjust the input statement so that it read the dates correctly:
data have; infile cards dlm=',' dsd; informat START_DT MMDDYY8.; informat END_DT MMDDYY8.; input ID DRUG $ START_DT DAYS_SUPP END_DT; FORMAT START_DT END_DT MMDDYY10.; cards; 1,A,2/17/10,30,3/19/10 1,B,5/6/09,30,6/5/09 1,C,7/9/11,60,9/7/11 1,E,3/1/10,90,5/30/10 1,B,1/1/09,90,4/1/09 1,D,2/1/09,30,3/3/09 1,C,5/6/12,90,8/4/12 2,B,4/1/12,60,5/31/12 2,A,7/1/10,30,7/31/10 2,C,8/3/10,90,11/1/10 2,D,11/1/13,90,1/30/14 2,E,12/5/13,90,3/5/14 2,A,2/1/11,90,5/2/11 ; RUN; PROC SQL; CREATE TABLE HAVE AS SELECT MONOTONIC() AS ROWID, * FROM HAVE ; QUIT; %let fromDate = 01Jan2000; %let toDate = 31DEC2016; data alldates; length key 8; do date = "&fromDate"d to "&toDate"d; key + 1; RSA_WorkdayInd = ( 2 <= weekday(date) <= 6); WeekendInd = (not RSA_WorkdayInd); Calendar_Week_Number = week(date, "V"); DayOfWeek = put(date, downame3.); Fin_Year = year(intnx("year.7", date, 0)); output; end; format date date9. ; run; PROC SQL; CREATE TABLE OVERLAP AS SELECT HAVE.*, HAVE2.ROWID AS ROWID2, HAVE2.ID AS ID2, HAVE2.START_DT AS START2, HAVE2.DAYS_SUPP AS DAYS2, HAVE2.END_DT AS END2, (SELECT COUNT(*) FROM alldates WHERE DATE >= HAVE.START_DT AND DATE >= HAVE2.START_DT AND DATE <= HAVE.END_DT AND DATE <= HAVE2.END_DT) AS OVERLAP FROM HAVE, HAVE AS HAVE2 WHERE HAVE.ROWID NE HAVE2.ROWID AND HAVE.ID = HAVE2.ID AND (HAVE.START_DT <= HAVE2.END_DT) and (HAVE2.START_DT <= HAVE.END_DT) ; QUIT;
Art, CEO, AnalystFinder.com
05-01-2017 09:08 PM - edited 05-01-2017 11:53 PM
Thank you very much for your response. It is just that in some cases, the 'overlap' column is the difference between have.start_dt and have2.end_t, or between have2.start_dt and have.end_dt, or have.start_dt and have.end_dt or have2.start_dt and have2.end_dt. To illustrate:
overlap is the difference between dates a and b
a. _______ b.
overlap is the difference between dates c and d
overlap is the difference between dates c and b
overlap is the difference between dates a and d
I was wondering if there is a way to output the two dates that were used to compute the overlapping days. Thank you very much for any help you could give!
05-02-2017 05:11 PM - edited 05-02-2017 05:12 PM
Thank you very much for your response. I think your code seems to be exactly what I need. Curiously, the 'overlap' variable calculated using the long sql code above agrees with your code (i.e. the difference between the two dates identified using min and max) agrees about 99% of the time but not a 100%. I may have to troubleshoot why that's happening but your code seems to be more accurate for calculating 'overlap' Thank you again!
05-02-2017 12:19 AM - edited 05-02-2017 12:44 AM
Does the following do what you want?
data want; set overlap; array dif(4) ab cd cb ad; dif(1)=end_dt-start_dt+1; dif(2)=end2-start2+1; dif(3)=end_dt-start2+1; dif(4)=end2-start_dt+1; x=vname(dif(whichn(overlap,of dif(*)))); run;
and, if so, you could also get the same result by adding one more step to your proc sql code, namely:
PROC SQL; CREATE TABLE OVERLAP AS SELECT HAVE.*, HAVE2.ROWID AS ROWID2, HAVE2.ID AS ID2, HAVE2.START_DT AS START2, HAVE2.DAYS_SUPP AS DAYS2, HAVE2.END_DT AS END2, (SELECT COUNT(*) FROM alldates WHERE DATE >= HAVE.START_DT AND DATE >= HAVE2.START_DT AND DATE <= HAVE.END_DT AND DATE <= HAVE2.END_DT) AS OVERLAP FROM HAVE, HAVE AS HAVE2 WHERE HAVE.ROWID NE HAVE2.ROWID AND HAVE.ID = HAVE2.ID AND (HAVE.START_DT <= HAVE2.END_DT) and (HAVE2.START_DT <= HAVE.END_DT) ; create table want as select *, case when end_dt-start_dt+1=overlap then 'ab' when end2-start2+1=overlap then 'cd' when end_dt-start2+1=overlap then 'cb' when end2-start_dt+1=overlap then 'ad' else 'uk' end as result from overlap ; QUIT;
Art, CEO, AnalystFinder.com
05-02-2017 05:05 PM - edited 05-02-2017 05:14 PM
Thank you so much Art for your help! I am trying to extract the actual dates that were used to compute 'overlap' so that I would have two more columns containing the overlap start date and the overlap end date. The post by PGStats seems to be working. Thank you again for your help! Appreciate it!
Need further help from the community? Please ask a new question.