Hi all,
Please help me to code this. This is the dataset below-
I have 2 drugs being used for each ID i.e., SGA and DM. I have multiple start date and end dates for each ID and drug. I want to see if the DM_start falls into any of the SGA_start and SGA_end under each id. That is DM_start of period 3 can fall in the SGA_start and SGA_end of period1.
And also calculate the number of overlapping days for each time period. I want to know the no. of days for each orange period.
Thank you all in advance,
Any help would be appreciated.
data sga;
infile cards expandtabs;
input id (sga_start sga_end) (:mmddyy10.);
format sga_start sga_end mmddyy10. ;
cards;
1 1/1/17 7/1/17
1 10/1/17 12/1/17
2 2/1/17 4/1/17
2 9/1/17 11/1/17
;
data dm;
infile cards expandtabs;
input id (dm_start dm_end) (:mmddyy10.);
format dm_start dm_end mmddyy10. ;
cards;
1 4/1/17 5/1/17
1 6/1/17 12/1/17
2 3/1/17 4/1/17
2 9/1/17 12/1/17
;
data SGA2;
set sga;
label='SGA';
do date=SGA_START to SGA_END;
output;
end;
keep id date label;
format date date9.;
run;
data DM2;
set dm;
label='DM ';
do date=DM_START to DM_END;
output;
end;
keep id date label;
format date date9.;
run;
data temp;
set SGA2 DM2;
run;
proc sort data=temp nodupkey;
by id date label;
run;
data temp2;
do until(last.date);
set temp;
by id date;
length tag $ 40;
tag=catx('|',tag,label);
end;
drop label;
run;
data temp3;
set temp2;
by id tag notsorted;
if first.tag or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select id,group,tag,min(date) as start format=mmddyy10.,max(date) as end format=mmddyy10.,count(*) as days
from temp3
group by id,group,tag
having findc(tag,'|')
;
quit;
You need to describe what that picture means a lot better. For example, your first three observations have the exact same SGA start and end for period values of 1, 2 and 3. I have hard time understanding just that bit at all.
All of the SGA start/end pairs shown as examples are start and end on the same day so your picture of 'overlap' doesn't seem to apply to any SGA values.
If you want help with real code then you need to provide something we can code with.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
This does not have to be actual data but should behave in similar manner.
Also, if this picture is from a set you built from different sources it may not be structured well to analyze as you want and perhaps examples of the sets before this was build is more appropriate.
Thank you for the reply.
Yes, in the picture SGA_start and end is just one for those ID, but there are different IDs which have a different SGA _start and end.
So I want to check each DM_start date-- if it falls between each of the SGA_start and SGA_end date of that observation.
data WORK.DM_SGA1;
infile datalines dsd truncover;
input ENROLID:32. SVCDATE:MMDDYY10. INDEX_SGA:MMDDYY10. SGA_START:MMDDYY10. SGA_END:MMDDYY10. INDEX_DM:MMDDYY10. DM_START:MMDDYY10. DM_END:MMDDYY10.;
format SVCDATE MMDDYY10. INDEX_SGA MMDDYY10. SGA_START MMDDYY10. SGA_END MMDDYY10. INDEX_DM MMDDYY10. DM_START MMDDYY10. DM_END MMDDYY10.;
label ENROLID="Enrollee ID" SVCDATE="Date Service Incurred";
datalines;
27264303 07/18/2017 01/15/2017 05/27/2019 05/27/2019 02/20/2017 07/18/2017 07/18/2017
27264303 11/03/2018 01/15/2017 05/27/2019 05/27/2019 02/20/2017 11/03/2018 11/03/2018
27264303 05/06/2019 01/15/2017 05/27/2019 05/27/2019 02/20/2017 05/06/2019 05/06/2019
28584803 03/07/2017 03/07/2017 12/09/2019 12/09/2019 03/07/2017 03/07/2017 03/07/2017
28584803 08/28/2019 03/07/2017 12/09/2019 12/09/2019 03/07/2017 08/28/2019 08/28/2019
28584803 11/26/2019 03/07/2017 12/09/2019 12/09/2019 03/07/2017 11/26/2019 11/26/2019
93277105 05/08/2018 01/16/2017 05/23/2017 05/23/2017 02/06/2018 05/08/2018 05/08/2018
93277105 11/05/2018 01/16/2017 01/07/2018 01/07/2018 02/06/2018 11/05/2018 11/05/2018
93277105 01/11/2019 01/16/2017 05/01/2018 05/01/2018 02/06/2018 01/11/2019 01/11/2019
93277105 09/29/2019 01/16/2017 07/27/2018 07/27/2018 02/06/2018 09/29/2019 09/29/2019
93277105 12/16/2019 01/16/2017 09/28/2018 09/28/2018 02/06/2018 12/16/2019 12/16/2019
93277105 12/19/2018 01/16/2017 12/19/2018 12/19/2018 02/06/2018 12/16/2019 12/16/2019
93277105 08/04/2019 01/16/2017 08/04/2019 08/04/2019 02/06/2018 12/16/2019 12/16/2019
93277105 12/18/2019 01/16/2017 12/18/2019 12/18/2019 02/06/2018 12/16/2019 12/16/2019
97434102 12/06/2017 09/05/2017 12/27/2017 12/27/2017 09/05/2017 12/06/2017 12/06/2017
;;;;
The sample data you've posted needs rework to define periods >0 and cases with overlaps.
Once you created useful sample data you could try if below code gets you closer to what you want.
data work.have;
infile datalines dsd dlm=' ' truncover;
input ENROLID:32. SVCDATE:MMDDYY10. INDEX_SGA:MMDDYY10. SGA_START:MMDDYY10. SGA_END:MMDDYY10. INDEX_DM:MMDDYY10. DM_START:MMDDYY10. DM_END:MMDDYY10.;
format SVCDATE INDEX_SGA SGA_START SGA_END INDEX_DM DM_START DM_END date9.;
label ENROLID="Enrollee ID" SVCDATE="Date Service Incurred";
datalines4;
27264303 07/18/2017 01/15/2017 05/27/2019 05/27/2019 02/20/2017 07/18/2017 07/18/2017
27264303 11/03/2018 01/15/2017 05/27/2019 05/27/2019 02/20/2017 11/03/2018 11/03/2018
27264303 05/06/2019 01/15/2017 05/27/2019 05/27/2019 02/20/2017 05/06/2019 05/06/2019
28584803 03/07/2017 03/07/2017 12/09/2019 12/09/2019 03/07/2017 03/07/2017 03/07/2017
28584803 08/28/2019 03/07/2017 12/09/2019 12/09/2019 03/07/2017 08/28/2019 08/28/2019
28584803 11/26/2019 03/07/2017 12/09/2019 12/09/2019 03/07/2017 11/26/2019 11/26/2019
93277105 05/08/2018 01/16/2017 05/23/2017 05/23/2017 02/06/2018 05/08/2018 05/08/2018
93277105 11/05/2018 01/16/2017 01/07/2018 01/07/2018 02/06/2018 11/05/2018 11/05/2018
93277105 01/11/2019 01/16/2017 05/01/2018 05/01/2018 02/06/2018 01/11/2019 01/11/2019
93277105 09/29/2019 01/16/2017 07/27/2018 07/27/2018 02/06/2018 09/29/2019 09/29/2019
93277105 12/16/2019 01/16/2017 09/28/2018 09/28/2018 02/06/2018 12/16/2019 12/16/2019
93277105 12/19/2018 01/16/2017 12/19/2018 12/19/2018 02/06/2018 12/16/2019 12/16/2019
93277105 08/04/2019 01/16/2017 08/04/2019 08/04/2019 02/06/2018 12/16/2019 12/16/2019
93277105 12/18/2019 01/16/2017 12/18/2019 12/18/2019 02/06/2018 12/16/2019 12/16/2019
97434102 12/06/2017 09/05/2017 12/27/2017 12/27/2017 09/05/2017 12/06/2017 12/06/2017
;;;;
data work.prep;
length row_id 8;
set work.prep;
row_id+1;
keep row_id ENROLID sga_: dm_:;
run;
proc sql feedback;
/* create table want as*/
select
l.*
,case
when nmiss(l.SGA_END,r.DM_END,l.SGA_START,r.DM_START)=0 then
min(l.SGA_END,r.DM_END) - max(l.SGA_START,r.DM_START)
else 0
end as overlap_days
/* ,l.SGA_START as l_sga_start format=date9.*/
/* ,l.SGA_END as l_sga_end format=date9.*/
/* ,r.SGA_START as r_sga_start format=date9.*/
/* ,r.SGA_END as r_sga_end format=date9.*/
/* ,l.row_id as l_row_id*/
/* ,r.row_id as r_row_id*/
from work.prep l
left join work.prep r
on
l.ENROLID=r.ENROLID
and min(l.SGA_END,r.DM_END) - max(l.SGA_START,r.DM_START)>0
and l.row_id<=r.row_id
order by l.row_id
;
quit;
Please note that in principle a DM period could overlap with multiple SGA periods (defined in multiple rows). Above code will return one row per overlap (or just the source row if no overlap).
If you prefer to store this data in a single row within multiple variables then you would need to transpose the result back to a single row by row_id.
I already defined periods as having gaps >60 days for both SGA and DM. So the screen shot I am showing is after taking the gaps of >60 days into account and I need to do cases with overlap >30 days.
I did the identification of fill dates and gaps of 60 days for drug SGA in a dataset X and the same DM in a dataset Y and merged both. The result is the dataset screenshot which I posted.
However, The above code did not work. As I see the dataset., there are overlaps but it has given me zero overlaps for all observations. Is there another way to do it?
As I see the dataset., there are overlaps but it has given me zero overlaps for all observations. Is there another way to do it?
If above statement is related to the code sample I shared earlier:
The overlaps are all zero because it uses the data you shared via data step and there start and end dates have always the same date and though there aren't any overlaps. You can amend your sample data (data have; step) so they contain overlapping periods and then test again if the script now returns the overlaps you're after.
Is there another way to do it?
If you are after some sample code then you need to
1) provide sample data via data step (Have) ,
2) show the desired result based on the sample data (Want),
3) explain the logic required to get from have to want.
Thanks for your help.
Going a step back to see if I did it correctly...So, I have 2 datasets- one with drug sga with their start and end dates (SGA_Start and SGA_end) I have imposed the criteria of maximum gap of 60 days and got these multiple start and end dates for each ID.
data WORK.SGA_INDEX5;
infile datalines dsd truncover;
input ENROLID:32. INDEX_SGA:MMDDYY10. PERIOD_SGA:32. SGA_START:MMDDYY10. SGA_END:MMDDYY10.;
format INDEX_SGA MMDDYY10. SGA_START MMDDYY10. SGA_END MMDDYY10.;
label ENROLID="Enrollee ID";
datalines;
13442404 02/12/2018 1 02/12/2018 02/12/2018
13442404 02/12/2018 2 08/01/2019 08/01/2019
13442404 02/12/2018 3 11/07/2019 11/07/2019
13764305 02/07/2018 1 02/07/2018 02/07/2018
13764305 02/07/2018 2 05/10/2018 05/10/2018
13764305 02/07/2018 3 08/04/2018 08/04/2018
13764305 02/07/2018 4 11/08/2018 11/08/2018
13764305 02/07/2018 5 02/14/2019 02/14/2019
13764305 02/07/2018 6 05/15/2019 05/15/2019
13764305 02/07/2018 7 08/13/2019 08/13/2019
13764305 02/07/2018 8 11/14/2019 11/14/2019
14105204 02/17/2018 1 09/19/2018 09/19/2018
14105204 02/17/2018 2 01/21/2019 01/21/2019
14958903 01/23/2018 1 07/15/2018 07/15/2018
14958903 01/23/2018 2 10/09/2018 10/09/2018
14958903 01/23/2018 3 12/11/2019 12/11/2019
15715903 06/04/2018 1 06/04/2018 06/04/2018
15818504 10/02/2019 2 12/13/2019 12/13/2019
26990304 02/11/2017 1 12/30/2019 12/30/2019
27174506 01/13/2017 1 06/11/2018 06/11/2018
27193504 06/20/2017 1 06/20/2017 06/20/2017
27193504 06/20/2017 2 08/27/2017 08/27/2017
27193504 06/20/2017 3 12/30/2019 12/30/2019
27264303 01/15/2017 1 05/27/2019 05/27/2019
27274805 05/25/2017 1 06/09/2017 06/09/2017
;;;;
And the second dataset is the drug DM with their start and end dates (DM_Start and DM_END). I have used the same criteria of maximum of 60 days gap and got these multiple start and end dates of these drug.
data WORK.DM_INDEX5;
infile datalines dsd truncover;
input ENROLID:32. INDEX_DM:MMDDYY10. PERIOD_dm:32. DM_START:MMDDYY10. DM_END:MMDDYY10.;
format INDEX_DM MMDDYY10. DM_START MMDDYY10. DM_END MMDDYY10.;
label ENROLID="Enrollee ID";
datalines;
27264303 02/20/2017 1 07/18/2017 07/18/2017
27264303 02/20/2017 2 11/03/2018 11/03/2018
27264303 02/20/2017 3 05/06/2019 05/06/2019
28584803 03/07/2017 1 03/07/2017 03/07/2017
28584803 03/07/2017 2 08/28/2019 08/28/2019
28584803 03/07/2017 3 11/26/2019 11/26/2019
33471402 07/17/2019 1 12/20/2019 12/20/2019
93277105 02/06/2018 1 05/08/2018 05/08/2018
93277105 02/06/2018 2 11/05/2018 11/05/2018
93277105 02/06/2018 3 01/11/2019 01/11/2019
93277105 02/06/2018 4 09/29/2019 09/29/2019
93277105 02/06/2018 5 12/16/2019 12/16/2019
97434102 09/05/2017 1 12/06/2017 12/06/2017
99153403 01/16/2017 1 02/21/2017 02/21/2017
99153403 01/16/2017 2 07/24/2017 07/24/2017
100012102 08/08/2017 1 08/08/2017 08/08/2017
100012102 08/08/2017 2 11/14/2017 11/14/2017
100274803 01/16/2017 1 01/16/2017 01/16/2017
100303908 03/19/2017 2 03/19/2017 03/19/2017
100303908 03/19/2017 3 12/04/2017 12/04/2017
101032604 03/30/2017 1 04/28/2017 04/28/2017
101032604 03/30/2017 2 08/30/2017 08/30/2017
101032604 03/30/2017 3 11/25/2017 11/25/2017
102580704 01/12/2017 1 04/26/2017 04/26/2017
102580704 01/12/2017 2 09/15/2017 09/15/2017
;;;;
I am looking for the all the durations between dm_start and dm_end dates which fall between any of the sga_start and sga_end.
data I have | ||||
id | sga_start | sga_end | ||
1 | 1/1/17 | 7/1/17 | ||
1 | 10/1/17 | 12/1/17 | ||
2 | 2/1/17 | 4/1/17 | ||
2 | 9/1/17 | 11/1/17 | ||
id | dm_start | dm_end | ||
1 | 4/1/17 | 5/1/17 | ||
1 | 6/1/17 | 12/1/17 | ||
2 | 3/1/17 | 4/1/17 | ||
2 | 9/1/17 | 12/1/17 | ||
data I want | ||||
id | start | end | day | |
1 | overlap1 | 4/1/17 | 5/1/17 | 30 |
overlap2 | 6/1/17 | 7/1/17 | 91 | |
overlap3 | 10/1/17 | 12/1/17 | 60 | |
2 | overlap1 | 3/1/17 | 4/1/17 | 30 |
overlap2 | 9/1/17 | 11/1/17 | 61 |
data sga;
infile cards expandtabs;
input id (sga_start sga_end) (:mmddyy10.);
format sga_start sga_end mmddyy10. ;
cards;
1 1/1/17 7/1/17
1 10/1/17 12/1/17
2 2/1/17 4/1/17
2 9/1/17 11/1/17
;
data dm;
infile cards expandtabs;
input id (dm_start dm_end) (:mmddyy10.);
format dm_start dm_end mmddyy10. ;
cards;
1 4/1/17 5/1/17
1 6/1/17 12/1/17
2 3/1/17 4/1/17
2 9/1/17 12/1/17
;
data SGA2;
set sga;
label='SGA';
do date=SGA_START to SGA_END;
output;
end;
keep id date label;
format date date9.;
run;
data DM2;
set dm;
label='DM ';
do date=DM_START to DM_END;
output;
end;
keep id date label;
format date date9.;
run;
data temp;
set SGA2 DM2;
run;
proc sort data=temp nodupkey;
by id date label;
run;
data temp2;
do until(last.date);
set temp;
by id date;
length tag $ 40;
tag=catx('|',tag,label);
end;
drop label;
run;
data temp3;
set temp2;
by id tag notsorted;
if first.tag or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select id,group,tag,min(date) as start format=mmddyy10.,max(date) as end format=mmddyy10.,count(*) as days
from temp3
group by id,group,tag
having findc(tag,'|')
;
quit;
It worked. Thank you
Hi,
I have one follow up question regarding this,
I want to know the duration between "SGA_START" and "start". When I merge dataset 'want' with 'sga2', the time period is not overlapping horizontally and is not capturing all the overlap periods, Can you help me with the code.
Thank you
So you don't take into account of variable ENROLID ?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.