In CM data, I want to find out if we have any medication with overlapping dates with eachother.
data test;
input subjid cmseq cmtrt cmstdtc cmendtc ;
data lines;
101 1 Paracetomal 2019-10-20 2019-10-30
101 2 Acitretin 2019-10-21 2019-11-01
101 3 Apixaban 2019-11-05 2019-11-10
102 1 Paracetomal 2019-01-01 2019-01-30
102 2 Acitretin 2019-01-05 2019-01-10
102 3 Apixaban 2019-01-09 2019-02-10
;
run;
in the above example, for subject#102, at 09jan, we have all 3 medication on going and
should come in the output. where as subject #101, have only 2 medication time is overlapping.
If you want a list of all medications that overlap with at least one other medication, you can approach it by comparing each medication to every other medication for the same subject and seeing if they overlap. In PROC SQL you could do it like this:
data test;
length cmtrt $20;
input subjid cmseq cmtrt $ cmstdtc cmendtc ;
informat cmstdtc cmendtc e8601da.;
format cmstdtc cmendtc e8601da.;
datalines;
101 1 Paracetomal 2019-10-20 2019-10-30
101 2 Acitretin 2019-10-21 2019-11-01
101 3 Apixaban 2019-11-05 2019-11-10
102 1 Paracetomal 2019-01-01 2019-01-30
102 2 Acitretin 2019-01-05 2019-01-10
102 3 Apixaban 2019-01-09 2019-02-10
;
run;
proc sql;
create table overlapping1 as
select distinct a.subjid, a.cmseq, a.cmtrt, a.cmstdtc, a.cmendtc
from test as a
full join test as b
on a.subjid = b.subjid
having a.cmstdtc < b.cmstdtc < a.cmendtc or b.cmstdtc < a.cmstdtc < b.cmendtc
order by subjid, cmseq
;
quit;
The data step solution is much more involved, but perhaps more efficient if each subject has many concomitant medications:
data overlapping2(keep=subjid cmseq cmtrt cmstdtc cmendtc);
set test(rename=(subjid=subjid1 cmseq=cmseq1 cmtrt=cmtrt1 cmstdtc=cmstdtc1 cmendtc=cmendtc1));
by subjid1;
if not last.subjid1 then do;
i = _n_ + 1;
do while (i <= n);
set test(rename=(subjid=subjid2 cmseq=cmseq2 cmtrt=cmtrt2 cmstdtc=cmstdtc2 cmendtc=cmendtc2)) point=i nobs=n;
if subjid1 ne subjid2 then leave;
if cmstdtc1 < cmstdtc2 < cmendtc1 or cmstdtc2 < cmstdtc1 < cmendtc2 then do;
subjid = subjid1;
cmseq = cmseq1;
cmtrt = cmtrt1;
cmstdtc = cmstdtc1;
cmendtc = cmendtc1;
format cmstdtc cmendtc e8601da.;
output;
subjid = subjid2;
cmseq = cmseq2;
cmtrt = cmtrt2;
cmstdtc = cmstdtc2;
cmendtc = cmendtc2;
output;
end;
i + 1;
if last.subjid2 then leave;
end;
end;
run;
proc sort data=overlapping2 nodup;
by subjid cmseq;
run;
Either of those will yield the same result.
Note that I fixed up your input data step and changed the dates to numeric variables. Let me know if you want me to explain my code at all.
First step: make sure that your example data step code will actually run.
The input statement as written expects the variables cmtrt cmstdtc cmendtc to be numeric. Cmtrt is obviously a character value and would have to be read with a character informat long enough, at a minimum, to hold 11 characters.
The dates, to be treated as date values and possibly have any overlap would need to be read with and informat that looks like it should be the yymmdd10. It is also a good idea to assign a date type format to the variables so you can tell.
Also the keyword is "datalines;" no space.
It is also a good idea on this forum to paste code, or Log entries, in a text box opened on the forum using the </> icon that appears above the message windows. The forum software will reformat text pasted into the message windows and can result in code that will not run.
So your data step should look more like:
data test; input subjid cmseq cmtrt :$11. cmstdtc :yymmdd10. cmendtc :YYmmdd10.; format cmstdtc cmendtc yymmdd10.; datalines; 101 1 Paracetomal 2019-10-20 2019-10-30 101 2 Acitretin 2019-10-21 2019-11-01 101 3 Apixaban 2019-11-05 2019-11-10 102 1 Paracetomal 2019-01-01 2019-01-30 102 2 Acitretin 2019-01-05 2019-01-10 102 3 Apixaban 2019-01-09 2019-02-10 ; run;
Next you need to indicate how you expect your output to look. There a many ways that we might report an overlap. I suspect that my first choice wouldn't match your expectations. It is a good idea to provide rules instead of examples. I might come up with something that only works when there are exactly 3 records for each subjid that would not work with subjid's that have more or fewer records.
@sasi_naidu2005 wrote:
In CM data, I want to find out if we have any medication with overlapping dates with eachother.data test;
input subjid cmseq cmtrt cmstdtc cmendtc ;
data lines;
101 1 Paracetomal 2019-10-20 2019-10-30
101 2 Acitretin 2019-10-21 2019-11-01
101 3 Apixaban 2019-11-05 2019-11-10
102 1 Paracetomal 2019-01-01 2019-01-30
102 2 Acitretin 2019-01-05 2019-01-10
102 3 Apixaban 2019-01-09 2019-02-10
;
run;in the above example, for subject#102, at 09jan, we have all 3 medication on going and
should come in the output. where as subject #101, have only 2 medication time is overlapping.
Thank you for arranging the input data. Yes. The input dataset might have may records. I want to see only the subject with more than 2 CM records overlapping at least once, then we
need it in output.
If you want a list of all medications that overlap with at least one other medication, you can approach it by comparing each medication to every other medication for the same subject and seeing if they overlap. In PROC SQL you could do it like this:
data test;
length cmtrt $20;
input subjid cmseq cmtrt $ cmstdtc cmendtc ;
informat cmstdtc cmendtc e8601da.;
format cmstdtc cmendtc e8601da.;
datalines;
101 1 Paracetomal 2019-10-20 2019-10-30
101 2 Acitretin 2019-10-21 2019-11-01
101 3 Apixaban 2019-11-05 2019-11-10
102 1 Paracetomal 2019-01-01 2019-01-30
102 2 Acitretin 2019-01-05 2019-01-10
102 3 Apixaban 2019-01-09 2019-02-10
;
run;
proc sql;
create table overlapping1 as
select distinct a.subjid, a.cmseq, a.cmtrt, a.cmstdtc, a.cmendtc
from test as a
full join test as b
on a.subjid = b.subjid
having a.cmstdtc < b.cmstdtc < a.cmendtc or b.cmstdtc < a.cmstdtc < b.cmendtc
order by subjid, cmseq
;
quit;
The data step solution is much more involved, but perhaps more efficient if each subject has many concomitant medications:
data overlapping2(keep=subjid cmseq cmtrt cmstdtc cmendtc);
set test(rename=(subjid=subjid1 cmseq=cmseq1 cmtrt=cmtrt1 cmstdtc=cmstdtc1 cmendtc=cmendtc1));
by subjid1;
if not last.subjid1 then do;
i = _n_ + 1;
do while (i <= n);
set test(rename=(subjid=subjid2 cmseq=cmseq2 cmtrt=cmtrt2 cmstdtc=cmstdtc2 cmendtc=cmendtc2)) point=i nobs=n;
if subjid1 ne subjid2 then leave;
if cmstdtc1 < cmstdtc2 < cmendtc1 or cmstdtc2 < cmstdtc1 < cmendtc2 then do;
subjid = subjid1;
cmseq = cmseq1;
cmtrt = cmtrt1;
cmstdtc = cmstdtc1;
cmendtc = cmendtc1;
format cmstdtc cmendtc e8601da.;
output;
subjid = subjid2;
cmseq = cmseq2;
cmtrt = cmtrt2;
cmstdtc = cmstdtc2;
cmendtc = cmendtc2;
output;
end;
i + 1;
if last.subjid2 then leave;
end;
end;
run;
proc sort data=overlapping2 nodup;
by subjid cmseq;
run;
Either of those will yield the same result.
Note that I fixed up your input data step and changed the dates to numeric variables. Let me know if you want me to explain my code at all.
You'd better post the output you are looking for .
data test;
input subjid cmseq cmtrt :$11. cmstdtc :yymmdd10. cmendtc :YYmmdd10.;
format cmstdtc cmendtc yymmdd10.;
datalines;
101 1 Paracetomal 2019-10-20 2019-10-30
101 2 Acitretin 2019-10-21 2019-11-01
101 3 Apixaban 2019-11-05 2019-11-10
102 1 Paracetomal 2019-01-01 2019-01-30
102 2 Acitretin 2019-01-05 2019-01-10
102 3 Apixaban 2019-01-09 2019-02-10
;
run;
data temp;
set test;
do date=cmstdtc to cmendtc;
output;
end;
keep subjid date cmtrt;
format date yymmdd10.;
run;
proc sort data=temp;by subjid date cmtrt;run;
data temp2;
set temp;
by subjid date ;
if not (first.date and last.date);
run;
data want;
do until(last.date);
set temp2;
by subjid date;
length comb $ 200;
comb=catx('|',comb,cmtrt);
end;
drop cmtrt;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.