I adapted my code to your last explanation and answer to @Reeza to get the wanted results:
data want(drop=ID treat date rename=(svid=ID svtrt=treat));
set have end=eof;
by ID notsorted treat;
retain svid svtrt start;
format start stop date9.;
if _N_ = 1 or first.ID then do;
svid = ID;
svtrt = treat;
start = date;
end;
else do;
if treat=2 then do;
if svtrt=1 then do;
stop = date-1;
output;
end;
svtrt=2;
end; else svtrt=1;
if treat=2 and last.treat then start=date+1;
if first.treat then svtreat=treat;
if last.ID then do;
stop=date;
if svtrt = 1 then output;
end;
end;
run;
proc sort data=want; by ID treat; run;
Please provide the logical rules to create the second data set.
@cuan wrote:
Dear Sas Community,
have the next problem, it is not trivial, please try to give me the dataset I am asking for below. Thank you very much in advance.
data have;
input id treat date: date9.;
format date date9.;
cards;
1 1 16oct2017
1 1 30oct2017
1 2 14nov2017
1 2 26nov2017
1 2 27nov2017
1 1 11dec2017
1 2 12dec2017
1 2 13dec2017
1 2 24dec2017
1 2 25dec2017
1 1 27dec2017
1 2 09jan2018
2 1 16oct2017
2 1 30oct2017
2 2 14nov2017
2 2 26nov2017
2 2 27nov2017
2 1 11dec2017
2 2 12dec2017
2 2 24dec2017
2 2 25dec2017
2 1 27dec2017
2 2 09jan2018
2 2 10jan2018
2 2 11jan2018
2 2 22jan2018
2 1 05feb2018
I need a dataset want:
subjid treat start stop
1 1 16oct2017 13nov2017
1 1 28nov2017 11dec2017
1 1 26dec2017 08jan2017
2 1 16oct2017 13nov2017
2 1 28nov2017 11dec2017
2 1 26dec2017 08jan2017
2 1 23jan2018 05feb2018
Thank you very much.
Cuan.
What does the date variable mean?
Does it mean that a person started taking treatment1/2 at this time?
I don't understand this portion:
Original Data:
1 2 27nov2017
1 1 11dec2017
Desired data:
1 1 28nov2017 11dec2017
Why is the start Nov 28, one day after Treatment2 starts? It seems like Treatment A, didn't start again until 11 December.
I adapted my code to your last explanation and answer to @Reeza to get the wanted results:
data want(drop=ID treat date rename=(svid=ID svtrt=treat));
set have end=eof;
by ID notsorted treat;
retain svid svtrt start;
format start stop date9.;
if _N_ = 1 or first.ID then do;
svid = ID;
svtrt = treat;
start = date;
end;
else do;
if treat=2 then do;
if svtrt=1 then do;
stop = date-1;
output;
end;
svtrt=2;
end; else svtrt=1;
if treat=2 and last.treat then start=date+1;
if first.treat then svtreat=treat;
if last.ID then do;
stop=date;
if svtrt = 1 then output;
end;
end;
run;
proc sort data=want; by ID treat; run;
Thanks a lot.
It was a challenging one.
Cuan.
The reason that doesn't make sense to me is that you use the exact opposite logic in the first two records where the A is assumed to carry forward, but here you're saying it doesn't carry forward.
1 1 16oct2017 13nov2017
From what I can see this is based on the date that is prior the drug regiment changing. So it seems like the logic isn't consistent to me.
Please check next code, though the result is not as expected:
data want(drop=ID treat date rename=(svid=ID svtrt=treat));
set have end=eof;
by ID notsorted treat;
retain svid svtrt start;
format start stop date9.;
if _N_ = 1 then do;
svid = ID;
svtrt = treat;
start = date;
end;
if first.treat and _N_>1 then do;
stop = date-1;
if svtrt = 1 then output;
svid = ID;
svtrt = treat;
start = date;
end;
if last.ID then do;
stop=date;
if svtrt = 1 then output;
end;
run;
proc sort data=want; by ID treat; run;
Hello @cuan,
Here's another, independently developed solution, so you can validate one against the other.
/* Create larger test dataset */
data have(drop=phase);
call streaminit(27182818);
do id=1 to 50000;
date='01JAN2000'd+rand('integer',10);
do phase=1 to rand('integer',50);
treat=rand('table',0.5);
output;
date+rand('integer',1,50); /* "('integer',0,50)" would cause ambiguities */
end;
end;
format date date9.;
run;
/* Create dataset with periods of treatment 1 */
data want(drop=date _: rename=(id=subjid));
array _t[%sysevalf('01JAN2000'd):%eval(%sysfunc(today())+100)] _temporary_;
do until(last.id);
set have;
by id date;
if first.id then _first_trt1=date;
if treat~=1 then do;
if _trt2_b=. then _trt2_b=date;
_trt2_e=date;
end;
if _trt2_e>. & (treat=1 | last.id) then do;
do _d=_trt2_b to _trt2_e;
_t[_d]=2;
end;
call missing(of _trt2:);
end;
end;
do _d=_first_trt1 to date;
if _t[_d]=. then _t[_d]=1;
end;
do _d=lbound(_t)+1 to hbound(_t)-1;
if _t[_d]=1 then do;
if _t[_d-1]~=1 then start=_d;
if _t[_d+1]~=1 then stop=_d;
end;
if n(start, stop)=2 then do;
treat=1;
output;
call missing(start, stop);
end;
end;
call missing(of _t[*]);
format start stop date9.;
run;
It's less elegant than @Shmuel's and also much slower on the large HAVE dataset created above (with more than one million observations). If your real dataset is much smaller (say, 1,000 patients, not 50,000), we're talking about fractions of a second, though. You can also improve performance by choosing more realistic lower and upper bounds (based on your data) for the temporary array _t, which just needs to cover all of your treatment dates +/- 1 day (which I assumed to fall between January 1, 2000 and 100 days from today).
Note that the resulting WANT datasets do not match for the large HAVE input dataset. Maybe they do with your real data because certain rare situations (like a patient treated only one day, with treat=1) don't occur in your data. You may want to check a few of the discrepancies (e.g. the single date difference for ID=6 of my simulated HAVE dataset) to get an idea of whether they are relevant for your application. The "ambiguities" mentioned in the comment above would arise if duplicate dates were allowed within an ID in dataset HAVE (e.g., two observations with the same ID and date, one with treat=1 and one with treat=2; it's not obvious what this would imply for the treatment on the preceding days, after an observation with treat=2). This would also cause more discrepancies between the WANT datasets of the two solutions.
Hello dear,
thank you very much for your code.
I compared it in my real data (10 patients), and I have patient took mono only 1 day, and @Shmuel code is working perfectly for this case also very well, that's why I gave him the valid solution, because it is working great
in my real data.
Regards,
Cuan.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.