I want to take the first and last date of a subject that meets a specific criteria. So where VISIT contains "PHASE #1", I want to store the first start date and time as TRTSDT and TRTSTM, respectively and the last end date and time as TRTEDT and TRTETM, respectively. Then I want to repeat the process for the remaining phases. Is there a way to do this without having to subset?
I essentially want something along the lines of:
data want;
set have;
by ID;
if findw(VISIT, 'PHASE#1') > 0 then do;
if TRTSDT = datepart(first.STDTC); TRTSTM = timepart(first.STDTC);
TRTEDT = datepart(last.ENDTC); TRTETM = timepart(last.ENDTC);
end;
run;
I have also tried using 'if first.ID then TRTSDT = datepart(first.STDTC)...' but it seemed to only look at first ID and last ID within the whole dataset (e.g. PHASE #1 DAY1 and PHASE#3 DAY2 for subject 1)
How about this:
if first.stdtc then do;
TRTSDT = datepart(STDTC);
trtstm = timepart(STDTC);
end;
with an optional
retain trtsdt trstm;
depending on what you are trying to do, which isn't 100% clear to me.
Remember that the data step is row-by-row always, it (normally) can't seek about the dataset randomly. (There is a way to do that, but it's nearly always the wrong approach - it's very slow and loses a bunch of functionality.)
What `first.var` does is it tells you whether the row you're on is *different from the previous row*. That's it! And what `last.var` does is it tells you if the row you're on is *different from the next row*. (Yes, SAS does sort of peek at the next row for this, it's the exception to the rule.) Mentally, in english, it tells you if you're on the "first" row of a sequence of that ID. It just returns 1 or 0 - true or false.
Unfortunately, it only works with the `by` variables - so we need to make your data work for that. We do need to subset the data, but we don't need to do it in a separate step - we can do it on the fly.
1. Let's subset your dataset while we do this, using where to only include the Phase1 folks.
where findw(VISIT, 'PHASE#1') > 0;
2. Let's retain the start variables, because we'll need them to hang around until the last row.
retain trtsdt trtstm;
3. Let's save aside the trtsdt and trtstm when we are on a first.id row.
if first.id then do;
trtsdt = datepart(stdtc);
trtstm = timepart(stdtc);
end;
4. Let's then save the trtedt/trtetm when we're on a last.id row, and output that row.
if last.id then do;
trtedt = datepart(stdtc);
trtetm = timepart(stdtc);
output;
end;
All together we have:
data want;
set have;
where findw(VISIT, 'PHASE#1') > 0;
by id;
retain trtsdt trtstm;
if first.id then do;
trtsdt = datepart(stdtc);
trtstm = timepart(stdtc);
end;
if last.id then do;
trtedt = datepart(stdtc);
trtetm = timepart(stdtc);
output;
end;
keep id trtsdt trtstm trtedt trtetm;
run;
Taking the DATEPART() or TIMEPART() of a binary variable is just silly. The DATEPART is always going to by zero, which is '01JAN1960'd. And the TIMEPART() is either going to be midnight or one second after midnight.
If you want to find the min/max datetime value per PHASE then you need to have a variable that indicates the PHASE.
data phase;
set have;
phase = scan(visit,2,' ');
run;
data want;
set phase ;
by ID phase;
if first.phase then start_dt=stddtc;
retain start_dt;
if last.phase;
end_dt=enddtc;
keep id phase start_dt end_dt;
run;
proc summary data=have nway;
class id;
var stdtc endtc;
output out=_minmax_(drop=_:) min(stdtc)=min_stdtc max(etdtc)=max_etdtc;
run;
data want;
merge have _minmax_;
by id;
trtsdt=datepart(min_stdtc);
trtstm=timepart(min_stdtc);
trtedt=datepart(max_etdtc);
trtetm=timepart(max_etdtc);
drop min_stdtc max_etdtc;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.