For each usubjid I need the first ECSTDTC (start date) and the last ECENDTC (end date), so basically one record per usubjid.
proc sort data=ec out=ec7; by usubjid ECSTDTC; run;
data ec1;
set ec7;
by usubjid ECSTDTC ECENDTC;
if First.ECSTDTC and last.ECENDTC then output;
run;
Can anyone help me?
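That is not how the FIRST. and LAST. variables work. They indicate whether the current record is the first or the last in the group defined by that BY variable (together with the BY variables listed before it). So FIRST.ECSTDTC and LAST.ECENDTC are both true whenever a record starts a new value of ECSTDTC within its USUBJID and is also the last record with its value of ECENDTC. That test says nothing about whether the record holds the subject's earliest start or latest end.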
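To see what the flags actually contain, it can help to write them to the log. The following is only a sketch: the data set demo and its values are made up for illustration, and the dates are assumed to be ISO 8601 character strings.

```sas
/* Hypothetical sample data; the values are invented for illustration. */
data demo;
  input usubjid $ ecstdtc :$16. ecendtc :$16.;
  datalines;
S01 2020-01-01T08:00 2020-01-01T09:00
S01 2020-01-02T08:00 2020-01-02T09:00
S02 2020-01-05T10:00 2020-01-05T11:00
;
run;

/* Sort by every variable that will appear in the BY statement. */
proc sort data=demo;
  by usubjid ecstdtc ecendtc;
run;

data _null_;
  set demo;
  by usubjid ecstdtc ecendtc;
  /* Write the automatic FIRST./LAST. flags to the log for each record. */
  put usubjid= ecstdtc= first.usubjid= last.usubjid=
      first.ecstdtc= last.ecendtc=;
run;
```

In this sample every record has a different start date within its subject, so FIRST.ECSTDTC and LAST.ECENDTC are both 1 on every row, and the original subsetting IF would keep all three records rather than one per subject.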
If your data is properly sorted and has no missing values, then you want:
data ec1;
set ec7;
by usubjid ;
retain first_start ;
if first.usubjid then first_start=ECSTDTC;
if last.usubjid ;
last_stop = ECENDTC;
keep usubjid first_start last_stop ;
run;
If you have missing values then you will need to add more logic and also retain the variable you use to store the last end date.
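One possible shape for that extra logic is sketched below. It assumes the values are ISO 8601 character strings of the same precision, so that comparing them as text matches chronological order, and the length of 19 is a guess that should be adjusted to the real variables.

```sas
data ec1;
  set ec7;
  by usubjid;
  /* Assumed length; match it to ECSTDTC and ECENDTC in your data. */
  length first_start last_stop $19;
  retain first_start last_stop;

  /* Reset the retained values at the start of each subject. */
  if first.usubjid then call missing(first_start, last_stop);

  /* Keep the earliest non-missing start date seen so far. */
  if not missing(ecstdtc) and
     (missing(first_start) or ecstdtc < first_start)
    then first_start = ecstdtc;

  /* Keep the latest non-missing end date seen so far. */
  if not missing(ecendtc) and ecendtc > last_stop
    then last_stop = ecendtc;

  /* Output one record per subject. */
  if last.usubjid;
  keep usubjid first_start last_stop;
run;
```

Because each record is compared against the retained values, this version only needs the data sorted by usubjid, not by the date variables.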
If you want to use ECENDTC in the BY statement in your data step, then you should also sort by that variable in your PROC SORT.
I used it, but it still doesn't work.
How about:
proc summary data=ec7 nway;
class usubjid;
var ecstdtc ecendtc;
output out=want (keep=usubjid start_date end_date) min(ecstdtc)=start_date max(ecendtc)=end_date;
run;
Unfortunately I cannot use PROC SUMMARY, as the dates are character dates.
It's easy enough to convert character dates to numeric. What format are they in?
I would guess that they're in year-month-day form, or else sorting them couldn't help. But one never knows. Give a couple of examples.
Attaching test data.
So these variables have both a date and a time. Do you care about the time portion, or just the date portion?
Can usubjid be used in that code to take the first and last observation? I need the time as well, to calculate duration later.
It would definitely work (but might be overkill) to process each variable separately:
proc sort data=have;
by usubjid ecstdtc;
run;
data start;
set have;
by usubjid ecstdtc;
if first.usubjid;
keep usubjid ecstdtc;
run;
proc sort data=have;
by usubjid ecendtc;
run;
data finish;
set have;
by usubjid ecendtc;
if last.usubjid;
keep usubjid ecendtc;
run;
Perhaps you can see the handwriting on the wall here ... your life will be a lot simpler down the road if you start out by converting those character variables to numeric DATETIMEs, since SAS is built to handle those easily.
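A minimal sketch of that conversion, assuming the character values are complete ISO 8601 datetimes such as 2020-01-15T08:30:00 (the attached test data is not shown here, so this is a guess). The names ec_num, start_dt and end_dt are made up for the example.

```sas
data ec_num;
  set have;
  /* E8601DT. reads ISO 8601 datetimes. The ?? modifier suppresses log
     errors, so values that cannot be read (for example, partial dates)
     simply become missing. */
  start_dt = input(ecstdtc, ?? e8601dt19.);
  end_dt   = input(ecendtc, ?? e8601dt19.);
  format start_dt end_dt e8601dt19.;
run;

/* With numeric datetimes, the PROC SUMMARY approach works directly. */
proc summary data=ec_num nway;
  class usubjid;
  var start_dt end_dt;
  output out=want (keep=usubjid start_date end_date)
         min(start_dt)=start_date max(end_dt)=end_date;
run;
```

Since SAS datetimes are counted in seconds, end_date - start_date then gives the duration in seconds, and MIN and MAX skip missing values on their own.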
Thanks a lot Tom.
Can the new dates be converted to DATE9. (character) in the same data step?
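One way this might look, as a sketch only: it assumes first_start and last_stop begin with a complete yyyy-mm-dd date, and the variable names start_date9 and stop_date9 are invented for the example.

```sas
data ec1;
  set ec7;
  by usubjid;
  retain first_start;
  length start_date9 stop_date9 $9;
  if first.usubjid then first_start = ECSTDTC;
  if last.usubjid;
  last_stop = ECENDTC;
  /* YYMMDD10. reads only the first 10 characters (yyyy-mm-dd);
     PUT with DATE9. then writes the date as text such as 15JAN2020. */
  start_date9 = put(input(first_start, ?? yymmdd10.), date9.);
  stop_date9  = put(input(last_stop,  ?? yymmdd10.), date9.);
  keep usubjid first_start last_stop start_date9 stop_date9;
run;
```

DATE9. has no room for a time, so the time portion is dropped in the new variables. Keeping first_start and last_stop alongside them preserves the full values for a later duration calculation.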