Hi Team,
How to flag last hour records by group vars for below data.
In below data sort by AVISIT ATPTN ADT ADTM then need to flag new var where last hour records by AVISIT and ATPTN
SUBJID | PARAMCD | AVISIT | ATPT | ADTM | ADT | NEWFLAG | |
3002 | ALT | Screening | 01JAN2024:13:58:15 | 01-Jan-24 | N | ||
3002 | ALT | Screening | 01JAN2024:13:58:56 | 01-Jan-24 | N | ||
3002 | ALT | Screening | 01JAN2024:13:53:39 | 01-Jan-24 | N | ||
3002 | ALT | Screening | 01JAN2024:13:56:05 | 01-Jan-24 | N | ||
3002 | ALT | Screening | 01JAN2024:13:57:37 | 01-Jan-24 | N | ||
3002 | ALT | Screening | 01JAN2024:14:05:31 | 01-Jan-24 | Y | ||
3002 | ALT | Screening | 01JAN2024:14:06:13 | 01-Jan-24 | Y | ||
3002 | ALT | Screening | 01JAN2024:14:06:53 | 01-Jan-24 | Y | ||
3002 | ALT | Cycle Day 1 | PreDose | 13JAN2024:10:20:10 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | PreDose | 13JAN2024:10:22:20 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | PreDose | 13JAN2024:10:23:30 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | 1Hour | 13JAN2024:13:26:21 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | 1Hour | 13JAN2024:13:27:18 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | 1Hour | 13JAN2024:13:28:29 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | 1Hour | 13JAN2024:13:45:05 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | 1Hour | 13JAN2024:13:46:32 | 13-Jan-24 | Y | |
3002 | ALT | Cycle Day 1 | 1Hour | 13JAN2024:13:47:45 | 13-Jan-24 | Y |
Thank you,
Raja
First let's convert your sample listing into an actual dataset. Let's make sure the ADTM variable is an actual DATETIME variable and eliminate that extra empty column. let's rename your provide results as EXPECT so we can compare whether our code works.
data have;
infile datalines dsd truncover ;
input SUBJID $ PARAMCD $ AVISIT :$20. ATPT $ ADTM :datetime. ADT :date. extra $ EXPECT $;
format ADTM datetime19. ADT date9.;
drop extra ;
datalines;
3002,ALT,Screening, ,01JAN2024:13:58:15,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:58:56,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:53:39,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:56:05,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:57:37,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:14:05:31,01-Jan-24, ,Y
3002,ALT,Screening, ,01JAN2024:14:06:13,01-Jan-24, ,Y
3002,ALT,Screening, ,01JAN2024:14:06:53,01-Jan-24, ,Y
3002,ALT,Cycle Day 1,PreDose,13JAN2024:10:20:10,13-Jan-24, ,Y
3002,ALT,Cycle Day 1,PreDose,13JAN2024:10:22:20,13-Jan-24, ,Y
3002,ALT,Cycle Day 1,PreDose,13JAN2024:10:23:30,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:26:21,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:27:18,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:28:29,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:45:05,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:46:32,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:47:45,13-Jan-24, ,Y
;
Your data is not actually sorted by the grouping variables. So you will probably need to sort it first. Since it is much easier to remember the last hour than to have to look into the future to find it we should sort by descending datetime.
proc sort data=have;
by SUBJID PARAMCD AVISIT ATPT descending ADTM;
run;
Now you just need to remember what hour it was on the first observation for the group and then compare that to the current hour to make you new flag variable.
data want;
set have;
by SUBJID PARAMCD AVISIT ATPT descending ADTM;
if first.atpt then firsthr = intnx('dthour',adtm,0);
format firsthr datetime19.;
retain firsthr ;
flag = (intnx('dthour',adtm,0)=firsthr);
drop firsthr;
run;
Result
If you really want that character variable instead of the easier boolean numeric variable then just use some IF/THEN logic instead of the simple assignment statement.
if (intnx('dthour',adtm,0)=firsthr) then flag='Y' ;
else flag='N';
Is that last YN variable what you want to get? If not can you provide the output data for this input?
If it is then explain in more detail the logic used to create it.
Let's assume you meant that the data is grouped by the first four variables:
SUBJID PARAMCD AVISIT ATPT
If so then why do you flag THREE of the observations as being the LAST for for the first group?
Is it because they appear on the same date?
First let's convert your sample listing into an actual dataset. Let's make sure the ADTM variable is an actual DATETIME variable and eliminate that extra empty column. let's rename your provide results as EXPECT so we can compare whether our code works.
data have;
infile datalines dsd truncover ;
input SUBJID $ PARAMCD $ AVISIT :$20. ATPT $ ADTM :datetime. ADT :date. extra $ EXPECT $;
format ADTM datetime19. ADT date9.;
drop extra ;
datalines;
3002,ALT,Screening, ,01JAN2024:13:58:15,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:58:56,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:53:39,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:56:05,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:13:57:37,01-Jan-24, ,N
3002,ALT,Screening, ,01JAN2024:14:05:31,01-Jan-24, ,Y
3002,ALT,Screening, ,01JAN2024:14:06:13,01-Jan-24, ,Y
3002,ALT,Screening, ,01JAN2024:14:06:53,01-Jan-24, ,Y
3002,ALT,Cycle Day 1,PreDose,13JAN2024:10:20:10,13-Jan-24, ,Y
3002,ALT,Cycle Day 1,PreDose,13JAN2024:10:22:20,13-Jan-24, ,Y
3002,ALT,Cycle Day 1,PreDose,13JAN2024:10:23:30,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:26:21,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:27:18,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:28:29,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:45:05,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:46:32,13-Jan-24, ,Y
3002,ALT,Cycle Day 1, 1Hour,13JAN2024:13:47:45,13-Jan-24, ,Y
;
Your data is not actually sorted by the grouping variables. So you will probably need to sort it first. Since it is much easier to remember the last hour than to have to look into the future to find it we should sort by descending datetime.
proc sort data=have;
by SUBJID PARAMCD AVISIT ATPT descending ADTM;
run;
Now you just need to remember what hour it was on the first observation for the group and then compare that to the current hour to make you new flag variable.
data want;
set have;
by SUBJID PARAMCD AVISIT ATPT descending ADTM;
if first.atpt then firsthr = intnx('dthour',adtm,0);
format firsthr datetime19.;
retain firsthr ;
flag = (intnx('dthour',adtm,0)=firsthr);
drop firsthr;
run;
Result
If you really want that character variable instead of the easier boolean numeric variable then just use some IF/THEN logic instead of the simple assignment statement.
if (intnx('dthour',adtm,0)=firsthr) then flag='Y' ;
else flag='N';
I think you want to construct a cutoff time exactly 1 hour (3600 seconds) prior to the last ADTM for each SUBJID AVISIT, yes? Then keep all cases on or after the cutoff.
This is untested in the absence of sample data in the form of a working DATA step:
data need /view=need;
set have (keep=subjid avisit atptn adtm);
by subjid avsit atptn;
if last.atptn;
adtm_cutoff=atptn-3600;
drop adtm;
run;
data want (drop=adtm_cutoff);
merge have need;
by subjid avisit atptn;
if adtm>=adtm_cutoff;
run;
You see two DATA steps, but they will process the data as one step, since NEED is a data set VIEW, not a data set FILE.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.