It sounds like what you want is the first observation per group where a new group starts when:
It would be easier if you had already assigned the grouping variable, but we can just convert your current FLAG variable into one that only has two possible values and then use that.
data have;
input id date :date. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 6jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 .
2 6jul2021 .
;
data want;
set have ;
by id ;
any_flag=not missing(flag);
if first.id or any_flag ne lag(any_flag) or dif(date)=>5 ;
run;
Obs id date flag any_flag 1 1 01JUL2021 . 0 2 1 06JUL2021 1 1 3 1 16JUL2021 2 1 4 2 01JUL2021 . 0 5 2 06JUL2021 . 0
data have;
input id date :date9.;
format date date9.;
cards;
1 1jul2021
1 1jul2021
1 4jul2021
1 10jul2021
1 16jul2021
1 18jul2021
;
data want;
set have;
prev_date=lag(date);
if _n_=1 or date-prev_date>5 then output;
drop prev_date;
run;
And if there are additional ID values, then what? Please present a data set with multiple IDs and the correct output.
When you present data sets, please follow my example above and provide working SAS data step code, rather than just text as you did — Thanks.
I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below?
Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
2 1jul2021 1
2 6jul2021 .
As I requested, please post your example data as SAS data step code (as in my earlier example) and not text that is not part of a data step.
Check _all_ offered solutions. 🙂
B.
data want;
set have;
by id;
*calculates the difference between previous date and current date;
date_diff = dif(date);
*If it's the first record with that ID then sets the difference to missing;
if first.id then date_diff = .;
*If the date difference is less than 5 (if less than or equal to change the sign) and not the first record and deletes it if required;
if date_diff < 5 and not first.id then delete;
run;
@Emma2021 wrote:
Have:
id date
1 1jul2021
1 1jul2021
1 4jul2021
1 10jul2021
1 16jul2021
1 18jul2021
2 3jul2021
2 15jul2021
Wanted:
id date
1 1jul2021
1 10jul2021
1 16jul2021
2 3jul2021
2 15jul2021
I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below?
Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
2 1jul2021 1
2 6jul2021 .
Again, I request data provided as SAS data step code, as I showed in my code, instead of simply text that is not a SAS data step.
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;
Is it this what you need:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;
data want;
_N_ = 0;
do until(last.id);
set have;
by id;
_IORC_ + flag;
if _IORC_ and (first.id or (date - _N_ >= 5)) then
do;
output;
_N_ = date;
end;
end;
_IORC_ = 0;
run;
proc print;
run;
?
Bart
try this (I assume data are sorted):
data have;
input id date date9.;
format date date9.;
cards;
1 1jul2021
1 1jul2021
1 4jul2021
1 10jul2021
1 16jul2021
1 18jul2021
2 1jul2021
2 1jul2021
2 4jul2021
2 10jul2021
2 16jul2021
2 18jul2021
;
run;
data want;
do until(last.id);
set have;
by id;
if first.id or (date - _N_ > 5) then
do;
output;
_N_ = date;
end;
end;
run;
proc print;
run;
Bart
I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below?
Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
2 1jul2021 1
2 6jul2021 .
Adding "earliest" has now made it VERY important to actually duplicate your data. If your "date" is a character variable then "earliest" is not going to be what you think it is as comparisons with character values will not be in date order.
The way you describe use of the "flag" variable doesn't seem to make sense with the shown result. Such as why is the data with no flag value set included in the output?
@Emma2021 wrote:
I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below?
Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 12 1jul2021 1
2 6jul2021 .
Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 22 1jul2021 1
2 6jul2021 .
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.