BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Emma2021
Quartz | Level 8
I have a long data. I would like to keep all rows if the lag date is more than 5 days, else keep only the first row, for example:

Have:
id date
1 1jul2021
1 1jul2021
1 4jul2021
1 10jul2021
1 16jul2021
1 18jul2021

Wanted:
id date
1 1jul2021
1 10jul2021
1 16jul2021


Thank you!
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It sounds like what you want is the first observation per group where a new group starts when:

  • A new ID starts
  • A transition of the FLAG state
  • A gap of more than 5 days

It would be easier if you had already assigned the grouping variable, but we can just convert your current FLAG variable into one that only has two possible values and then use that.

data have;
  input id date :date. flag;
  format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 6jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 .
2 6jul2021 .
;

data want;
   set have ;
   by id ;
   any_flag=not missing(flag);
   if first.id or any_flag ne lag(any_flag) or dif(date)=>5 ;
run;
Obs    id         date    flag    any_flag

 1      1    01JUL2021      .         0
 2      1    06JUL2021      1         1
 3      1    16JUL2021      2         1
 4      2    01JUL2021      .         0
 5      2    06JUL2021      .         0

View solution in original post

33 REPLIES 33
PaigeMiller
Diamond | Level 26
data have;
input id date :date9.;
format date date9.;
cards;
1 1jul2021
1 1jul2021
1 4jul2021
1 10jul2021
1 16jul2021
1 18jul2021
;
data want;
    set have;
    prev_date=lag(date);
    if _n_=1 or date-prev_date>5 then output;
    drop prev_date;
run;

And if there are additional ID values, then what? Please present a data set with multiple IDs and the correct output.

 

When you present data sets, please follow my example above and provide working SAS data step code, rather than just text as you did — Thanks.

--
Paige Miller
Emma2021
Quartz | Level 8

I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below? 

 

Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1

2 1jul2021 1

2 6jul2021 .

Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2

2 1jul2021 1

2 6jul2021 .

PaigeMiller
Diamond | Level 26

As I requested, please post your example data as SAS data step code (as in my earlier example) and not text that is not part of a data step.

--
Paige Miller
Emma2021
Quartz | Level 8
Can you help with below:

When as below data:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;


Then the wanted data should be as below:

1 4jul2021 1
1 10jul2021 2
2 1jul2021 1
2 6jul2021 .



But when the data have as below:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
1 18jul2022 1
2 1jul2021 1
2 6jul2021 .
;
run;



Then it should be as below:

1 1jul2021 .
1 4jul2022 1
2 1jul2021 1
2 6jul2021 .
yabwon
Onyx | Level 15

Check _all_ offered solutions. 🙂

 

B.

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Reeza
Super User
data want;
set have;

by id;

*calculates the difference between previous date and current date;
date_diff = dif(date);

*If it's the first record with that ID then sets the difference to missing;
if first.id then date_diff = .;

*If the date difference is less than 5 (if less than or equal to change the sign) and not the first record and deletes it if required;
if date_diff < 5 and not first.id then delete;

run;

@Emma2021 wrote:
Have:
id date
1 1jul2021
1 1jul2021
1 4jul2021
1 10jul2021
1 16jul2021
1 18jul2021
2 3jul2021
2 15jul2021

Wanted:
id date
1 1jul2021
1 10jul2021
1 16jul2021
2 3jul2021
2 15jul2021

 

Emma2021
Quartz | Level 8
Can you help with below:

When as below data:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;


Then the wanted data should be as below:

1 4jul2021 1
1 10jul2021 2
2 1jul2021 1
2 6jul2021 .



But when the data have as below:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
1 18jul2022 1
2 1jul2021 1
2 6jul2021 .
;
run;



Then it should be as below:

1 1jul2021 .
1 4jul2022 1
2 1jul2021 1
2 6jul2021 .
Emma2021
Quartz | Level 8

I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below? 

 

Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1

2 1jul2021 1

2 6jul2021 .

Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2

2 1jul2021 1

2 6jul2021 .

PaigeMiller
Diamond | Level 26

Again, I request data provided as SAS data step code, as I showed in my code, instead of simply text that is not a SAS data step.

--
Paige Miller
Emma2021
Quartz | Level 8
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;
yabwon
Onyx | Level 15

Is it this what you need:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;

data want;

_N_ = 0;
do until(last.id);
  set have;
  by id;

  _IORC_ + flag;

  if _IORC_ and (first.id or (date - _N_ >= 5)) then
    do;
      output;
      _N_ = date;
    end;
end;
_IORC_ = 0;
run;

proc print;
run;

?

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

try this (I assume data are sorted):

data have;
input id date date9.;
format date date9.;
cards;
1 1jul2021
1 1jul2021
1 4jul2021
1 10jul2021
1 16jul2021
1 18jul2021
2 1jul2021
2 1jul2021
2 4jul2021
2 10jul2021
2 16jul2021
2 18jul2021
;
run;


data want;

do until(last.id);
  set have;
  by id;

  if first.id or (date - _N_ > 5) then
    do;
      output;
      _N_ = date;
    end;
end;

run;

proc print;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Emma2021
Quartz | Level 8

I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below? 

 

Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1

2 1jul2021 1

2 6jul2021 .

Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2

2 1jul2021 1

2 6jul2021 .

ballardw
Super User

Adding "earliest" has now made it VERY important to actually duplicate your data. If your "date" is a character variable then "earliest" is not going to be what you think it is as comparisons with character values will not be in date order.

 

The way you describe use of the "flag" variable doesn't seem to make sense with the shown result. Such as why is the data with no flag value set included in the output?

 


@Emma2021 wrote:

I have another twist--I have a "flag" variable. If not missing then get the row that associated with smallest date's row (see 1 4jul2021 1), but take if not missing and the flag variable is earliest date not with smallest flag value (see 1 16jul2021 2 although the 18jul2021 has smaller flag=1 ) . How can I do below? 

 

Have:
id date flag
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1

2 1jul2021 1

2 6jul2021 .

Wanted:
id date flag
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2

2 1jul2021 1

2 6jul2021 .


 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 33 replies
  • 1054 views
  • 8 likes
  • 8 in conversation