BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Guys, is there a way to flag consecutive years while considering gaps? 

Suppose to have: 

 

Person1       2015 

Person1       2018 

Person1       2019 

Person1       2020 

Person2       2015 

Person2       2016 

Person2       2018 

Person2       2019

Person3       2015

Person3       2016

Person3       2017

Person3       2020      

 

To have      

Person1       2015         0

Person1       2018         1

Person1       2019         1

Person1       2020         1

Person2       2015         1

Person2       2016         1

Person2       2018         2

Person2       2019         2

Person3       2015         1     

Person3       2016         1

Person3       2017         1

Person3       2020         0     

 

Thank you in advance.

 

Just edited to simplify. I used 2 to distinguish 2015-2016 from 2018-2019 for Person2 because there is a gap otherwise they will be considered consecutives while using 1 but any other flag is ok.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@NewUsrStat wrote:
Just need to flag out of consecutive

Assign each sequence its own number.  Then you can use BY group processing to exclude the sequences that only have one observation.

data have;
  input id $ year ;
cards;
Person1 2015
Person1 2018
Person1 2019
Person1 2020
Person2 2015
Person2 2016
Person2 2018
Person2 2019
Person3 2015
Person3 2016
Person3 2017
Person3 2020
;

data want;
 set have;
 by id ;
 sequence + dif(year) > 1;
 if first.id then sequence=1;
run;

proc print;
run;

data subset;
  set want;
  by id sequence;
  if not (first.sequence and last.sequence);
run;

proc print;
run;

Tom_0-1664493023258.png

Tom_1-1664493046536.png

 

 

View solution in original post

6 REPLIES 6
ballardw
Super User

I assume by "jumping years" you mean out of sequence. With an unstated requirement of "within Person"

 

Why is the first record marked as a flag:

Person1       2015      flag
Person1       2018        0

I would expect

Person1       2015      0
Person1       2018      flag

Assuming my interpretation above is complete one way:

data have;
   input person $ year;
datalines;
Person1       2015 
Person1       2018 
Person1       2019 
Person1       2020 
Person2       2015 
Person2       2016 
Person2       2018 
Person2       2019
Person3       2015
Person3       2016
Person3       2017
Person3       2020
;

/* assumes sorted by Person and Year*/

data want;
   set have;
   by person;
   dyr = dif(year);
   if first.person then flag=0;
   else flag=dyr>1;
   drop dyr;
run;

The data step is the way to provide example data. When we have to guess we make assumptions as to variable types that may not be correct.

The DIF function returns the numeric difference between the current value of a variable and that of the previous record with some limitations.

SAS when using BY group processing in a data step adds automatic variables First. and Last. for the variables on the By statement that indicate whether the current record is the first or last of the group. These are numeric 1 (true)/ 0 (false) values and can be used as logical values as shown.

If your data is not sorted by ID but is grouped by id with the years in sequence you would add the option NOTSORTED to the BY statement as the default is that the values are sorted and will cause an error if the data is not actually sorted.

A flag value of 1 indicates difference greater than one.

 

Note: if you are concerned about same values in sequence this will not flag repeated values.

 

NewUsrStat
Lapis Lazuli | Level 10

It is the same: I need to flag "all in sequence" or "all out of sequence" because consecutive sequences will be processed (but not "outliers"). Thank you very much!

Astounding
PROC Star
I think you meant to flag the second observation not the first.

Here is one approach.

data want;
set have:
by person;
flag = dif(year) ne 1;
if first.person then flag=0;
run;
NewUsrStat
Lapis Lazuli | Level 10
Just need to flag out of consecutive
Tom
Super User Tom
Super User

@NewUsrStat wrote:
Just need to flag out of consecutive

Assign each sequence its own number.  Then you can use BY group processing to exclude the sequences that only have one observation.

data have;
  input id $ year ;
cards;
Person1 2015
Person1 2018
Person1 2019
Person1 2020
Person2 2015
Person2 2016
Person2 2018
Person2 2019
Person3 2015
Person3 2016
Person3 2017
Person3 2020
;

data want;
 set have;
 by id ;
 sequence + dif(year) > 1;
 if first.id then sequence=1;
run;

proc print;
run;

data subset;
  set want;
  by id sequence;
  if not (first.sequence and last.sequence);
run;

proc print;
run;

Tom_0-1664493023258.png

Tom_1-1664493046536.png

 

 

NewUsrStat
Lapis Lazuli | Level 10
Thank you very much! It works perfectly!

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 814 views
  • 2 likes
  • 4 in conversation