Solved: Add and update flags for consecutive dates

NewUsrStat · Posted 08-10-2022 09:01 AM

Hi guys,

I have the following data set:

ID Start Stop

001 01JAN2013 31JAN2013

001 01FEB2013 31DEC2013

002 01MAR2013 31DC2013

003 01JAN2013 31DEC2013

I need the following output:

ID Start Stop Start_flag End_flag

001 01JAN2013 31JAN2013 1 2

001 01FEB2013 31DEC2013 2 3

002 01MAR2013 31DC2013 1 2

003 01JAN2013 31DEC2013 1 2

In other words I need to add a flag for the start and end with the exception that for consecutive periods the end flag of the previous period will become the start flag of the subsequent period and the remaining end flag will be increased by 1.

Can anyone help me please?

Thnk you in advance

mkeintz · Posted 08-10-2022 12:33 PM

data want;
  set have;
  by id;
  if first.id then start_flag=1;
  else start_flag+1;
  end_flag=start_flag+1;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 08-10-2022 12:33 PM

data want;
  set have;
  by id;
  if first.id then start_flag=1;
  else start_flag+1;
  end_flag=start_flag+1;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ballardw · Posted 08-10-2022 12:45 PM

You need to provide a somewhat better example as I am not sure from your description what the result for the 3rd row of the modified ID 001 data below. Or Explicitly state that there will not be gaps (or overlaps) in the dates of the sequences.

ID             Start                 Stop        
 001       01JAN2013     31JAN2013             
 001       01FEB2013     31DEC2013           
 001       01MAY2014     30JUN2014
 002       01MAR2013     31DC2013              
 003       01JAN2013     31DEC2013

What I understand to this point:

 /* assumes data is sorted by ID and start as implied*/

 data want;
   set have;
   by id;
   retain start_flag;
   ls = lag(stop);
   Difdate = start- ls;
   if first.id then do;
      start_flag=1;
      end_flag=2;
   end;
   else if difdate=1 then do;
      start_flag+1;
      end_flag = start_flag+1;
   end;
   Else do;
     /* this would be for gaps in the sequence*/
   end;
   drop ls difdate;
run;

When you have By group processing in a data step SAS creates automatic variables that indicate whether a particular observation is the First or Last of the by group. These are 1/0 (True/False) variables that are accessed with the First. and Last. construct as shown. This allows setting things at breaks in the data such as your Id changes.

The LAG function pulls a value from a preceding observation (if done properly, best not in IF type constructs). Retain keeps the value of a variable across data step iterations and is often used for a accumulator or sequencing variables like your Start_flag.

A style point for naming variables: If you have variables named Start and Stop and then Start_flag, it would make a lot more sense to have a Stop_flag then an End_flag for the variable name. Then the connection when you read the code a year later makes more sense.

PaigeMiller · Posted 08-10-2022 01:16 PM

From your explanation and example, the start and stop dates are not used in determining the flags. Is that right?

--
Paige Miller

Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

Re: Add and update flags for consecutive dates

SAS Innovate 2025: Save the Date