Hi guys,
I have the following data set:
ID Start Stop
001 01JAN2013 31JAN2013
001 01FEB2013 31DEC2013
002 01MAR2013 31DC2013
003 01JAN2013 31DEC2013
I need the following output:
ID Start Stop Start_flag End_flag
001 01JAN2013 31JAN2013 1 2
001 01FEB2013 31DEC2013 2 3
002 01MAR2013 31DC2013 1 2
003 01JAN2013 31DEC2013 1 2
In other words I need to add a flag for the start and end with the exception that for consecutive periods the end flag of the previous period will become the start flag of the subsequent period and the remaining end flag will be increased by 1.
Can anyone help me please?
Thnk you in advance
data want;
set have;
by id;
if first.id then start_flag=1;
else start_flag+1;
end_flag=start_flag+1;
run;
data want;
set have;
by id;
if first.id then start_flag=1;
else start_flag+1;
end_flag=start_flag+1;
run;
You need to provide a somewhat better example as I am not sure from your description what the result for the 3rd row of the modified ID 001 data below. Or Explicitly state that there will not be gaps (or overlaps) in the dates of the sequences.
ID Start Stop 001 01JAN2013 31JAN2013 001 01FEB2013 31DEC2013 001 01MAY2014 30JUN2014 002 01MAR2013 31DC2013 003 01JAN2013 31DEC2013
What I understand to this point:
/* assumes data is sorted by ID and start as implied*/ data want; set have; by id; retain start_flag; ls = lag(stop); Difdate = start- ls; if first.id then do; start_flag=1; end_flag=2; end; else if difdate=1 then do; start_flag+1; end_flag = start_flag+1; end; Else do; /* this would be for gaps in the sequence*/ end; drop ls difdate; run;
When you have By group processing in a data step SAS creates automatic variables that indicate whether a particular observation is the First or Last of the by group. These are 1/0 (True/False) variables that are accessed with the First. and Last. construct as shown. This allows setting things at breaks in the data such as your Id changes.
The LAG function pulls a value from a preceding observation (if done properly, best not in IF type constructs). Retain keeps the value of a variable across data step iterations and is often used for a accumulator or sequencing variables like your Start_flag.
A style point for naming variables: If you have variables named Start and Stop and then Start_flag, it would make a lot more sense to have a Stop_flag then an End_flag for the variable name. Then the connection when you read the code a year later makes more sense.
From your explanation and example, the start and stop dates are not used in determining the flags. Is that right?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.