If I have my data sorted by ID and date,
data have;
infile DATALINES dsd missover;
input ID date base_var;
CARDS;
01, 1, 0
01, 2, 0
01, 3, 1
01, 4, 0
01, 5, 0
01, 6, 0
02, 1, 1
02, 2, 1
02, 3, 0
;
run;
and my desired output is as follows:
data want;
infile DATALINES dsd missover;
input ID date base_var first_condition second_condition;
CARDS;
01, 1, 0, 0, 1
01, 2, 0, 0, 1
01, 3, 1, 0, 0
01, 4, 0, 1, 0
01, 5, 0, 1, 0
01, 6, 0, 1, 0
02, 1, 1, 0, 0
02, 2, 1, 1, 0
02, 3, 0, 1, 0
;
run;
So you can see, I have a base variable, and 2 variables I want to condition on the base variable.
I want it so that when base_var = 0, first_condition = 0, and second_condition =1.
But, for the first instance of base_var = 1, second_condition = 0 for the rest of the rows, for that specific ID.
and for the first instance of lag(base_var) =1, first_condition = 1 for the rest of the rows, for that specific ID.
So in this case, at ID = 01, date = 3 - you can see the third row is when this second_condition =0,
and then it resets when I get to ID = 2.
Similarly at ID = 01, date = 4 - first_condition = 1 for the remaining ID=01 rows, until we get to ID=02 where it goes back to 0.
Is this possible?
This is what I tried, and it isn't working:
DATA want;
SET have;
first_condition = 0;
second_condition = 1;
if lag(base_var) = 1 THEN first_condition = 1;
if lag(first_condition ) = 1 THEN first_condition = 1;
if ID ne lag(ID) THEN first_condition = 0; *so first_condition=0 every time we have the first obs of a new id;
if base_var= 1 THEN second_condition = 0;
if first_condition = 1 THEN second_condition = 0;
if id ne lag(id) and base_var = 1 THEN second_condition= 0; *so second_condition =0 if base_Var = 1 in the first obs of a new id;
run;
But when I do this, my code messes up at ID=01, date = 6, when I have two base_var= 0s in a row.
first_condition = 0 and second_condition = 1 in this row,
which I don't get, because I thought the
if lag(first_condition ) = 1 THEN first_condition = 1;
part would take care of it.
Any help appreciated.
Hello @UniversitySas,
You can achieve the desired result without using the (tricky) LAG function:
data want;
first_condition=0;
second_condition=1;
do until(last.id);
set have;
by id;
if base_var=1 then second_condition=0;
output;
if base_var=1 then first_condition=1;
end;
run;
Assuming I understood what you mean .
data have;
infile DATALINES dsd missover;
input ID date base_var;
CARDS;
01, 1, 0
01, 2, 0
01, 3, 1
01, 4, 0
01, 5, 0
01, 6, 0
02, 1, 1
02, 2, 1
02, 3, 0
;
run;
data want;
set have;
by id;
retain first second;
if first.id then do;first=0;second=1;end;
if base_var=1 then second=0;
if id=lag(id) and lag(base_var)=1 then first=1;
run;
Hello @UniversitySas,
You can achieve the desired result without using the (tricky) LAG function:
data want;
first_condition=0;
second_condition=1;
do until(last.id);
set have;
by id;
if base_var=1 then second_condition=0;
output;
if base_var=1 then first_condition=1;
end;
run;
You're welcome. Of course, Ksharp's solution works as well.
The OUTPUT statement writes the current observation to dataset WANT (and it overrides the implied OUTPUT statement at the end of each DATA step iteration). It is executed unconditionally for every observation of dataset HAVE after second_condition was set to 0, if applicable, but before first_condition is set to 1, if applicable. Since first_condition (and likewise second_condition) retains its value within the DO-UNTIL loop which is processing one BY group after the other (known as "DOW loop"), the assignment first_condition=1 affects the OUTPUT statement of the next and all subsequent iterations of the DO-UNTIL loop within the same BY group, hence all observations in dataset WANT after the one with base_var=1 until the end of the BY group. The two assignment statements at the beginning of the DATA step initialize the two "condition" variables before the processing of each BY group commences.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.