BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
UniversitySas
Quartz | Level 8

If I have my data sorted by ID and date,

 

data have;
infile DATALINES dsd missover;
input ID date base_var;
CARDS;
01, 1, 0
01, 2, 0
01, 3, 1
01, 4, 0
01, 5, 0
01, 6, 0
02, 1, 1
02, 2, 1
02, 3, 0
;
run;

and my desired output is as follows:

data want;
infile DATALINES dsd missover;
input ID date base_var first_condition second_condition;
CARDS;
01, 1, 0, 0, 1
01, 2, 0, 0, 1
01, 3, 1, 0, 0
01, 4, 0, 1, 0
01, 5, 0, 1, 0
01, 6, 0, 1, 0
02, 1, 1, 0, 0
02, 2, 1, 1, 0
02, 3, 0, 1, 0
;
run;

 

 

So you can see, I have a base variable, and 2 variables I want to condition on the base variable.

 

 

I want it so that when base_var = 0, first_condition = 0, and second_condition =1.

But, for the first instance of base_var = 1, second_condition = 0 for the rest of the rows, for that specific ID.

and for the first instance of lag(base_var) =1, first_condition = 1 for the rest of the rows, for that specific ID.

 

So in this case, at ID = 01, date = 3 - you can see the third row is when this second_condition =0,

and then it resets when I get to ID = 2.

Similarly at ID = 01, date = 4 - first_condition = 1 for the remaining ID=01 rows, until we get to ID=02 where it goes back to 0.

 

Is this possible?

 

This is what I tried, and it isn't working:

 

DATA want;
	SET have;
first_condition = 0; second_condition = 1; if lag(base_var) = 1 THEN first_condition = 1; if lag(first_condition ) = 1 THEN first_condition = 1; if ID ne lag(ID) THEN first_condition = 0; *so first_condition=0 every time we have the first obs of a new id; if base_var= 1 THEN second_condition = 0; if first_condition = 1 THEN second_condition = 0; if id ne lag(id) and base_var = 1 THEN second_condition= 0; *so second_condition =0 if base_Var = 1 in the first obs of a new id; run;

But when I do this, my code messes up at ID=01, date = 6, when I have two base_var= 0s in a row.

first_condition = 0 and second_condition = 1 in this row,

which I don't get, because I thought the 

if lag(first_condition ) = 1 THEN first_condition = 1;

part would take care of it.

 

Any help appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @UniversitySas,

 

You can achieve the desired result without using the (tricky) LAG function:

data want;
first_condition=0;
second_condition=1;
do until(last.id);
  set have;
  by id;
  if base_var=1 then second_condition=0;
  output;
  if base_var=1 then first_condition=1;
end;
run;

View solution in original post

4 REPLIES 4
Ksharp
Super User

Assuming I understood what you mean .

 

data have;
infile DATALINES dsd missover;
input ID date base_var;
CARDS;
01, 1, 0
01, 2, 0
01, 3, 1
01, 4, 0
01, 5, 0
01, 6, 0
02, 1, 1
02, 2, 1
02, 3, 0
;
run;
data want;
 set have;
 by id;
 retain first second;
 if first.id then do;first=0;second=1;end;
 if base_var=1 then second=0;
 if id=lag(id) and lag(base_var)=1 then first=1;
run;
FreelanceReinh
Jade | Level 19

Hello @UniversitySas,

 

You can achieve the desired result without using the (tricky) LAG function:

data want;
first_condition=0;
second_condition=1;
do until(last.id);
  set have;
  by id;
  if base_var=1 then second_condition=0;
  output;
  if base_var=1 then first_condition=1;
end;
run;
UniversitySas
Quartz | Level 8
THIS WORKED!

thank you so much.

Could I ask - what does the "output" do here? and why is it that first_condition = 1 contingent on the lagged base_var here, and not the current base_var?
FreelanceReinh
Jade | Level 19

You're welcome. Of course, Ksharp's solution works as well.

 

The OUTPUT statement writes the current observation to dataset WANT (and it overrides the implied OUTPUT statement at the end of each DATA step iteration). It is executed unconditionally for every observation of dataset HAVE after second_condition was set to 0, if applicable, but before first_condition is set to 1, if applicable. Since first_condition (and likewise second_condition) retains its value within the DO-UNTIL loop which is processing one BY group after the other (known as "DOW loop"), the assignment first_condition=1 affects the OUTPUT statement of the next and all subsequent iterations of the DO-UNTIL loop within the same BY group, hence all observations in dataset WANT after the one with base_var=1 until the end of the BY group. The two assignment statements at the beginning of the DATA step initialize the two "condition" variables before the processing of each BY group commences.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 973 views
  • 2 likes
  • 3 in conversation