Does your reporting of data set WANT really represent your desired output? I ask because it does not meet my understanding of your requirement.
I suggest:
data have;
input ID1 ID2 R I Seq ;
datalines;
1 11 1 . 1
2 11 0 1 2
1 143 0 1 1
1 22 1 . 1
2 22 1 . 2
3 22 1 . 1
4 22 1 . 2
1 165 0 1 1
1 164 0 1 1
1 166 0 1 1
1 33 0 1 1
2 33 1 . 2
3 33 1 . 3
4 33 1 . 4
;
proc sort data = have;
by ID2;
run;
data want;
set have;
by ID2;
rnew=ifn(i=1 and first.id2=0,lag(r),0);
run;
The statement
rnew=ifn(i=1 and first.id2=0,lag(r),0);
probably produces the result you describe, while the code you provide:
if I = 1 and lag(R) =1 then Rnew = 1;
Else Rnew = 0;
probably does not. That's for two reasons:
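By testing for first.id2=0 I am allowing lag(R) to be a possible result only when the record-in-hand is not the start of an ID2 group. Without that test, the first record of a group would pick up the value of R from the last record of the preceding group.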
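But even if you had only one ID2 group, you would get erroneous results. That's because the LAG function is really a queue updater (a FIFO queue): each time it executes, it returns the value at the front of the queue and adds the current value of its argument to the back. If that queue updater executes only conditionally (for instance in a THEN clause, or in a compound IF condition where SAS may skip it once "I = 1" is already false), then the queue is not updated with every observation. The lagged result is then the value of R not from the prior observation, but from the most recent observation on which LAG actually executed, i.e. the prior observation that satisfied the "IF I=1" condition.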
What you probably want is to update the queue with every observation, but assign the lagged value to RNEW only when the condition is met. Unlike the IF … THEN … statement, the IFN function always evaluates both of its possible results (i.e. both LAG(r) and 0), so the queue is updated on every observation. It returns LAG(r) when I=1 and first.ID2=0, and returns zero otherwise.
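To see the difference between a LAG that runs on every observation and one that runs only conditionally, here is a small sketch. The dataset DEMO and the variables LAG_ALWAYS and LAG_COND are made up for illustration and are not part of your data:

```sas
/* Hypothetical toy data: I is the condition flag, R is the value to lag */
data demo;
  input i r;
datalines;
1 5
0 6
1 7
0 8
1 9
;
run;

data compare;
  set demo;
  /* This LAG executes on every observation, so its queue always */
  /* holds R from the immediately preceding observation.          */
  lag_always = lag(r);
  /* This LAG executes only when I=1, so its queue holds R from   */
  /* the previous observation on which I=1.                       */
  if i = 1 then lag_cond = lag(r);
run;

proc print data=compare;
run;

/* Expected values, observation by observation: */
/*   lag_always:  .  5  6  7  8                 */
/*   lag_cond:    .  .  5  .  7                 */
```

Note that each LAG call in the program keeps its own separate queue, which is why the two variables can disagree on the same observation.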
Whenever I see or use the LAG function, in my mind I substitute the term UFQ (update-fifo-queue), which I find to be a useful way to recognize what the function is actually doing.
Also, it looks like your dataset HAVE is already grouped by ID2, even though it is not SORTED by ID2, so you could avoid the PROC SORT. Just change the "BY ID2" statement to "BY ID2 NOTSORTED".
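As a sketch, assuming HAVE is still in its original order (every ID2 group contiguous, PROC SORT not run), the whole step would then be:

```sas
data want;
  set have;
  /* NOTSORTED: ID2 groups must be contiguous but need not be in */
  /* ascending order; FIRST.ID2 is still set at each group start. */
  by ID2 notsorted;
  /* IFN evaluates LAG(r) on every observation, keeping the queue */
  /* in step with the data; the lagged value is returned only     */
  /* when I=1 and the record is not the first of its ID2 group.   */
  rnew = ifn(i = 1 and first.id2 = 0, lag(r), 0);
run;
```

The observations in WANT then stay in the original order of HAVE rather than being rearranged by ID2.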