Dear All:
My data is as follows:
Date ID Lag_ID
21FEB2014 1 .
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 2 1
21FEB2014 3 2
21FEB2014 3 3
21FEB2014 3 3
What I want is the first observation of lag_ID for each ID be '.'
so In need
Date ID Lag_ID
21FEB2014 1 .
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 2 .
21FEB2014 3 .
21FEB2014 3 3
21FEB2014 3 3
i wrote
data want; set have;
by ID;
if first.ID then Lag_ID = . ;
run;
Am I making a mistake?
Randy
What happens if you run that code? Does it do what you want?
Whenever you use a lag function it is updating a queue - i.e. a stack of values in memory, that is retained from observation to observation. Often the queue is only 1 value in length (i.e. lag(x)). But it could be 200 in length (lag200(x)).
So ordinarily putting a lag function in a if condition then y=lag(x) will only update the queue when the condition is true. I.e. it is NOT a look back like in excel.
So this would be bad:
data want;
set have;
by id;
if not(first.id) then lag_id=lag(id);
run;
because it would contaminate the second instance each id with the value of the preceding id. Why? Because the queue was not updated when the first.id case was read. So a common approach is:
data want;
set have;
by id;
lag_id=lag(id);
if first.id then lag_id=.;
run;
That is ALWAYS UPDATE THE QUEUE, but revise the lag_id variable when necessary.
Another approach is to use the IFN function wrapped around a LAG function:
data want;
set have;
by id;
lag_id=ifn(first.id,.,lag(id));
run;
The reason this works is because, regardless of whether the first argument is true, both of the alternatives in arguments 2 and 3 are calculated. Then one of those alternatives is returned by the IFN function, depending upon the first argument. The benefit in this case is that the lag queue is always updated, but only selectively returned by ifn.
lag_id=ifn(first.id,
And I shouldn't even have bothered with the lag discourse. In your case:
data want;
set have;
by id;
if first.id=0 then lag_id=id;
run;
which leaves lag_id as missing for the beginning of each id.
This begs the question - since you already have the first.id dummy variable available, why do you need the lag_id variable?
@RandyStanhas
What I want is the first observation of lag_ID for each ID be '.'
so In need
Date ID Lag_ID
21FEB2014 1 .
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 2 .
21FEB2014 3 .
21FEB2014 3 3
21FEB2014 3 3
based on the incoming data @RandyStan has the second to the last record fails your request in your sample output data. it should reset Lag_id reset . based on the request.
data have;
input Date date9. ID Lag_ID;
cards;
21FEB2014 1 .
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 1 1
21FEB2014 2 1
21FEB2014 3 2
21FEB2014 3 3
21FEB2014 3 3
;
proc sort data=have;
by id lag_id date;
run;
data want;
set have;
by id lag_id;
if first.lag_id then lag_id = .;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.