- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am getting unusual results when I am using the lag function and not getting the results I am looking for.
Dataset:
ID Event_Date
1 12MAY2020
1 24JUN2020
1 08AUG2020
2 09APR2020
3 26JUN2020
3 30AUG2020
Program I wrote:
data want;
set have;
by ID;
if first.ID then prev_event_dt = .;
else prev_event_dt=lag(Event_Date);
run;
Output I am expecting:
ID Event_Date prev_event_dt
1 12MAY2020 .
1 24JUN2020 12MAY2020
1 08AUG2020 24JUN2020
2 09APR2020 .
3 26JUN2020 .
3 30AUG2020 26JUN2020
Please point out where I am going wrong with the program I wrote.
Thank you
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You never passed in the date from the first observation per ID into the queue/stack that LAG() uses.
So it is not there to be returned later.
In general you never want to conditionally run the LAG() function because of this issue.
data want;
set have;
by ID;
prev_event_dt=lag(Event_Date);
if first.ID then prev_event_dt = .;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
LAG does not behave as you expect it in IF / THEN / ELSE statements. (It's doing the right thing, but your expectation is wrong)
Better you should use the IFN function (or IFC if you want to create a character variable)
prev_event_dt=ifn(first.id,.,lag(event_date));
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You never passed in the date from the first observation per ID into the queue/stack that LAG() uses.
So it is not there to be returned later.
In general you never want to conditionally run the LAG() function because of this issue.
data want;
set have;
by ID;
prev_event_dt=lag(Event_Date);
if first.ID then prev_event_dt = .;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The LAG function creates a FIFO queue that is filled every time the function is called. Calling LAG in a conditional branch means that not all values will make it into the queue.
Restructure your code:
data want;
set have;
by ID;
prev_event_dt = lag(Event_Date);
if first.ID then prev_event_dt = .;
run;