Getting Correct Lagged Data Across Multiple Variables

Reply
Frequent Contributor
Posts: 101

Getting Correct Lagged Data Across Multiple Variables

Hello, one more question for tonight.

I have a large dataset in which I would like to create a variable for the lagged return of a fund.  I have lots of data for many funds across a very large date range. 

The data takes the following form:

ID                           QTR                           SHARES                     

001                         2001Q1                        100

001                         2001Q2                        101                      

001                         2001Q3                       103                      

001                         2001Q4                        103

002                         2001Q1                           10                    

002                         2001Q2                           10                   

002                         2001Q3                           11

002                         2001Q4                           11

So I am currently using the lag function to get the lagged value for SHARES, which for ID number 001 on date 2001Q2 would return a value of 100.  However, when I do this, the value returned for ID number 002 on 2001Q1 is 103, which is incorrect.

How do I stop the incorrect values when the ID variable switches over?  As a note, I do not have consistent date ranges for all of the ID variables.  Some have 30 years of data, some have a few years, starting very recently.

I am hoping this is an easy fix, so thank you in advance!

John

Super User
Posts: 5,081

Re: Getting Correct Lagged Data Across Multiple Variables

You don't prevent it from happening.  You merely re-set the value.  For example:

data want;

  set have;

  by id;

  prior_shares = lag(shares);

  if first.id then prior_shares=.;

run;

Ask a Question
Discussion stats
  • 1 reply
  • 194 views
  • 1 like
  • 2 in conversation