Getting Correct Lagged Data Across Multiple Variables

Frequent Contributor
Posts: 101

Getting Correct Lagged Data Across Multiple Variables

Hello, one more question for tonight.

I have a large dataset in which I would like to create a variable for the lagged return of a fund.  I have lots of data for many funds across a very large date range. 

The data takes the following form:

ID                           QTR                           SHARES                     

001                         2001Q1                        100

001                         2001Q2                        101                      

001                         2001Q3                       103                      

001                         2001Q4                        103

002                         2001Q1                           10                    

002                         2001Q2                           10                   

002                         2001Q3                           11

002                         2001Q4                           11

So I am currently using the lag function to get the lagged value for SHARES, which for ID number 001 on date 2001Q2 would return a value of 100.  However, when I do this, the value returned for ID number 002 on 2001Q1 is 103, which is incorrect.

How do I stop the incorrect values when the ID variable switches over?  As a note, I do not have consistent date ranges for all of the ID variables.  Some have 30 years of data, some have a few years, starting very recently.

I am hoping this is an easy fix, so thank you in advance!


Super User
Posts: 6,754

Re: Getting Correct Lagged Data Across Multiple Variables

Posted in reply to mahler_ji

You don't prevent it from happening.  You merely re-set the value.  For example:

data want;

  set have;

  by id;

  prior_shares = lag(shares);

  if then prior_shares=.;


Ask a Question
Discussion stats
  • 1 reply
  • 1 like
  • 2 in conversation