BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Zhongda
Fluorite | Level 6

Hi All,

 here is the dataset:

data have;
input Id$ year income;
cards;
1 2000 33
1 2001 71
1 2002 76
1 2003 56
1 2004 56
2 2000 39
2 2001 74
2 2002 36
2 2003 48
2 2004 46
3 2000 34
3 2001 44
3 2002 69
3 2003 49
3 2004 31
4 2000 48
4 2001 38
4 2002 79
4 2003 48
4 2004 70
;
run;

 

and here is the target dataset:

id year income want_lag1_income want_lag2_income
1 2000 33 . .
1 2001 71 33 .
1 2002 76 71 33
1 2003 56 76 71
1 2004 56 56 76
2 2000 39 . .
2 2001 74 39 .
2 2002 36 74 39
2 2003 48 36 74
2 2004 46 48 36
3 2000 34 . .
3 2001 44 34 .
3 2002 69 44 34
3 2003 49 69 44
3 2004 31 49 69
4 2000 48 . .
4 2001 38 48 .
4 2002 79 38 48
4 2003 48 79 38
4 2004 70 48 79

 

Could you please help how to use 

lag function to create a one period lag and two period lag

by a setting group, here I mean by id and year

I would like the lag 1 has 1 missing value for each group and lag 2 has 2 missing value.

 

Thank you so much for the help.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You can take advantage of the BY statement and the IFN function as below:

 

data want;
  set have;
  by id;
  income_lag1=ifn(first.id,.,lag1(income));
  income_lag2=ifn(first.id or lag(first.id),.,lag2(income));
run;

 

 

Now if you have a large number of lags to consider, the IFN functions take on progressively more extensive first arguments.  You can get around that by this structure, in which an n-period lag takes its values from an n-1 period lag:

 

data want;
  set have;
  by id;
  income_lag1=ifn(first.id,.,lag1(income));
  income_lag2=ifn(first.id,.,lag1(income_lag1));
  income_lag3=ifn(first.id,.,lag1(income_lag2));
run;

 

 

Or you can avoid that sequential dependency by dropping the use of the BY ID statement, and comparing ID to n-period lagged values of ID:

 

data want;
  set have;
  income_lag1=ifn(id^=lag1(id),.,lag1(income));
  income_lag2=ifn(id^=lag2(id),.,lag2(income));
  income_lag3=ifn(id^=lag3(id),.,lag3(income));
run;

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
Zhongda
Fluorite | Level 6
Warren, Thank you so much for pointing out this, this helps me alot.
ballardw
Super User

The lag part is easy, just call the function. The complicated part of your request, and not very, is the conditional requirement to set some values missing.

One way:

The BY statement means that SAS will set automatic variables First.<by variable> and Last.<by variable> which indicate whether the current observation is the first or last of a by group. These automatic variables are not added to the data set, are numeric 1/0 for true/false. These can be used the aid any of the conditional code to execute when needed. In this case we use it reset a counter variable that is Retained so the value is kept across iterations of the data step. So we know if the current observation is the first, second (or third or ...). Then based on that counter we can conditionally set values to missing. I use the Call Missing function because that function will work with multiple variables and mixed numeric or character variables if needed to set the values to the missing condition.

Then I drop the counter variable so it does not appear in your data set.

 

Note: The BY variable(s) should be in sort order. If not but do appear as a group you can add the NOTSORTED option to the BY statement.

 

 

data want;
   set have;
   by id;
   retain idcounter;
   if first.id then idcounter=1;
   else idcounter+1;
   lagincome1 = lag1(income);
   lagincome2 = lag2(income);
   if idcounter = 1 then call missing(lagincome1,lagincome2);
   if idcounter = 2 then call missing(lagincome2);
   drop idcounter;
run;

@Zhongda wrote:

Hi All,

 here is the dataset:

data have;
input Id$ year income;
cards;
1 2000 33
1 2001 71
1 2002 76
1 2003 56
1 2004 56
2 2000 39
2 2001 74
2 2002 36
2 2003 48
2 2004 46
3 2000 34
3 2001 44
3 2002 69
3 2003 49
3 2004 31
4 2000 48
4 2001 38
4 2002 79
4 2003 48
4 2004 70
;
run;

 

and here is the target dataset:

id year income want_lag1_income want_lag2_income
1 2000 33 . .
1 2001 71 33 .
1 2002 76 71 33
1 2003 56 76 71
1 2004 56 56 76
2 2000 39 . .
2 2001 74 39 .
2 2002 36 74 39
2 2003 48 36 74
2 2004 46 48 36
3 2000 34 . .
3 2001 44 34 .
3 2002 69 44 34
3 2003 49 69 44
3 2004 31 49 69
4 2000 48 . .
4 2001 38 48 .
4 2002 79 38 48
4 2003 48 79 38
4 2004 70 48 79

 

Could you please help how to use 

lag function to create a one period lag and two period lag

by a setting group, here I mean by id and year

I would like the lag 1 has 1 missing value for each group and lag 2 has 2 missing value.

 

Thank you so much for the help.

 


 

 

s_lassen
Meteorite | Level 14

You can also try this:

data want;
  set have;
  by id;
  if first.id then call missing(want_lag1_income,want_lag2_income);
  output;
  want_lag2_income=want_lag1_income;
  want_lag1_income=income;
  retain want_lag1_income want_lag2_income;
run;
112211
Obsidian | Level 7


data new(drop=i count);
set have;
by id;
array x(*) want_lag1_income want_lag2_income;
want_lag1_income=lag1(income);
want_lag2_income=lag2(income);
if first.id then count=1;
do i=count to dim(x);
x(i)=.;
end;
count + 1;
run;

mkeintz
PROC Star

You can take advantage of the BY statement and the IFN function as below:

 

data want;
  set have;
  by id;
  income_lag1=ifn(first.id,.,lag1(income));
  income_lag2=ifn(first.id or lag(first.id),.,lag2(income));
run;

 

 

Now if you have a large number of lags to consider, the IFN functions take on progressively more extensive first arguments.  You can get around that by this structure, in which an n-period lag takes its values from an n-1 period lag:

 

data want;
  set have;
  by id;
  income_lag1=ifn(first.id,.,lag1(income));
  income_lag2=ifn(first.id,.,lag1(income_lag1));
  income_lag3=ifn(first.id,.,lag1(income_lag2));
run;

 

 

Or you can avoid that sequential dependency by dropping the use of the BY ID statement, and comparing ID to n-period lagged values of ID:

 

data want;
  set have;
  income_lag1=ifn(id^=lag1(id),.,lag1(income));
  income_lag2=ifn(id^=lag2(id),.,lag2(income));
  income_lag3=ifn(id^=lag3(id),.,lag3(income));
run;

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2942 views
  • 6 likes
  • 6 in conversation