Hi SAS community!
I face a problem when creating the lag variables with the duplicate value comes from nowhere, can you please give any suggestions about that? Many thanks
My have data is as below:
Type year obs amihud up_us nosh vo p_us
13016K 1987 0 . . . . .
13016K 1988 0 . . . . .
13016K 1989 0 . . . . .
13016K 1990 0 . . . . .
13016K 1991 0 . . . . .
13016K 1992 0 . . . . .
13016K 1993 0 . . . . .
13016K 1994 0 . . . . .
13016K 1995 0 . . . . .
13016K 1996 0 . . . . .
13016K 1997 0 . . . . .
13016K 1998 0 . . . . .
13016K 1999 0 . . . . .
13016K 2000 0 . 0.2945 40120 0.3 42.9638
13016K 2001 0 . 0.1945 40120 . 28.3783
13016K 2002 0 . 0.1802 45558 0.7 26.2883
13016K 2003 0 . 0.2712 45908 . 39.5715
13016K 2004 0 . 0.1568 45908 . 22.8729
13016K 2005 0 . 0.0337 45908 . 4.9228
13016K 2006 0 . 0.0394 45908 1.4 5.7496
13016K 2007 0 . 0.0483 97085 . 7.4443
13016K 2008 0 . 0.0349 100233 . 5.3736
13016K 2009 0 . 0.0351 120065 . 5.5743
13016K 2010 0 . 0.0205 152711 . 3.2582
13016K 2011 0 . 0.0195 308196 0.5 3.8958
13016K 2012 0 . 0.0093 308196 . 1.8687
13016K 2013 0 . 0.0018 308196 . 0.3579
13016K 2014 0 . 0.0368 315410 8.9 0.3683
13016K 2015 0 . 0.0255 1077004 . 0.2546
13016K 2016 4 0.0005137639 0.2498 159721 20.3 0.2498
13016K 2017 9 0.0000546129 0.1799 161321 198.4 0.1799
13016K 2018 2 0.0002143573 0.1619 161321 50 0.1619
13016K 2019 2 0 0.1687 177464 . 0.1687
130347 1987 0 . . . . .
130347 1988 0 . . . . .
130347 1989 0 . . . . .
130347 1990 0 . . . . .
130347 1991 0 . . . . .
130347 1992 0 . . . . .
130347 1993 0 . . . . .
130347 1994 47 0.0000738433 0.7 43242 22.5 0.64
130347 1995 35 0.0000662663 0.65 43242 . 0.59
130347 1996 25 0.0000750143 1.08 43242 566.1 0.99
130347 1997 44 0.0001311861 1.11 66105 . 1.11
130347 1998 15 0.0000954854 0.63 66806 2.5 0.63
and I create two lag variables of p_us and vo by the code below:
data want;
set have;
by Type;
if first.type then
do;
lag_vo=.;
lag_p_us=.;
end;
else
do;
lag_vo=lag(vo);
lag_p_us=lag(p_us);
end;
run;
And the result is as below:
Type year obs amihud up_us nosh vo p_us lag_vo lag_p_us
13016K 1987 0 . . . . . . .
13016K 1988 0 . . . . . . .
13016K 1989 0 . . . . . . .
13016K 1990 0 . . . . . . .
13016K 1991 0 . . . . . . .
13016K 1992 0 . . . . . . .
13016K 1993 0 . . . . . . .
13016K 1994 0 . . . . . . .
13016K 1995 0 . . . . . . .
13016K 1996 0 . . . . . . .
13016K 1997 0 . . . . . . .
13016K 1998 0 . . . . . . .
13016K 1999 0 . . . . . . .
13016K 2000 0 . 0.2945 40120 0.3 42.9638 . .
13016K 2001 0 . 0.1945 40120 . 28.3783 0.3 42.9638
13016K 2002 0 . 0.1802 45558 0.7 26.2883 . 28.3783
13016K 2003 0 . 0.2712 45908 . 39.5715 0.7 26.2883
13016K 2004 0 . 0.1568 45908 . 22.8729 . 39.5715
13016K 2005 0 . 0.0337 45908 . 4.9228 . 22.8729
13016K 2006 0 . 0.0394 45908 1.4 5.7496 . 4.9228
13016K 2007 0 . 0.0483 97085 . 7.4443 1.4 5.7496
13016K 2008 0 . 0.0349 100233 . 5.3736 . 7.4443
13016K 2009 0 . 0.0351 120065 . 5.5743 . 5.3736
13016K 2010 0 . 0.0205 152711 . 3.2582 . 5.5743
13016K 2011 0 . 0.0195 308196 0.5 3.8958 . 3.2582
13016K 2012 0 . 0.0093 308196 . 1.8687 0.5 3.8958
13016K 2013 0 . 0.0018 308196 . 0.3579 . 1.8687
13016K 2014 0 . 0.0368 315410 8.9 0.3683 . 0.3579
13016K 2015 0 . 0.0255 1077004 . 0.2546 8.9 0.3683
13016K 2016 4 0.0005137639 0.2498 159721 20.3 0.2498 . 0.2546
13016K 2017 9 0.0000546129 0.1799 161321 198.4 0.1799 20.3 0.2498
13016K 2018 2 0.0002143573 0.1619 161321 50 0.1619 198.4 0.1799
13016K 2019 2 0 0.1687 177464 . 0.1687 50 0.1619
130347 1987 0 . . . . . . .
130347 1988 0 . . . . . . 0.1687
130347 1989 0 . . . . . . .
130347 1990 0 . . . . . . .
130347 1991 0 . . . . . . .
130347 1992 0 . . . . . . .
130347 1993 0 . . . . . . .
130347 1994 47 0.0000738433 0.7 43242 22.5 0.64 . .
130347 1995 35 0.0000662663 0.65 43242 . 0.59 22.5 0.64
The problem here is there should be no value for lag_p_value for the Type 130347 in the year 1988 (because p_value in the year 1987 is "."), but from my result, lag_p_value of the Type 130347 equals to p_value of Type 13016K in the year 2019 (0.1687).
I am wondering if there is anything wrong with my code.
I also included my data below.
Many thanks.
Never call a LAG() function conditionally, unless you know exactly what you are doing and want to perform special tricks with the lagged values.
The LAG() function adds a value to the FIFO chain only when it is called, so you never feed values at first.type, but keep the last value of the previous group.
Change your data step like this:
data want;
set have;
by Type;
lag_vo = lag(vo);
lag_p_us = lag(p_us);
if first.type
then do;
lag_vo = .;
lag_p_us = .;
end;
run;
Use of whitespace to make code more readable is recommended, as is proper indentation (only one step per functional level).
Never call a LAG() function conditionally, unless you know exactly what you are doing and want to perform special tricks with the lagged values.
The LAG() function adds a value to the FIFO chain only when it is called, so you never feed values at first.type, but keep the last value of the previous group.
Change your data step like this:
data want;
set have;
by Type;
lag_vo = lag(vo);
lag_p_us = lag(p_us);
if first.type
then do;
lag_vo = .;
lag_p_us = .;
end;
run;
Use of whitespace to make code more readable is recommended, as is proper indentation (only one step per functional level).
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.