Dear all,
I have a data set as follow,
ID | year | lag4_var | lag5_var | lag6_var | lag7_var | treat |
1 | 2000 | 3 | 2 | 6 | 3 | 0 |
1 | 2001 | 6 | 7 | 8 | 9 | 0 |
1 | 2002 | 6 | 5 | 7 | 8 | 0 |
1 | 2003 | 6 | 7 | 11 | 8 | 1 |
1 | 2004 | 12 | 11 | 10 | 9 | 1 |
2 | 2015 | 6 | 7 | 8 | 9 | 0 |
2 | 2016 | 3 | 2 | 6 | 3 | 0 |
2 | 2017 | 13 | 1 | 5 | 6 | 1 |
2 | 2018 | 7 | 6 | 8 | 12 | 1 |
2 | 2019 | 5 | 22 | 5 | 6 | 1 |
in this data set, lag4_var = lag4(var) where var is some measurement of firm size for each ID/quarter. Similarly, lag5_var is the value of the 5th previous quarterly ‘var’ and lag6_var is the value of the 6th previous quarterly ‘var’, and so on. Treat is a dummy variable equals 1 if any of lag#_var greater than or equals 10, or equals to 0 if all the lag lag#_var less than 10.
I want to create an new variable, K. For observations with treat=1, K equals the earliest lag#_var that is greater than or equals 10. For observations with treat=0. K equals the latest (most current) lag#_var.
For example, for observation with ID=1 and year=2000, treat=0, and in this case K should be lag4_var as lag4_var is the most recent number for lag#_var.
For observation with ID=1 and year=2004, treat=1, and in this case K should be lag6_var as lag6_var is the first and earliest lag#_var that is greater than 10.
So as a result, I want to have the following data set.
ID | year | lag4_var | lag5_var | lag6_var | lag7_var | treat | K |
1 | 2000 | 3 | 2 | 6 | 3 | 0 | 3 |
1 | 2001 | 6 | 7 | 8 | 9 | 0 | 6 |
1 | 2002 | 6 | 5 | 7 | 8 | 0 | 6 |
1 | 2003 | 6 | 7 | 11 | 8 | 1 | 11 |
1 | 2004 | 12 | 11 | 10 | 9 | 1 | 10 |
2 | 2015 | 6 | 7 | 8 | 9 | 0 | 6 |
2 | 2016 | 3 | 2 | 6 | 3 | 0 | 3 |
2 | 2017 | 13 | 1 | 5 | 6 | 1 | 13 |
2 | 2018 | 7 | 6 | 8 | 12 | 1 | 12 |
2 | 2019 | 5 | 22 | 5 | 6 | 1 | 22 |
Thanks in advance for your help!
Hi @aj_goodnews
Please remember that it is good practice to supply input data as working code to create an input data set. It is a great help for anybody trying to answer your question, because creating test data from text is often the most time consuming part.
The code to solve your problem is simple:
data have;
input ID year lag4_var lag5_var lag6_var lag7_var treat;
cards;
1 2000 3 2 6 3 0
1 2001 6 7 8 9 0
1 2002 6 5 7 8 0
1 2003 6 7 11 8 1
1 2004 12 11 10 9 1
2 2015 6 7 8 9 0
2 2016 3 2 6 3 0
2 2017 13 1 5 6 1
2 2018 7 6 8 12 1
2 2019 5 22 5 6 1
;
run;
data want(drop=i); set have;
array vars (5) lag4_var lag5_var lag6_var lag7_var treat;
if treat = 0 then K = lag4_var;
else do i = 5 to 1 by -1;
if vars{i} >= 10 then do;
K = vars{i};
leave;
end;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.