Dear all,
I have a dataset as follow,
ID | year | lag_4var | lag_5var | lag_6var | lag_7var | treat |
1 | 1990 | 6 | 7 | 8 | 9 | 0 |
1 | 1991 | 11 | 7 | 8 | 9 | 1 |
1 | 1992 | 9 | 4 | 5 | 13 | 1 |
1 | 1993 | 12 | 13 | 14 | 11 | 1 |
2 | 2001 | 8 | 11 | 12 | 8 | 1 |
2 | 2002 | 6 | 12 | 5 | 6 | 1 |
2 | 2003 | 13 | 5 | 11 | 14 | 1 |
lag4_var = lag4(var) where var is a measurement of firm size for each ID/quarter. Similarly, lag5_var is the value of the 5th previous quarterly ‘var’ and lag6_var is the value of the 6th previous quarterly ‘var’, and so on.
Treat is a dummy variable equals 1 if any of lag#_var greater than or equals to 10, or it equals 0 if all the lag#_var are less than 10.
I want to create a new variable, K. For observations with treat=1, K equals the earliest lag#_var that is greater than 10. For observations with treat=0. K equals the latest (most current) lag#_var.
For example, for observation with ID=1 and year=1990, treat=0, so in this case K should be lag4_var as lag4_var is the most recent number for lag#_var.
For observation with ID=2 and year=2003, treat=1, and in this case K should be lag7_var as lag7_var is the first and earliest lag#_var that is greater than 10.
So as a result, I want to have the following dataset:
ID | year | lag_4var | lag_5var | lag_6var | lag_7var | treat | K |
1 | 1990 | 6 | 7 | 8 | 9 | 0 | 6 |
1 | 1991 | 11 | 7 | 8 | 9 | 1 | 11 |
1 | 1992 | 9 | 4 | 5 | 13 | 1 | 13 |
1 | 1993 | 12 | 13 | 14 | 11 | 1 | 11 |
2 | 2001 | 8 | 11 | 12 | 8 | 1 | 12 |
2 | 2002 | 6 | 12 | 5 | 6 | 1 | 12 |
2 | 2003 | 13 | 5 | 11 | 14 | 1 | 14 |
data have;
input ID year lag_4var lag_5var lag_6var lag_7var treat;
cards;
1 1990 6 7 8 9 0
1 1991 11 7 8 9 1
1 1992 9 4 5 13 1
1 1993 12 13 14 11 1
2 2001 8 11 12 8 1
2 2002 6 12 5 6 1
2 2003 13 5 11 14 1
;
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
end;
else k=lag_4var;
run;
To address
ID year lag_4var lag_5var lag_6var lag_7var treat
3 2000 10 9 8 7 1
So in this case, K=1 and the earliest lag#_var greater or equal to 10 is lag_4var, so K=10
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
else k=lag_4var;
end;
else k=lag_4var;
run;
data have;
input ID year lag_4var lag_5var lag_6var lag_7var treat;
cards;
1 1990 6 7 8 9 0
1 1991 11 7 8 9 1
1 1992 9 4 5 13 1
1 1993 12 13 14 11 1
2 2001 8 11 12 8 1
2 2002 6 12 5 6 1
2 2003 13 5 11 14 1
;
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
end;
else k=lag_4var;
run;
To address
ID year lag_4var lag_5var lag_6var lag_7var treat
3 2000 10 9 8 7 1
So in this case, K=1 and the earliest lag#_var greater or equal to 10 is lag_4var, so K=10
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
else k=lag_4var;
end;
else k=lag_4var;
run;
Hi novinosrin thanks for your reply and it works out perfectly. Now if I want to twist the results a little bit, for example, for observations with treat=1, K equals the average value of all lag#_var that are greater than or equals 10. For observations with treat=0. K equals the average value of lag#_var that are less than 10, do you think it is still possible to get?
Yes, slight tweak if I understand you correctly
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>=10 then do;
sum=sum(sum,l);
n=sum(n,1);
end;
end;
else do over l;
if l<10 then do;
sum=sum(sum,l);
n=sum(n,1);
end;
end;
k=sum/n;
run;
How would you score the following record?
ID year lag_4var lag_5var lag_6var lag_7var treat
3 2000 10 9 8 7 1
Art, CEO, AnalystFinder.com
Hi, based on your explanation I think you are looking for something like the following.
Note I used 7 as my cut off to fit my sample input data
/* Set up sample input data */
data input ;
infile cards ;
input var ;
lagVar1=lag1(var) ;
lagVar2=lag2(var) ;
lagVar3=lag3(var) ;
lagVar4=lag4(var) ;
cards;
3
2
8
4
6
7
1
3
2
9
3
4
6
7
;
run ;
/* generate output */
data ouput (drop=i) ;
set input ;
/* Load lag values into an array */
array lags{4} lagVar1-lagVar4 ;
/* Set treat and k to 0 */
treat=0 ;
k=0 ;
/* Loop through array backwards, starting with oldest lag value to most recent lag value */
do i=4 to 1 by -1 ;
/* if the lag value exceeds your trigger */
if lags{i}>7 then do ;
/* Set treat to 1 */
treat=1 ;
/* if k=0 */
if k=0 then
/* then store the lag value */
k=lags{i} ;
end ;
end ;
/* if k=0 */
if k=0 then
/* set k to the most recent lag value */
k=lags{1} ;
run ;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.