Dear all,
I have a dataset as follow,
ID | year | lag_4var | lag_5var | lag_6var | lag_7var | treat |
1 | 1990 | 6 | 7 | 8 | 9 | 0 |
1 | 1991 | 11 | 7 | 8 | 9 | 1 |
1 | 1992 | 9 | 4 | 5 | 13 | 1 |
1 | 1993 | 12 | 13 | 14 | 11 | 1 |
2 | 2001 | 8 | 11 | 12 | 8 | 1 |
2 | 2002 | 6 | 12 | 5 | 6 | 1 |
2 | 2003 | 13 | 5 | 11 | 14 | 1 |
lag4_var = lag4(var) where var is a measurement of firm size for each ID/quarter. Similarly, lag5_var is the value of the 5th previous quarterly ‘var’ and lag6_var is the value of the 6th previous quarterly ‘var’, and so on.
Treat is a dummy variable equals 1 if any of lag#_var greater than or equals to 10, or it equals 0 if all the lag#_var are less than 10.
I want to create a new variable, K. For observations with treat=1, K equals the earliest lag#_var that is greater than 10. For observations with treat=0. K equals the latest (most current) lag#_var.
For example, for observation with ID=1 and year=1990, treat=0, so in this case K should be lag4_var as lag4_var is the most recent number for lag#_var.
For observation with ID=2 and year=2003, treat=1, and in this case K should be lag7_var as lag7_var is the first and earliest lag#_var that is greater than 10.
So as a result, I want to have the following dataset:
ID | year | lag_4var | lag_5var | lag_6var | lag_7var | treat | K |
1 | 1990 | 6 | 7 | 8 | 9 | 0 | 6 |
1 | 1991 | 11 | 7 | 8 | 9 | 1 | 11 |
1 | 1992 | 9 | 4 | 5 | 13 | 1 | 13 |
1 | 1993 | 12 | 13 | 14 | 11 | 1 | 11 |
2 | 2001 | 8 | 11 | 12 | 8 | 1 | 12 |
2 | 2002 | 6 | 12 | 5 | 6 | 1 | 12 |
2 | 2003 | 13 | 5 | 11 | 14 | 1 | 14 |
data have;
input ID year lag_4var lag_5var lag_6var lag_7var treat;
cards;
1 1990 6 7 8 9 0
1 1991 11 7 8 9 1
1 1992 9 4 5 13 1
1 1993 12 13 14 11 1
2 2001 8 11 12 8 1
2 2002 6 12 5 6 1
2 2003 13 5 11 14 1
;
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
end;
else k=lag_4var;
run;
To address
ID year lag_4var lag_5var lag_6var lag_7var treat
3 2000 10 9 8 7 1
So in this case, K=1 and the earliest lag#_var greater or equal to 10 is lag_4var, so K=10
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
else k=lag_4var;
end;
else k=lag_4var;
run;
data have;
input ID year lag_4var lag_5var lag_6var lag_7var treat;
cards;
1 1990 6 7 8 9 0
1 1991 11 7 8 9 1
1 1992 9 4 5 13 1
1 1993 12 13 14 11 1
2 2001 8 11 12 8 1
2 2002 6 12 5 6 1
2 2003 13 5 11 14 1
;
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
end;
else k=lag_4var;
run;
To address
ID year lag_4var lag_5var lag_6var lag_7var treat
3 2000 10 9 8 7 1
So in this case, K=1 and the earliest lag#_var greater or equal to 10 is lag_4var, so K=10
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>10 then k=l;
else k=lag_4var;
end;
else k=lag_4var;
run;
Hi novinosrin thanks for your reply and it works out perfectly. Now if I want to twist the results a little bit, for example, for observations with treat=1, K equals the average value of all lag#_var that are greater than or equals 10. For observations with treat=0. K equals the average value of lag#_var that are less than 10, do you think it is still possible to get?
Yes, slight tweak if I understand you correctly
data want;
set have;
array l lag_4var--lag_7var;
if treat then do over l;
if l>=10 then do;
sum=sum(sum,l);
n=sum(n,1);
end;
end;
else do over l;
if l<10 then do;
sum=sum(sum,l);
n=sum(n,1);
end;
end;
k=sum/n;
run;
How would you score the following record?
ID year lag_4var lag_5var lag_6var lag_7var treat
3 2000 10 9 8 7 1
Art, CEO, AnalystFinder.com
Hi, based on your explanation I think you are looking for something like the following.
Note I used 7 as my cut off to fit my sample input data
/* Set up sample input data */
data input ;
infile cards ;
input var ;
lagVar1=lag1(var) ;
lagVar2=lag2(var) ;
lagVar3=lag3(var) ;
lagVar4=lag4(var) ;
cards;
3
2
8
4
6
7
1
3
2
9
3
4
6
7
;
run ;
/* generate output */
data ouput (drop=i) ;
set input ;
/* Load lag values into an array */
array lags{4} lagVar1-lagVar4 ;
/* Set treat and k to 0 */
treat=0 ;
k=0 ;
/* Loop through array backwards, starting with oldest lag value to most recent lag value */
do i=4 to 1 by -1 ;
/* if the lag value exceeds your trigger */
if lags{i}>7 then do ;
/* Set treat to 1 */
treat=1 ;
/* if k=0 */
if k=0 then
/* then store the lag value */
k=lags{i} ;
end ;
end ;
/* if k=0 */
if k=0 then
/* set k to the most recent lag value */
k=lags{1} ;
run ;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.