Variable Investment is called Major investment (for each company) when Investment in a particular year exceeds by 200% of the prior 3-year average investment. So, I want to keep those observations which have major investment. for identification of year I have fyear variable and for the company name, I use a code CUSIP.
Secondly, Built investment is if Investment is greater than or equal to 30% of total assets (TA).
(I am trying to learn the coding system, I shall be Thankful if someone provides me simple and understandable codes)
Post some sample data and what you want the output to look like. It is probably best if you post just a few years of data and if you just make up the numbers and ids rather than posting real data. That will keep the post small and make it easy for you to manually calculate the desired new variables. Do NOT post the data as XLS file. Just type the numbers (or copy and paste as plain text) into the popup window that appears when you click on the Insert Code icon in the editor.
Here is sample data. and I am also attaching a jpg file for output reference.
CUSIP Fyear TA invest
1.1161 1999 307.279 14.216
1.1161 2000 2381.836 113.155
1.1161 2001 978.825 97.702
1.1161 2002 610.318 25.331
1.1161 2003 528.615 -68.854
1.1161 2004 780.99 95.234
1.1161 2005 573.504 55.255
1.4068 1991 1.632 0.091
1.4068 1992 2.73 0.118
1.4068 1993 10.949 0.346
1.4068 1994 7.576 0.699
1.4068 1995 27.49 0.134
1.4068 1996 113.026 -23.518
1.4068 1997 103.103 -19.758
1.4068 1998 93.98 5.424
1.4068 1999 92.203 4.959
1.4068 2000 65.035 1.862
1.4068 2001 49.697 0.807
1.4068 2002 53.42 0.298
1.4068 2003 130.106 0.413
1.4068 2004 115.184 -3.582
This program puts lag functions in the calculation of MAJOR and BUILT. But note that there is no IF statement containing a LAG in the "then" clause, assuring that the queus behind the lag functions are always updated, and therefore in synchrony with the data, regardless of the lagged values. Also I multiply the boolean expressions rather that using AND - because SAS is smart enough to not test the right side of the AND if the left side is false - which again would put some lag functions out of sync.
data want;
set have;
BUILT=(INVEST>0.3*TA);
MAJOR=(INVEST > 3*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST)))
* (LAG3(FYEAR)=FYEAR-3)
* (LAG3(CUSIP)=CUSIP);
run;
It would be easy, if there is no gap between two year for the same cusip.
data have;
infile cards expandtabs truncover;
input CUSIP $ Fyear TA invest;
cards;
1.1161 1999 307.279 14.216
1.1161 2000 2381.836 113.155
1.1161 2001 978.825 97.702
1.1161 2002 610.318 25.331
1.1161 2003 528.615 -68.854
1.1161 2004 780.99 95.234
1.1161 2005 573.504 55.255
1.4068 1991 1.632 0.091
1.4068 1992 2.73 0.118
1.4068 1993 10.949 0.346
1.4068 1994 7.576 0.699
1.4068 1995 27.49 0.134
1.4068 1996 113.026 -23.518
1.4068 1997 103.103 -19.758
1.4068 1998 93.98 5.424
1.4068 1999 92.203 4.959
1.4068 2000 65.035 1.862
1.4068 2001 49.697 0.807
1.4068 2002 53.42 0.298
1.4068 2003 130.106 0.413
1.4068 2004 115.184 -3.582
;
run;
data want;
set have;
lag_ta=lag(ta);
if cusip=lag(cusip) then do;
if invest > 0.3*lag_ta then built=1;
end;
lag1=lag(invest);
lag2=lag2(invest);
if cusip=lag2(cusip) then do;
e2=2*mean(lag1,lag2,invest);
if invest > e2 then major_inv=1;
end;
drop lag:;
run;
I understand your these codes, But here is nothing mentioned about fyear which is a year in my data, If I apply these codes may be they do not consider year in it.
data want;
set have;
lag_ta=lag(ta);
if cusip=lag(cusip) then do;
if invest > 0.3*lag_ta then built=1;
end;
lag1=lag(invest);
lag2=lag2(invest);
if cusip=lag2(cusip) then do;
e2=2*mean(lag1,lag2,invest);
if invest > e2 then major_inv=1;
end;
drop lag:;
run;
As long as the year is ordered by ascending as your data showed, and there is no gap between two year, code should work. You can test it on your own .
You can fill these gap by this :
data have;
infile cards expandtabs truncover;
input CUSIP $ Fyear TA invest;
cards;
1.1161 1999 307.279 14.216
1.1161 2000 2381.836 113.155
1.1161 2001 978.825 97.702
1.1161 2002 610.318 25.331
1.1161 2003 528.615 -68.854
1.1161 2004 780.99 95.234
1.1161 2005 573.504 55.255
1.4068 1991 1.632 0.091
1.4068 1992 2.73 0.118
1.4068 1993 10.949 0.346
1.4068 1994 7.576 0.699
1.4068 1995 27.49 0.134
1.4068 1996 113.026 -23.518
1.4068 1997 103.103 -19.758
1.4068 1998 93.98 5.424
1.4068 1999 92.203 4.959
1.4068 2000 65.035 1.862
1.4068 2001 49.697 0.807
1.4068 2002 53.42 0.298
1.4068 2003 130.106 0.413
1.4068 2004 115.184 -3.582
;
run;
proc summary data=have;
var fyear;
output out=year min=min max=max;
run;
data temp;
set year;
do fyear=min to max;
output;
end;
run;
proc sql;
create table want as
select a.*,b.TA,b.invest
from
(select * from
(select distinct cusip from have),
(select distinct fyear from temp)
) as a left join have as b
on a.cusip=b.cusip and a.fyear=b.fyear ;
quit;
What precisely do you want to do about gaps? Do you want to keep only windows with no gaps (which is what my program does)?
Or do you want to accomodate such gaps? If so, how do you want to do it? Reach back for the most recent three records (regardless of gap)? Take the mean of all records within the last three fiscal years, even if there are only two such records (or one such record)?
Please specify what you want to do about gaps. If you wnat to somehow include windows with gaps, does that mean you want data for the 2nd and 3rd record for each cusip? (What if your first years are 2001 2002 2003 2004 ..., or 2001 2004 2005 2006 ...).
Also in my program, I used
(INVEST > 300*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST)))
It should have been
(INVEST > 3*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST)))
Or possibly
(INVEST >= 3*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST)))
depending on whether you want current invest over 200% greater than prior 3 year average, or at least 200% greater than ...
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.