BookmarkSubscribeRSS Feed
Khang
Obsidian | Level 7

Variable Investment is called Major investment (for each company) when Investment in a particular year exceeds by 200% of the prior 3-year average investment. So, I want to keep those observations which have major investment. for identification of year I have fyear variable and for the company name, I use a code CUSIP.

Secondly, Built investment is if Investment is greater than or equal to 30% of total assets (TA).
(I am trying to learn the coding system, I shall be Thankful if someone provides me simple and understandable codes)

10 REPLIES 10
Tom
Super User Tom
Super User

Post some sample data and what you want the output to look like.  It is probably best if you post just a few years of data and if you just make up the numbers and ids rather than posting real data.  That will keep the post small and make it easy for you to manually calculate the desired new variables. Do NOT post the data as XLS file.  Just type the numbers (or copy and paste as plain text) into the popup window that appears when you click on the Insert Code icon in the editor.

Khang
Obsidian | Level 7

Here is sample data. and I am also attaching a jpg file for output reference.

CUSIP	Fyear	TA	invest
1.1161	1999	307.279	14.216
1.1161	2000	2381.836	113.155
1.1161	2001	978.825	97.702
1.1161	2002	610.318	25.331
1.1161	2003	528.615	-68.854
1.1161	2004	780.99	95.234
1.1161	2005	573.504	55.255
1.4068	1991	1.632	0.091
1.4068	1992	2.73	0.118
1.4068	1993	10.949	0.346
1.4068	1994	7.576	0.699
1.4068	1995	27.49	0.134
1.4068	1996	113.026	-23.518
1.4068	1997	103.103	-19.758
1.4068	1998	93.98	5.424
1.4068	1999	92.203	4.959
1.4068	2000	65.035	1.862
1.4068	2001	49.697	0.807
1.4068	2002	53.42	0.298
1.4068	2003	130.106	0.413
1.4068	2004	115.184	-3.582

example.png
mkeintz
PROC Star

This program puts lag functions in the calculation of MAJOR and BUILT.  But note that there is no IF statement containing a LAG in the "then" clause, assuring that the queus behind the lag  functions are always updated, and therefore in synchrony with the data, regardless of the lagged values.   Also I multiply the boolean expressions rather that  using AND - because SAS is smart enough to not test the right side of the AND if the left side is false - which again would put some lag functions out  of sync.

 

data want;
  set have;
  BUILT=(INVEST>0.3*TA);
  MAJOR=(INVEST > 3*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST))) 
        * (LAG3(FYEAR)=FYEAR-3) 
        * (LAG3(CUSIP)=CUSIP);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

It would be easy, if there is no gap between two year for the same cusip.

 

data have;
infile cards expandtabs truncover;
input CUSIP	$ Fyear	TA	invest;
cards;
1.1161	1999	307.279	14.216
1.1161	2000	2381.836	113.155
1.1161	2001	978.825	97.702
1.1161	2002	610.318	25.331
1.1161	2003	528.615	-68.854
1.1161	2004	780.99	95.234
1.1161	2005	573.504	55.255
1.4068	1991	1.632	0.091
1.4068	1992	2.73	0.118
1.4068	1993	10.949	0.346
1.4068	1994	7.576	0.699
1.4068	1995	27.49	0.134
1.4068	1996	113.026	-23.518
1.4068	1997	103.103	-19.758
1.4068	1998	93.98	5.424
1.4068	1999	92.203	4.959
1.4068	2000	65.035	1.862
1.4068	2001	49.697	0.807
1.4068	2002	53.42	0.298
1.4068	2003	130.106	0.413
1.4068	2004	115.184	-3.582
;
run;

data want;
 set have;
 lag_ta=lag(ta);
 if cusip=lag(cusip) then do;
  if invest > 0.3*lag_ta then built=1;
 end;

 lag1=lag(invest);
 lag2=lag2(invest);
 if cusip=lag2(cusip) then do;
   e2=2*mean(lag1,lag2,invest);
   if invest > e2 then major_inv=1;
 end;
drop lag:;
run;

 

 

Khang
Obsidian | Level 7

I understand your these codes, But here is nothing mentioned about fyear which is a year in my data, If I apply these codes may be they do not consider year in it.

data want;
 set have;
 lag_ta=lag(ta);
 if cusip=lag(cusip) then do;
  if invest > 0.3*lag_ta then built=1;
 end;

 lag1=lag(invest);
 lag2=lag2(invest);
 if cusip=lag2(cusip) then do;
   e2=2*mean(lag1,lag2,invest);
   if invest > e2 then major_inv=1;
 end;
drop lag:;
run;
Ksharp
Super User
As long as the year is ordered by ascending as your data showed, and there is no gap between two year, code should work.
You can test it on your own .


Khang
Obsidian | Level 7
year is in ascending order but there is gap between the years. The two codes gives me different solutions, one provided by you and other by mkeintz
Ksharp
Super User

You can fill these gap by this :

 

 

 

data have;
infile cards expandtabs truncover;
input CUSIP	$ Fyear	TA	invest;
cards;
1.1161	1999	307.279	14.216
1.1161	2000	2381.836	113.155
1.1161	2001	978.825	97.702
1.1161	2002	610.318	25.331
1.1161	2003	528.615	-68.854
1.1161	2004	780.99	95.234
1.1161	2005	573.504	55.255
1.4068	1991	1.632	0.091
1.4068	1992	2.73	0.118
1.4068	1993	10.949	0.346
1.4068	1994	7.576	0.699
1.4068	1995	27.49	0.134
1.4068	1996	113.026	-23.518
1.4068	1997	103.103	-19.758
1.4068	1998	93.98	5.424
1.4068	1999	92.203	4.959
1.4068	2000	65.035	1.862
1.4068	2001	49.697	0.807
1.4068	2002	53.42	0.298
1.4068	2003	130.106	0.413
1.4068	2004	115.184	-3.582
;
run;
proc summary data=have;
var fyear;
output out=year min=min max=max;
run;
data temp;
 set year;
 do fyear=min to max;
  output;
 end;
run;
proc sql;
create table want as
 select a.*,b.TA,b.invest
  from 
  (select * from 
(select distinct cusip from have),
(select distinct fyear from temp)
) as a left join have as b
on a.cusip=b.cusip and a.fyear=b.fyear ;
quit;
mkeintz
PROC Star

What precisely do you want to do about gaps?   Do you want to keep only windows with no gaps (which is what my program does)? 

 

Or do you want to accomodate such gaps?  If so, how do you want to do it?  Reach back for the most recent three records (regardless of gap)?   Take the mean of all records within the last three fiscal years, even if there are only two such records (or one such record)?

 

Please specify what you want to do about gaps.  If you wnat to somehow include windows with gaps, does that mean you want data for the 2nd and 3rd record for each cusip?  (What if your first years are 2001 2002 2003 2004 ..., or  2001 2004 2005 2006 ...).

 

Also in my program, I used

   (INVEST > 300*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST)))

It should have been

   (INVEST > 3*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST)))

Or possibly

   (INVEST >= 3*MEAN(LAG(INVEST),LAG2(INVEST),LAG3(INVEST)))

 

depending on whether you want current invest over 200% greater than prior 3 year average, or  at least 200% greater than ...

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Jahanzaib
Quartz | Level 8
I wanted to calculate current invest over 200% greater than prior 3 year average, Thanks I found the solution.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2312 views
  • 6 likes
  • 5 in conversation