I have panel data on companies with earnings information by year as follows:
Company_id Year Earnings
123 2000 5.5
123 2001 6.7
123 2002 7.1
123 2003 7.2
123 2004 6.9
123 2005 8.1
456 2000 15.5
456 2001 16.7
456 2002 17.1
456 2003 17.2
456 2004 16.9
456 2005 28.1
Company_id is formatted as a character and year/earnings are formatted as numeric.
I need to calculate the sum of future earnings for the next five years as well as the standard deviation of those earnings. The correct output would be as follows:
Company_id | Year | Earnings | Earnings_5yr | std dev |
123 | 2000 | 5.5 | 36.00 | 0.538516 |
123 | 2001 | 6.7 | . | . |
123 | 2002 | 7.1 | . | . |
123 | 2003 | 7.2 | . | . |
123 | 2004 | 6.9 | . | . |
123 | 2005 | 8.1 | . | . |
456 | 2000 | 15.5 | 96.00 | 4.978956 |
456 | 2001 | 16.7 | . | . |
456 | 2002 | 17.1 | . | . |
456 | 2003 | 17.2 | . | . |
456 | 2004 | 16.9 | . | . |
456 | 2005 | 28.1 | . | . |
I want to make sure to only sum earnings within a Company_id.
I have tried the following code without success. (Inspired by this post).
proc sql; create table future_earn as select distinct
gvkey, fyear, (select sum(eps) from comp3 where fyear between (a.fyear+1) and (a.fyear+5)) as eps_5yr,
(select std(eps) from comp3 where fyear between (a.fyear+1) and (a.fyear+5)) as stddev_5yr
from comp3 as a
group by gvkey;
quit;
run;
The above generates values for Earnings_5yr and stddev_5yr, but does not seem to be calculating correctly and/or restricting calculations to be within a Company_id. I do receive the following warning in SAS: "WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function."
I have SAS 9.4. I appreciate the help.
Please try this code and see if it helps.
DATA TEMP;
INPUT Company_id Year Earnings;
CARDS;
123 2000 5.5
123 2001 6.7
123 2002 7.1
123 2003 7.2
123 2004 6.9
123 2005 8.1
456 2000 15.5
456 2001 16.7
456 2002 17.1
456 2003 17.2
456 2004 16.9
456 2005 28.1
;
RUN;
DATA TEMP_C;
SET TEMP;
RENAME Company_id=New_Company_id Year=New_Year Earnings=New_Er ;
RUN;
DATA TEMP1;
IF _n_=1 THEN DO;
IF 0 THEN SET TEMP;
DECLARE HASH h1(multidata:'y');
h1.DEFINEKEY('Company_id','Year');
h1.DEFINEDATA('Earnings');
h1.DEFINEDONE();
END;
DO UNTIL(LAST.Company_id);
SET TEMP;
BY Company_id;
h1.add();
END;
DO UNTIL(LAST.New_Company_id);
SET TEMP_C;
BY New_Company_id;
ARRAY nxt_5{5} _temporary_;
DO i=1 to 5;
TEMP_Year= New_Year+i;
rc=h1.find(key:New_Company_id,key:TEMP_Year);
IF rc=0 THEN nxt_5{i}=Earnings;
ELSE LEAVE;
END;
TOTAL=SUM(OF nxt_5{*});
Std_dev= STD(OF nxt_5{*});
IF i<6 THEN DO;
FINAL_TOTAL=.;
FINAL_STD_DEV=.;
END;
ELSE DO;
FINAL_TOTAL=TOTAL;
FINAL_STD_DEV=Std_dev;
END;
OUTPUT;
END;
h1.clear();
RUN;
DATA TEMP_FINAL;
SET TEMP1(DROP= i rc Total Std_dev TEMP_Year Company_id Year Earnings);
RENAME New_Company_id=Company_id
New_Year=Year
New_Er=Earnings
FINAL_TOTAL= Sum_Next_5_years
FINAL_STD_DEV= Standard_Dev_Next_5_years
;
RUN;
PROC PRINT DATA= TEMP_FINAL;
RUN;
PROC DATASETS NOLIST;
DELETE TEMP1;
RUN;
How did you get 36 and 96 respectively. When i totalled, the values seem off
The sum is for future earnings. So for 2000, one sums the earnings for 2001-2005.
Do you have SAS/ETS? Run the following to see in your log:
proc product_status;run;
If you do, PROC EXPAND is significantly easier.
See the examples here:
And the different type of calculations you can do here:
You'll have to play around to get exactly what you want.
Please try this code and see if it helps.
DATA TEMP;
INPUT Company_id Year Earnings;
CARDS;
123 2000 5.5
123 2001 6.7
123 2002 7.1
123 2003 7.2
123 2004 6.9
123 2005 8.1
456 2000 15.5
456 2001 16.7
456 2002 17.1
456 2003 17.2
456 2004 16.9
456 2005 28.1
;
RUN;
DATA TEMP_C;
SET TEMP;
RENAME Company_id=New_Company_id Year=New_Year Earnings=New_Er ;
RUN;
DATA TEMP1;
IF _n_=1 THEN DO;
IF 0 THEN SET TEMP;
DECLARE HASH h1(multidata:'y');
h1.DEFINEKEY('Company_id','Year');
h1.DEFINEDATA('Earnings');
h1.DEFINEDONE();
END;
DO UNTIL(LAST.Company_id);
SET TEMP;
BY Company_id;
h1.add();
END;
DO UNTIL(LAST.New_Company_id);
SET TEMP_C;
BY New_Company_id;
ARRAY nxt_5{5} _temporary_;
DO i=1 to 5;
TEMP_Year= New_Year+i;
rc=h1.find(key:New_Company_id,key:TEMP_Year);
IF rc=0 THEN nxt_5{i}=Earnings;
ELSE LEAVE;
END;
TOTAL=SUM(OF nxt_5{*});
Std_dev= STD(OF nxt_5{*});
IF i<6 THEN DO;
FINAL_TOTAL=.;
FINAL_STD_DEV=.;
END;
ELSE DO;
FINAL_TOTAL=TOTAL;
FINAL_STD_DEV=Std_dev;
END;
OUTPUT;
END;
h1.clear();
RUN;
DATA TEMP_FINAL;
SET TEMP1(DROP= i rc Total Std_dev TEMP_Year Company_id Year Earnings);
RENAME New_Company_id=Company_id
New_Year=Year
New_Er=Earnings
FINAL_TOTAL= Sum_Next_5_years
FINAL_STD_DEV= Standard_Dev_Next_5_years
;
RUN;
PROC PRINT DATA= TEMP_FINAL;
RUN;
PROC DATASETS NOLIST;
DELETE TEMP1;
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.