DATA Step, Macro, Functions and more

Summing over intervals of time within a group

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Summing over intervals of time within a group

I have panel data on companies with earnings information by year as follows:

 

Company_id     Year     Earnings

123                    2000    5.5

123                    2001    6.7

123                    2002    7.1

123                    2003    7.2

123                    2004    6.9

123                    2005    8.1

456                    2000    15.5

456                   2001    16.7

456                   2002    17.1

456                   2003    17.2

456                   2004    16.9

456                   2005    28.1

 

Company_id is formatted as a character and year/earnings are formatted as numeric. 

 

I need to calculate the sum of future earnings for the next five years as well as the standard deviation of those earnings. The correct output would be as follows:

 

Company_id YearEarningsEarnings_5yrstd dev
123           2000   5.536.000.538516
123           2001   6.7..
123           2002   7.1..
123           2003   7.2..
123           2004   6.9..
123           2005   8.1..
456           2000   15.596.004.978956
456        2001   16.7..
456        2002   17.1..
456        2003   17.2..
456        2004   16.9..
456        2005   28.1..

 

I want to make sure to only sum earnings within a Company_id.

 

I have tried the following code without success. (Inspired by this post). 

 

proc sql; create table future_earn as select distinct
gvkey, fyear, (select sum(eps) from comp3 where fyear between (a.fyear+1) and (a.fyear+5)) as eps_5yr,
(select std(eps) from comp3 where fyear between (a.fyear+1) and (a.fyear+5)) as stddev_5yr
from comp3 as a
group by gvkey;
quit;
run;

The above generates values for Earnings_5yr and stddev_5yr, but does not seem to be calculating correctly and/or restricting calculations to be within a Company_id. I do receive the following warning in SAS: "WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function."  

 

I have SAS 9.4. I appreciate the help.

 


Accepted Solutions
Solution
3 weeks ago
Occasional Contributor
Posts: 15

Re: Summing over intervals of time within a group

Please try this code and see if it helps.

 

 

DATA TEMP;
INPUT Company_id Year Earnings;
CARDS;
123 2000 5.5
123 2001 6.7
123 2002 7.1
123 2003 7.2
123 2004 6.9
123 2005 8.1
456 2000 15.5
456 2001 16.7
456 2002 17.1
456 2003 17.2
456 2004 16.9
456 2005 28.1
;
RUN;


DATA TEMP_C;
SET TEMP;
RENAME Company_id=New_Company_id Year=New_Year Earnings=New_Er ;
RUN;

DATA TEMP1;

IF _n_=1 THEN DO;
IF 0 THEN SET TEMP;

DECLARE HASH h1(multidata:'y');

h1.DEFINEKEY('Company_id','Year');
h1.DEFINEDATA('Earnings');
h1.DEFINEDONE();
END;

DO UNTIL(LAST.Company_id);
SET TEMP;
BY Company_id;
h1.add();

END;

DO UNTIL(LAST.New_Company_id);
SET TEMP_C;
BY New_Company_id;

ARRAY nxt_5{5} _temporary_;

DO i=1 to 5;
TEMP_Year= New_Year+i;
rc=h1.find(key:New_Company_id,key:TEMP_Year);

IF rc=0 THEN nxt_5{i}=Earnings;
ELSE LEAVE;
END;

TOTAL=SUM(OF nxt_5{*});
Std_dev= STD(OF nxt_5{*});

IF i<6 THEN DO;
FINAL_TOTAL=.;
FINAL_STD_DEV=.;
END;
ELSE DO;
FINAL_TOTAL=TOTAL;
FINAL_STD_DEV=Std_dev;
END;
OUTPUT;
END;

h1.clear();


RUN;


DATA TEMP_FINAL;
SET TEMP1(DROP= i rc Total Std_dev TEMP_Year Company_id Year Earnings);
RENAME New_Company_id=Company_id
New_Year=Year
New_Er=Earnings
FINAL_TOTAL= Sum_Next_5_years
FINAL_STD_DEV= Standard_Dev_Next_5_years
;
RUN;

PROC PRINT DATA= TEMP_FINAL;
RUN;

PROC DATASETS NOLIST;
DELETE TEMP1;
RUN;

View solution in original post


All Replies
PROC Star
Posts: 1,566

Re: Summing over intervals of time within a group

How did you get 36 and 96 respectively. When i totalled, the values seem off 

New Contributor
Posts: 4

Re: Summing over intervals of time within a group

Posted in reply to novinosrin

The sum is for future earnings. So for 2000, one sums the earnings for 2001-2005.

Super User
Posts: 23,247

Re: Summing over intervals of time within a group

Do you have SAS/ETS? Run the following to see in your log:

 

proc product_status;run;

If you do, PROC EXPAND is significantly easier.

New Contributor
Posts: 4

Re: Summing over intervals of time within a group

I believe I do. Could you suggest a solution using PROC EXPAND?
Super User
Posts: 23,247

Re: Summing over intervals of time within a group

See the examples here:

http://documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_examples04.htm&docsetVersion=...

 

And the different type of calculations you can do here:

http://documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_details19.htm&docsetVersion=1...

 

You'll have to play around to get exactly what you want. 

Solution
3 weeks ago
Occasional Contributor
Posts: 15

Re: Summing over intervals of time within a group

Please try this code and see if it helps.

 

 

DATA TEMP;
INPUT Company_id Year Earnings;
CARDS;
123 2000 5.5
123 2001 6.7
123 2002 7.1
123 2003 7.2
123 2004 6.9
123 2005 8.1
456 2000 15.5
456 2001 16.7
456 2002 17.1
456 2003 17.2
456 2004 16.9
456 2005 28.1
;
RUN;


DATA TEMP_C;
SET TEMP;
RENAME Company_id=New_Company_id Year=New_Year Earnings=New_Er ;
RUN;

DATA TEMP1;

IF _n_=1 THEN DO;
IF 0 THEN SET TEMP;

DECLARE HASH h1(multidata:'y');

h1.DEFINEKEY('Company_id','Year');
h1.DEFINEDATA('Earnings');
h1.DEFINEDONE();
END;

DO UNTIL(LAST.Company_id);
SET TEMP;
BY Company_id;
h1.add();

END;

DO UNTIL(LAST.New_Company_id);
SET TEMP_C;
BY New_Company_id;

ARRAY nxt_5{5} _temporary_;

DO i=1 to 5;
TEMP_Year= New_Year+i;
rc=h1.find(key:New_Company_id,key:TEMP_Year);

IF rc=0 THEN nxt_5{i}=Earnings;
ELSE LEAVE;
END;

TOTAL=SUM(OF nxt_5{*});
Std_dev= STD(OF nxt_5{*});

IF i<6 THEN DO;
FINAL_TOTAL=.;
FINAL_STD_DEV=.;
END;
ELSE DO;
FINAL_TOTAL=TOTAL;
FINAL_STD_DEV=Std_dev;
END;
OUTPUT;
END;

h1.clear();


RUN;


DATA TEMP_FINAL;
SET TEMP1(DROP= i rc Total Std_dev TEMP_Year Company_id Year Earnings);
RENAME New_Company_id=Company_id
New_Year=Year
New_Er=Earnings
FINAL_TOTAL= Sum_Next_5_years
FINAL_STD_DEV= Standard_Dev_Next_5_years
;
RUN;

PROC PRINT DATA= TEMP_FINAL;
RUN;

PROC DATASETS NOLIST;
DELETE TEMP1;
RUN;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 123 views
  • 0 likes
  • 4 in conversation