Programming the statistical procedures from SAS

Need to do OLS Regression on previous observations

Reply
Occasional Contributor
Posts: 7

Need to do OLS Regression on previous observations

Hello everyone,

I have data on businesses, the year, and that businesses revenues for each year like below:

Header 1Header 2Header 3
BusinessYearRevenues

ABC

2000100000
ABC2001100025
ABC2002100035
ABC2003148520
ABC2004132454
ABC2005456214
ACME1996154741
BBB199845214
BBB199925632

I need to write code that will do the following:

1) Perform OLS regression by each company but using the last five years of data. For example, the regression for ABC in 2005 would make the DV the revenues for years 2000-2004 and the sole IV the year variable. I know how to do this in excel, but I want to automate everything into SAS so I am looking for help. The larger purpose is to get the standard error of the year beta coefficient and then divide it by the average revenues for those five years. This is used as a measure of uncertainty in the field I research.

2) I will need to be able to output this standard beta coefficient into a data table (assuming I would use an output command) and then merge it back into other datasets.

Other Notes:

1) As shown in the table above, not all businesses have at least 6 years of data. However, if there are not five years of prior data, then no regression needs to be run.

2) If it helps, I have been able to get my data to look like this: However, I do not believe I can run a regression this way. I believe I need rows of observations.

Header 1Header 2Header 3Header 4Header 5Header 6Header 7
FirmYearYear -1 RevenuesYear -2 RevsYear -3 RevsYear -4 RevsYear -5 Revs
ACME2005132454148520100035100025100000

Thank you very much for all help/hints.

-David

Super User
Posts: 18,586

Re: Need to do OLS Regression on previous observations

Here's one way:

1. Sort your data by Business and DESCENDING YEAR

2 (or 3). In a data step identify the 5 years of records available for regression analysis using BY value and counters.

3 (or 2). Subset only firms that have 5/6 years + (*depends on order)

4. Run regression using by variables and you'll get a single table with all coefficients for all companies/years.

Steps 2/3 can probably be done in a single SQL step if you're familiar with SQL and don't mind a bigger proc.

And yes, your data needs to be in the format you have for table 1, not table 2. You can use a proc transpose to transform it, or a data step.

Though you can do steps 1/2/3 easily in that form but how do you know what years go with what values? If that's not important you can fairly easily transpose and do steps1-3 in your transpose step.

Occasional Contributor
Posts: 7

Re: Need to do OLS Regression on previous observations

Hi Reeza,

Thanks for the information. I believe I left out a key piece of information. Some of the companies have more than 6 years of data. For example, ABC may have ten years of data from 2000-2009. Therefore, I need to run multiple regressions. One for 2005 on 2000-2004 data. A second for 2006 on 2001-2005 data. A third for 2007 on 2002-2006 data, and so on.

I may be misunderstanding something; but will your suggestion of using BY values and counters still work?  I believe I am having trouble visualizing how to increment the BY counter through each 5 year iteration. For example, if I have a counter that creates a BY group for 2000-2004 to do a regression for 2005, can I use a loop and change the value for 2000's revenue data to not be included in the 2006 regression while including 2005's data for the 2006 regression?

Thanks so much for the help.

Super User
Posts: 18,586

Re: Need to do OLS Regression on previous observations

Art's code accounts for all of that Smiley Happy

Basically create a new output group for each set of years that the analysis is running for. e.g

2008 - 2007 to 2003

2007 - 2006 to 2002

In a large dataset this may take up additional space to create but it avoids going into a macro solution. 

Super User
Posts: 9,777

Re: Need to do OLS Regression on previous observations

Use the following code to get the unique combination of Business and Year. then use call execute to invoke proc reg to go through all your data.

proc sort data=have out=temp nodupkey;

  by Business Year;

run;


data _null_;

set temp;

call execute(........)



Xia Keshan

PROC Star
Posts: 7,417

Re: Need to do OLS Regression on previous observations

You would have to adjust the following if you are missing years for any company (other than starting and ending years), but this might be an easy way to accomplish what you want:

data have;

  input company $ year revenues;

  cards;

a 2002 10

a 2003 50

a 2004 80

a 2005 90

a 2006 110

a 2007 120

a 2008 140

b 2004 58

b 2005 88

b 2006 118

b 2007 138

b 2008 175

c 2004 58

c 2005 88

c 2006 118

c 2007 138

c 2008 175

;

proc sort data=have out=need;

  by company year;

run;

data need (keep=company a_year year revenues);

  set need;

  by company;

  retain counter .;

  array stacka {0:4};

  array stackb {0:4};

  retain stacka stackb;

  if first.company then do;

    call missing(of stacka{*});

    call missing(of stackb{*});

    counter=0;

  end;

  counter+1;

  stacka{mod(counter,5)} = revenues;

  stackb{mod(counter,5)} = year;

  if counter gt 4 then do;

    a_year=year+1;

    do i=0 to 4;

      revenues=stacka{i};

      year=stackb{i};

      output;

    end;

  end;

run;

proc reg data=need;

  model revenues=year;

  by company a_year;

run;

PROC Star
Posts: 7,417

Re: Need to do OLS Regression on previous observations

If my previous suggestion accomplishes what you want, but you do have to skip groups of years where you are missing one or more years within five year ranges, the following code has an additional if statement to control for that:

data have;

  input company $ year revenues;

  cards;

a 2002 10

a 2003 50

a 2004 80

a 2005 90

a 2006 110

a 2007 120

a 2008 140

b 2004 58

b 2005 88

b 2006 118

b 2007 138

b 2008 175

c 2004 58

c 2005 88

c 2006 118

c 2007 138

c 2008 175

c 2010 185

;

proc sort data=have out=need;

  by company year;

run;

data need (keep=company a_year year revenues);

  set need;

  by company;

  retain counter .;

  array stacka {0:4};

  array stackb {0:4};

  retain stacka stackb;

  if first.company then do;

    call missing(of stacka{*});

    call missing(of stackb{*});

    counter=0;

  end;

  counter+1;

  stacka{mod(counter,5)} = revenues;

  stackb{mod(counter,5)} = year;

  if counter gt 4 and ((year+1)*5-15 eq sum(of stackb{*})) then do;

    a_year=year+1;

    do i=0 to 4;

      revenues=stacka{i};

      year=stackb{i};

      output;

    end;

  end;

run;

proc reg data=need;

  model revenues=year;

  by company a_year;

run;

Ask a Question
Discussion stats
  • 6 replies
  • 377 views
  • 1 like
  • 4 in conversation