BookmarkSubscribeRSS Feed
DThorn
Calcite | Level 5

Hello everyone,

I have data on businesses, the year, and that businesses revenues for each year like below:

Header 1Header 2Header 3
BusinessYearRevenues

ABC

2000100000
ABC2001100025
ABC2002100035
ABC2003148520
ABC2004132454
ABC2005456214
ACME1996154741
BBB199845214
BBB199925632

I need to write code that will do the following:

1) Perform OLS regression by each company but using the last five years of data. For example, the regression for ABC in 2005 would make the DV the revenues for years 2000-2004 and the sole IV the year variable. I know how to do this in excel, but I want to automate everything into SAS so I am looking for help. The larger purpose is to get the standard error of the year beta coefficient and then divide it by the average revenues for those five years. This is used as a measure of uncertainty in the field I research.

2) I will need to be able to output this standard beta coefficient into a data table (assuming I would use an output command) and then merge it back into other datasets.

Other Notes:

1) As shown in the table above, not all businesses have at least 6 years of data. However, if there are not five years of prior data, then no regression needs to be run.

2) If it helps, I have been able to get my data to look like this: However, I do not believe I can run a regression this way. I believe I need rows of observations.

Header 1Header 2Header 3Header 4Header 5Header 6Header 7
FirmYearYear -1 RevenuesYear -2 RevsYear -3 RevsYear -4 RevsYear -5 Revs
ACME2005132454148520100035100025100000

Thank you very much for all help/hints.

-David

6 REPLIES 6
Reeza
Super User

Here's one way:

1. Sort your data by Business and DESCENDING YEAR

2 (or 3). In a data step identify the 5 years of records available for regression analysis using BY value and counters.

3 (or 2). Subset only firms that have 5/6 years + (*depends on order)

4. Run regression using by variables and you'll get a single table with all coefficients for all companies/years.

Steps 2/3 can probably be done in a single SQL step if you're familiar with SQL and don't mind a bigger proc.

And yes, your data needs to be in the format you have for table 1, not table 2. You can use a proc transpose to transform it, or a data step.

Though you can do steps 1/2/3 easily in that form but how do you know what years go with what values? If that's not important you can fairly easily transpose and do steps1-3 in your transpose step.

DThorn
Calcite | Level 5

Hi Reeza,

Thanks for the information. I believe I left out a key piece of information. Some of the companies have more than 6 years of data. For example, ABC may have ten years of data from 2000-2009. Therefore, I need to run multiple regressions. One for 2005 on 2000-2004 data. A second for 2006 on 2001-2005 data. A third for 2007 on 2002-2006 data, and so on.

I may be misunderstanding something; but will your suggestion of using BY values and counters still work?  I believe I am having trouble visualizing how to increment the BY counter through each 5 year iteration. For example, if I have a counter that creates a BY group for 2000-2004 to do a regression for 2005, can I use a loop and change the value for 2000's revenue data to not be included in the 2006 regression while including 2005's data for the 2006 regression?

Thanks so much for the help.

Reeza
Super User

Art's code accounts for all of that Smiley Happy

Basically create a new output group for each set of years that the analysis is running for. e.g

2008 - 2007 to 2003

2007 - 2006 to 2002

In a large dataset this may take up additional space to create but it avoids going into a macro solution. 

Ksharp
Super User

Use the following code to get the unique combination of Business and Year. then use call execute to invoke proc reg to go through all your data.

proc sort data=have out=temp nodupkey;

  by Business Year;

run;


data _null_;

set temp;

call execute(........)



Xia Keshan

art297
Opal | Level 21

You would have to adjust the following if you are missing years for any company (other than starting and ending years), but this might be an easy way to accomplish what you want:

data have;

  input company $ year revenues;

  cards;

a 2002 10

a 2003 50

a 2004 80

a 2005 90

a 2006 110

a 2007 120

a 2008 140

b 2004 58

b 2005 88

b 2006 118

b 2007 138

b 2008 175

c 2004 58

c 2005 88

c 2006 118

c 2007 138

c 2008 175

;

proc sort data=have out=need;

  by company year;

run;

data need (keep=company a_year year revenues);

  set need;

  by company;

  retain counter .;

  array stacka {0:4};

  array stackb {0:4};

  retain stacka stackb;

  if first.company then do;

    call missing(of stacka{*});

    call missing(of stackb{*});

    counter=0;

  end;

  counter+1;

  stacka{mod(counter,5)} = revenues;

  stackb{mod(counter,5)} = year;

  if counter gt 4 then do;

    a_year=year+1;

    do i=0 to 4;

      revenues=stacka{i};

      year=stackb{i};

      output;

    end;

  end;

run;

proc reg data=need;

  model revenues=year;

  by company a_year;

run;

art297
Opal | Level 21

If my previous suggestion accomplishes what you want, but you do have to skip groups of years where you are missing one or more years within five year ranges, the following code has an additional if statement to control for that:

data have;

  input company $ year revenues;

  cards;

a 2002 10

a 2003 50

a 2004 80

a 2005 90

a 2006 110

a 2007 120

a 2008 140

b 2004 58

b 2005 88

b 2006 118

b 2007 138

b 2008 175

c 2004 58

c 2005 88

c 2006 118

c 2007 138

c 2008 175

c 2010 185

;

proc sort data=have out=need;

  by company year;

run;

data need (keep=company a_year year revenues);

  set need;

  by company;

  retain counter .;

  array stacka {0:4};

  array stackb {0:4};

  retain stacka stackb;

  if first.company then do;

    call missing(of stacka{*});

    call missing(of stackb{*});

    counter=0;

  end;

  counter+1;

  stacka{mod(counter,5)} = revenues;

  stackb{mod(counter,5)} = year;

  if counter gt 4 and ((year+1)*5-15 eq sum(of stackb{*})) then do;

    a_year=year+1;

    do i=0 to 4;

      revenues=stacka{i};

      year=stackb{i};

      output;

    end;

  end;

run;

proc reg data=need;

  model revenues=year;

  by company a_year;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1568 views
  • 1 like
  • 4 in conversation