Solved
New Contributor
Posts: 3

# Running a regression on averaged yearly variables

Hello, I am fairly new to SAS.

I am trying to run a regression on variables averaged from yearly data.  My SAS program imports a CSV file exported from Excel in the following format:

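| YEAR | COUNTRY | GDP |
|------|---------|-----|
| 1991 | A       | 100 |
| 1991 | B       | 200 |
| 1991 | C       | 300 |
| 1992 | A       | 150 |
| 1992 | B       | 250 |
| 1992 | C       | 350 |
| 1993 | A       | 200 |
| 1993 | B       | 300 |
| 1993 | C       | 400 |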

And so on.  Is there a way to average the variable GDP for each country across the years?  Without averaging, the number of observations read is very high (2000+). By averaging, I would like to limit my observations to the number of countries in my data set, which is 77.  Any help is appreciated.

Accepted Solutions
Solution
‎04-05-2016 06:54 PM
Super User
Posts: 13,583

## Re: Running a regression on averaged yearly variables

Now you're mentioning more variables that need to be averaged...

This is extensible:

```
proc summary data=have nway;
  class country;
  var gdp;  /* <= list all of the variables you want averaged */
  output out=summary mean=;
run;
```

This will give a data set with one record per country, containing the value of country and the means of all of the variables that you specify on the VAR statement. Because no names are given after MEAN=, each mean keeps the name of its source variable.
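As a minimal sketch of how the averaged data set could then feed the regression, assuming the yearly data set `have` also holds the other model variables mentioned in this thread (GDPCG, POPG, INV, GOV, ODA, ODAS) under exactly those names:

```sas
/* Assumption: HAVE contains COUNTRY plus all seven variables below. */
proc summary data=have nway;
  class country;                          /* one output row per country  */
  var gdpcg gdp popg inv gov oda odas;    /* variables to average        */
  output out=summary mean=;               /* means keep the source names */
run;

/* Regress on the country-level averages instead of the yearly rows. */
proc reg data=summary;
  model gdpcg = gdp popg inv gov oda odas;
run;
quit;
```

The output data set also carries the automatic variables `_TYPE_` and `_FREQ_`, which the MODEL statement simply ignores. Missing values of an analysis variable are left out of its mean by default, so a country with gaps is averaged over the years it does report. If distinct names are preferred, adding the AUTONAME option (`output out=summary mean= / autoname;`) should produce names such as `gdp_Mean`, and the MODEL statement would need to use those names instead.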

All Replies
Super User
Posts: 13,583

## Re: Running a regression on averaged yearly variables

Averaging by country, which I have to assume means ignoring the year given your example data:

```
proc summary data=have nway;
  class country;
  var gdp;
  output out=summary mean=;
run;
```

What is the analysis question you are trying to answer? I'm not sure that the summary would be desirable for many purposes.

And 2000 records isn't really that many to turn PROC REG or similar procedures loose on.

New Contributor
Posts: 3

## Re: Running a regression on averaged yearly variables

I am doing a replication study of an econometric paper, which measures foreign aid's impact upon economic growth.  The study focuses on developing countries, which generally have many gaps in reported data from year to year.  Therefore most studies average variables such as population growth, investment, government spending and so forth.  When I run an exact replication without the averaging, which inevitably corrects for errors in the data, my model becomes essentially useless.

So, from your example, how would I implement that in a simple OLS?

For example my current model is as follows:

```
PROC REG;
  MODEL GDPCG = GDP POPG INV GOV ODA ODAS;
RUN;
```

How would I take the code you displayed and implement it in a model?  Would I set output out=summary mean=AVGGDP; (AVGGDP being the average of GDP) in order to run the regression on averaged values rather than on every observation?  Thanks for your response.

Super User
Posts: 13,583

## Re: Running a regression on averaged yearly variables

To run any regression you will have to get those mean values associated with the other variables. If your other variables change from year to year, which seems likely, then using those averages seems like a poor idea.
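One way to associate a country mean with the yearly records can be sketched as follows. This is an illustration only; the names `means`, `have_sorted`, `combined`, and `avg_gdp` are hypothetical and not from the original thread:

```sas
/* Country-level mean of GDP under a new name (AVG_GDP is hypothetical). */
proc summary data=have nway;
  class country;
  var gdp;
  output out=means(drop=_type_ _freq_) mean=avg_gdp;
run;

/* MERGE with BY requires the yearly data to be sorted by COUNTRY. */
proc sort data=have out=have_sorted;
  by country;
run;

/* Each yearly record picks up its country's mean. */
data combined;
  merge have_sorted means;
  by country;
run;
```

Each row of `combined` keeps its yearly values and also carries `avg_gdp`, so the mean sits next to the variables that still vary by year.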

You may also have to consider whether results in one year are dependent on the previous year. If there is a dependency then Year is a factor to consider.

I am not any type of economist, but I suspect that interventions (changes in aid or what have you) likely do not have an immediate effect and likely have an impact over a longer period than a single year. You probably want to look into the SAS/ETS time series procedures for better approaches to time and econometric modeling. I don't have access to SAS/ETS, so I can't help much further.

New Contributor
Posts: 3

## Re: Running a regression on averaged yearly variables

This is all correct, but for the purposes of my class it is meant to be simple.  It will obviously not be the best estimator but instead a learning exercise.  I plan to estimate a two-way fixed effects model via PROC PANEL, but until I can average my variables (as is required) I cannot continue.  In other words, in more simplistic terms, instead of:

Model A = B C D E.

I need to run

Model A = The average of B, the average of C, the Average of D, and the average of E.

I realize it may seem obscure; it's just the way it needs to be done for the purposes of my class.


🔒 This topic is solved and locked.
