Jcm105
Fluorite | Level 6

Hello, I am fairly new to SAS.

I am trying to run a regression on variables averaged from yearly data. My SAS program imports an Excel CSV file in the following format:

 

YEAR    COUNTRY    GDP
1991    A          100
1991    B          200
1991    C          300
1992    A          150
1992    B          250
1992    C          350
1993    A          200
1993    B          300
1993    C          400

 

And so on. Is there a way to average the variable GDP for each country across the years? Without averaging, the number of observations read is very high (2000+). By averaging I would like to reduce the observations to the number of countries in my data set, which is 77. Any help is appreciated.

1 ACCEPTED SOLUTION

ballardw
Super User

Now you're mentioning more variables that need to be averaged...

 

This is extensible:

proc summary data=have nway;
   class country;
   var gdp;  /*<= put all of the variables you want averaged */
   output out=summary mean=;

run;

This gives a data set with one record per country, containing the value of country and the mean of every variable you list on the VAR statement. Because MEAN= is written without a list of names, each mean keeps the name of its original variable.
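As a rough sketch of how this could be extended to the regression asked about in the thread: the variable names below are copied from the PROC REG model posted in the replies, while the output data set name country_means is only a placeholder. It assumes every model variable, including the dependent variable GDPCG, should be averaged per country, which may or may not match what the class requires.

```sas
/* Average every model variable within each country.
   HAVE is the input data set name used in this thread;
   COUNTRY_MEANS is a hypothetical name for the result.
   Missing values are ignored when each mean is computed. */
proc summary data=have nway;
   class country;
   var gdpcg gdp popg inv gov oda odas;
   output out=country_means (drop=_type_ _freq_) mean=;
run;

/* Regress on the per-country means: one observation per country. */
proc reg data=country_means;
   model gdpcg = gdp popg inv gov oda odas;
run;
quit;
```

With 77 countries in the input, the regression should then read at most 77 observations. Dropping _TYPE_ and _FREQ_ is optional; they are bookkeeping variables that PROC SUMMARY adds to the output data set.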

 


5 REPLIES 5
ballardw
Super User

Averaging by country, which I have to assume means ignoring the year, given your example data:

 

proc summary data=have nway;
   class country;
   var gdp;
   output out=summary mean=;
run;

 

What is the analysis question you are trying to answer? I'm not sure the summary would be desirable for many purposes.

And 2000 records really isn't that many to turn PROC REG or similar procedures loose on.
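To check what the summary step produces, here is a small self-contained sketch that reads the nine example rows from the original question and runs the same PROC SUMMARY on them. The DATA step is an assumption standing in for the real CSV import.

```sas
/* Toy copy of the example data from the question. */
data have;
   input year country $ gdp;
   datalines;
1991 A 100
1991 B 200
1991 C 300
1992 A 150
1992 B 250
1992 C 350
1993 A 200
1993 B 300
1993 C 400
;
run;

/* One output row per country; NWAY suppresses the overall-total row. */
proc summary data=have nway;
   class country;
   var gdp;
   output out=summary mean=;
run;

proc print data=summary noobs;
run;
```

The printed data set has three rows, with gdp equal to 150 for A, 250 for B and 350 for C. It also carries the automatic variables _TYPE_ and _FREQ_, where _FREQ_ is the number of yearly rows that went into each country.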

Jcm105
Fluorite | Level 6

I am doing a replication study of an econometric paper that measures foreign aid's impact on economic growth. The study focuses on developing countries, which generally have many gaps in reported data from year to year. Therefore most studies average variables such as population growth, investment, government spending and so forth. The averaging inevitably corrects for errors in the data, and when I run an exact replication without it, my model becomes essentially useless.

 

So then, from your example, how would I implement that in a simple OLS?

For example my current model is as follows:

 

PROC REG;
MODEL GDPCG = GDP POPG INV GOV ODA ODAS;
RUN;

 

How would I take the code you displayed and use it in a model? Would I write output out=summary mean=AVGGDP; (AVGGDP being the average of GDP) in order to run the regression on averaged values rather than on every observation? Thanks for your response.
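On the naming question, a minimal sketch of two ways the OUTPUT statement can give the mean a different name. AVGGDP is the name proposed in the post above; the data set names summary_named and summary_auto are placeholders.

```sas
/* Option 1: name the mean explicitly. */
proc summary data=have nway;
   class country;
   var gdp;
   output out=summary_named mean(gdp)=avggdp;
run;

/* Option 2: let SAS build the name. AUTONAME appends the statistic
   to the variable name, so the mean of GDP is stored as gdp_Mean. */
proc summary data=have nway;
   class country;
   var gdp;
   output out=summary_auto mean= / autoname;
run;
```

Either way, the regression would then need to refer to the new name (avggdp or gdp_Mean) in its MODEL statement and read the summary data set through DATA= on PROC REG.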

ballardw
Super User

To run any regression you will have to get those mean values associated with the other variables. If your other variables change from year to year, which seems likely, then using those averages seems like a poor idea.

 

You may also have to consider whether results in one year are dependent on the previous year. If there is a dependency then Year is a factor to consider.

 

I am not any type of economist, but I suspect that interventions (changes in aid or what have you) likely do not have an immediate effect and likely have an impact over a longer period than a single year. You probably want to look into the SAS/ETS time series procedures for better approaches to time and econometric modeling. I don't have access to SAS/ETS, so I can't help much further.
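If the means need to sit next to the yearly values, as the first paragraph of this reply describes, one possible sketch is a join on country. It assumes the data sets have and summary from the earlier code; have_with_means and gdp_mean are hypothetical names.

```sas
/* Attach each country's mean GDP to every yearly row for that country. */
proc sql;
   create table have_with_means as
   select h.*,
          m.gdp as gdp_mean
   from have as h
        left join summary as m
        on h.country = m.country
   order by h.country, h.year;
quit;
```

This keeps all yearly observations, so it does not reduce the row count to one per country. It only makes the average available alongside the year-level variables.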

Jcm105
Fluorite | Level 6

This is all correct, but for the purposes of my class it is meant to be simple. It will obviously not be the best estimator, but rather a learning exercise. I plan to estimate a two-way fixed effects model via PROC PANEL, but until I can average my variables (as is required) I cannot continue. In simpler terms, instead of:

 

Model A = B C D E.

 

I need to run

 

Model A = The average of B, the average of C, the Average of D, and the average of E.

 

I realize it may seem obscure; it's just the way it needs to be done for the purposes of my class.

