buekie
Calcite | Level 5

Dear,

I would like to do a trend or regression analysis. The following data are known:

year     average      standard deviation

1996     200                         10

1997     210                         15

1998     220                         15

etc.

I'm familiar with the normal proc reg analysis.

All I want to do now is see whether there is a significant trend, but first I need to know how to read in these data (average, standard deviation).

Thank you in advance,

BJ

8 REPLIES
PaigeMiller
Diamond | Level 26

Ordinary least squares, as performed by PROC REG, will still provide you with the least squares estimate of the slope and intercept.

The t-tests and F-tests will not be correct, as they apply only to the case where the errors are i.i.d. normally distributed.

If you know the standard deviation of each data point, then you could use a weighted least squares to obtain statistical tests.
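As a rough illustration of what weighted least squares does here, the sketch below writes out the criterion and the resulting slope estimate. The symbols are assumptions introduced for this example: $\bar{y}_t$ is the yearly average, $x_t$ the year, $s_t$ the standard deviation attached to that data point, and $w_t = 1/s_t^2$ the weight.

```latex
% Assumed model: \bar{y}_t = \beta_0 + \beta_1 x_t + \varepsilon_t,
% with Var(\varepsilon_t) proportional to s_t^2 (an assumption, not a given).
\[
  (\hat\beta_0, \hat\beta_1)
  = \arg\min_{\beta_0,\beta_1} \sum_t w_t \,\bigl(\bar{y}_t - \beta_0 - \beta_1 x_t\bigr)^2,
  \qquad w_t = \frac{1}{s_t^2}
\]
\[
  \hat\beta_1
  = \frac{\sum_t w_t\,(x_t - \bar{x}_w)(\bar{y}_t - \bar{y}_w)}
         {\sum_t w_t\,(x_t - \bar{x}_w)^2},
  \qquad
  \hat\beta_0 = \bar{y}_w - \hat\beta_1 \bar{x}_w
\]
\[
  \bar{x}_w = \frac{\sum_t w_t x_t}{\sum_t w_t},
  \qquad
  \bar{y}_w = \frac{\sum_t w_t \bar{y}_t}{\sum_t w_t}
\]
```

Points with a larger standard deviation get a smaller weight and so pull less on the fitted line. Whether $s_t$ really describes the error variance of each point is exactly the question asked below.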

By the way, when you say average of 200 and standard deviation of 10, what is this the average and standard deviation of?

--
Paige Miller
buekie
Calcite | Level 5

Thank you for the quick answer!

It was just a hypothetical example. The goal is to analyse the trend in air quality impact (expressed in disability-adjusted life years, or DALYs).

I'm familiar with the PROC REG procedure, which I used many years ago for my PhD, but at that time we always started from the raw data.

Now I only have the average and the standard deviation. How do you put it into SAS?

Thank you,

J

SteveDenham
Jade | Level 19

Try this (untested code) working from your example:


data have;

input year     average      standard_deviation;

cards;

1996     200                         10

1997     210                         15

1998     220                         15

...

...

...

;

data want;

set have;

wt = 1/(standard_deviation*standard_deviation); /* Makes the weight proportional to the reciprocal of variance, so the estimates are BLUE */

run;

proc reg data=want;

model average=year;

weight=wt;

run;

Steve Denham

Message was edited by: Steve Denham

PaigeMiller
Diamond | Level 26

"It was just a hypothetical example. The goal is to analyse the trend in air quality impact (expressed in disability-adjusted life years, or DALYs).

I'm familiar with the PROC REG procedure, which I used many years ago for my PhD, but at that time we always started from the raw data."

I'm not convinced that this standard deviation that you are describing is meaningful here (or perhaps it is indeed meaningful, but in ways that are not what you are implying).

The condition needed to generate valid t-tests and F-tests is that the ERRORs around the regression line are independent and identically distributed as a normal distribution. The condition has NOTHING to do with the standard deviation of the air quality impact numbers during the year.

Thus, I'm also skeptical that the weighted regression is appropriate here, and so I would not (yet) recommend using the code above.

In fact, I would advise fitting the regression to the averages (ignoring the standard deviations), and then plotting the residuals in any one of a number of ways to see if they are normally distributed, and to see if they (somewhat) systematically get larger or smaller as the average increases.
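A minimal sketch of that residual check, assuming the yearly averages are already in a dataset called have with variables year and average, as in the example earlier in the thread. The output dataset name resid_out and the variable names predicted and residual are made up for this illustration.

```sas
/* Sketch only: unweighted fit to the yearly averages, then residual checks. */
/* HAVE, YEAR and AVERAGE come from the example earlier in the thread;       */
/* RESID_OUT, PREDICTED and RESIDUAL are hypothetical names.                 */
ods graphics on;

proc reg data=have plots(only)=(residualbypredicted qqplot residualhistogram);
  model average = year;
  /* Keep fitted values and residuals for further inspection */
  output out=resid_out p=predicted r=residual;
run;
quit;

/* Formal normality tests on the residuals */
proc univariate data=resid_out normal;
  var residual;
run;
```

The residual-by-predicted plot is the one to look at for spread that grows or shrinks with the average. With only a handful of years, both the plots and the normality tests carry little information, so treat them as a rough check.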

Now, it may be true that a weighted regression is needed, because the ERRORs are not i.i.d. normal, but nowhere has that claim been made or implied.

--
Paige Miller
art297
Opal | Level 21

One correction to Steve's suggested code: in the weight statement, change the '=' sign to a space.

Here is a more brute-force solution that will provide the same parameter estimates, but may also provide more meaningful numbers for the various other results in the output:

data have;

  input year average sd;

  cards;

1996 200 10

1997 210 15

1998 220 15

;

data base;

  input score;

  cards;

1

2

3

4

5

6

7

8

9

10

;

filename doit temp;

data _null_;

  file doit;

  set have;

  stmt=catx(' ','proc standard data=base mean=',average,' std=',sd,

              ' out=stndized;run;');

  put stmt;

  stmt=catx(' ','data stndized; retain year',year,'; set stndized;run;');

  put stmt;

  stmt=catt('proc append base=want data=stndized;run;');

  put stmt;

run;

%include doit;

proc reg data=want;

  model score=year;

run;
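The same pseudo-data idea can probably be sketched without generating code, by standardizing the ten base scores once to mean 0 and standard deviation 1 and then rescaling them for each year. This is an assumed alternative, not part of the code above; it reuses the have dataset from this post, and the names zbase, want2 and z are hypothetical.

```sas
/* Sketch only: build 10 pseudo-observations per year whose sample mean  */
/* and standard deviation match the summary values in HAVE.              */
/* ZBASE, WANT2 and Z are hypothetical names used for this illustration. */
data zbase;
  do z = 1 to 10;
    output;
  end;
run;

/* Rescale the base scores to mean 0 and standard deviation 1 */
proc standard data=zbase mean=0 std=1 out=zbase;
  var z;
run;

/* For each year, shift and scale the standardized scores */
data want2;
  set have;
  do i = 1 to nobs;
    set zbase point=i nobs=nobs;
    score = average + sd * z;
    output;
  end;
  keep year score;
run;

proc reg data=want2;
  model score = year;
run;
quit;
```

The choice of ten scores per year is arbitrary and drives the error degrees of freedom, so the tests depend on it. If the real number of observations behind each average is known, it would be the more defensible choice.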

SteveDenham
Jade | Level 19

Thanks, Art.  I should just put the hex on cut and paste...

Steve Denham

data_null__
Jade | Level 19

You may be able to adapt this one-way ANOVA example to your application.

25020 - One-way ANOVA on summary data

buekie
Calcite | Level 5

Thank you for the update. I'll play around with the data today!

Kind regards,

J
