Statistical Procedures

buekie · Posted 11-03-2014 08:57 AM

Dear all,

I would like to do a trend or a regression analysis. The following data are known:

year   average   standard deviation
1996   200       10
1997   210       15
1998   220       15
etc.

I'm familiar with the normal proc reg analysis.

All I want to do now is see whether there is a significant trend, but first I do not know how to read in these data (average, standard deviation).

Thank you in advance,

BJ

PaigeMiller · Posted 11-03-2014 09:03 AM

Ordinary least squares, as performed by PROC REG, will still provide you with the least squares estimate of the slope and intercept.

The t-tests and F-tests will not be correct, as they apply only to the case where the errors are i.i.d. normally distributed.

If you know the standard deviation of each data point, then you could use weighted least squares to obtain statistical tests.
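For reference, the two estimators contrasted above can be written out as follows. This is a standard textbook sketch, not code from the thread; the weights w_i = 1/s_i^2 assume that s_i is the standard deviation of the error in the i-th average, which is exactly the assumption questioned later in the thread.

```latex
% Model assumed by the usual PROC REG t- and F-tests
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)

% Ordinary least squares (PROC REG without a WEIGHT statement)
(\hat\beta_0, \hat\beta_1)_{\text{OLS}} = \arg\min_{\beta_0, \beta_1} \sum_i (y_i - \beta_0 - \beta_1 x_i)^2

% Weighted least squares, assuming Var(\varepsilon_i) \propto s_i^2, so w_i = 1/s_i^2
(\hat\beta_0, \hat\beta_1)_{\text{WLS}} = \arg\min_{\beta_0, \beta_1} \sum_i w_i (y_i - \beta_0 - \beta_1 x_i)^2

\hat\beta_1 = \frac{\sum_i w_i (x_i - \bar x_w)(y_i - \bar y_w)}{\sum_i w_i (x_i - \bar x_w)^2},
\qquad \hat\beta_0 = \bar y_w - \hat\beta_1 \bar x_w,
\qquad \bar x_w = \frac{\sum_i w_i x_i}{\sum_i w_i}, \quad \bar y_w = \frac{\sum_i w_i y_i}{\sum_i w_i}
```

If each average is itself the mean of n_i raw values with standard deviation s_i, the variance of that average is s_i^2/n_i, so the natural weight would be n_i/s_i^2; with equal n_i this gives the same estimates as 1/s_i^2.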

By the way, when you say average of 200 and standard deviation of 10, what is this the average and standard deviation of?

--
Paige Miller

buekie · Posted 11-03-2014 09:53 AM

Thank you for the quick answer!

It was just a hypothetical example. The goal is to analyse the trend in air quality impact (expressed in disability-adjusted life years, or DALYs).

I'm familiar with PROC REG, which I used many years ago for my PhD, but at that time we always started from the raw data.

Now I only have the average and the standard deviation. How do I get these into SAS?

Thank you,

J

SteveDenham · Posted 11-03-2014 10:03 AM

Try this (untested code) working from your example:

data have;
   input year average standard_deviation;
   cards;
1996 200 10
1997 210 15
1998 220 15
...
;
run;

data want;
   set have;
   wt = 1/(standard_deviation*standard_deviation); /* Makes the weight proportional to the reciprocal of variance, so the estimates are BLUE */
run;

proc reg data=want;
   model average=year;
   weight wt; /* WEIGHT takes a variable name, no '=' (see Art's correction below) */
run;
quit;

Steve Denham

Message was edited by: Steve Denham

PaigeMiller · Posted 11-03-2014 10:57 AM

> It was just a hypothetical example. Goal is to analyse the trend analysis of air quality impact (expressed in disability adjusted life years or DALYs).
>
> I'm familiar with the PROC REG procedure which I used many years ago for my PhD but in that time we always started from the raw data

I'm not convinced that this standard deviation that you are describing is meaningful here (or perhaps it is indeed meaningful, but in ways that are not what you are implying).

The condition needed to generate valid t-tests and F-tests is that the ERRORs around the regression line are independent and identically distributed as a normal distribution. The condition has NOTHING to do with the standard deviation of the air quality impact numbers during the year.

Thus, I'm also skeptical that the weighted regression is appropriate here, and so I would not (yet) recommend using the code above.

In fact, I would advise fitting the regression to the averages (ignoring the standard deviations), and then plotting the residuals in any one of a number of ways to see if they are normally distributed, and to see if they (somewhat) systematically get larger or smaller as the average increases.
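A minimal SAS sketch of that suggestion, assuming the data look like the example in the question (only the three posted rows are used; the data set names HAVE and RESID_CHECK are illustrative). PLOTS=DIAGNOSTICS asks PROC REG for its diagnostic panel, which includes a residual Q-Q plot, and the SGPLOT step shows whether the residual spread changes with the fitted value.

```sas
/* Example rows from the question; add the remaining years before a real analysis */
data have;
   input year average sd;
   datalines;
1996 200 10
1997 210 15
1998 220 15
;
run;

ods graphics on;

/* Unweighted fit to the yearly averages; the SD column is ignored here */
proc reg data=have plots=diagnostics;
   model average = year;
   output out=resid_check p=predicted r=residual;
run;
quit;

/* Residuals versus fitted values: a funnel shape suggests non-constant variance */
proc sgplot data=resid_check;
   scatter x=predicted y=residual;
   refline 0 / axis=y;
run;
```

With only these three rows the averages lie exactly on a straight line, so every residual is zero; the plots only become informative once all years are included.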

Now, it may be true that a weighted regression is needed, because the ERRORs are not i.i.d. normal, but nowhere has that claim been made or implied.

--
Paige Miller

art297 · Posted 11-03-2014 10:16 AM

One correction to Steve's suggested code: in the weight statement, change the '=' sign to a space.

Here is a more brute-force solution that will provide the same parameter estimates, but possibly more meaningful numbers for the other results in the output:

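data have;
   input year average sd;
   cards;
1996 200 10
1997 210 15
1998 220 15
;
run;

/* 10 seed values per year; PROC STANDARD rescales them so that each
   year's pseudo-data have exactly the reported mean and SD.
   Using 10 values per year acts as an assumed sample size. */
data base;
   input score;
   cards;
1
2
3
4
5
6
7
8
9
10
;
run;

/* Start fresh: PROC APPEND would otherwise add to any existing WORK.WANT */
proc datasets lib=work nolist;
   delete want;
quit;

/* Write one PROC STANDARD / DATA / PROC APPEND block per year to a temp file */
filename doit temp;

data _null_;
   file doit;
   set have;
   stmt=catx(' ','proc standard data=base mean=',average,' std=',sd,
             ' out=stndized;run;');
   put stmt;
   stmt=catx(' ','data stndized; retain year',year,'; set stndized;run;');
   put stmt;
   stmt=catt('proc append base=want data=stndized;run;');
   put stmt;
run;

/* Run the generated code, then fit the regression to the pseudo-data */
%include doit;

proc reg data=want;
   model score=year;
run;
quit;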
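Why the pseudo-data reproduce the fit to the averages can be written out. Let j index years, x_j be the year, n the number of pseudo-observations per year (10 in the code above; an assumed value, not a known sample size), and \bar y_j, s_j the reported mean and SD. PROC STANDARD uses the n - 1 divisor by default (VARDEF=DF), so within each year the squared deviations from \bar y_j sum to (n - 1) s_j^2, and because x is constant within a year the cross term vanishes:

```latex
\mathrm{SSE}(\beta_0, \beta_1)
  = \sum_j \sum_{k=1}^{n} (y_{jk} - \beta_0 - \beta_1 x_j)^2
  = \underbrace{\sum_j (n-1)\, s_j^2}_{\text{does not depend on } \beta_0, \beta_1}
  \; + \; n \sum_j (\bar y_j - \beta_0 - \beta_1 x_j)^2
```

The first term is fixed, so the estimates minimize \sum_j (\bar y_j - \beta_0 - \beta_1 x_j)^2, i.e. the unweighted fit to the yearly averages. What changes is the error term: the residual mean square now pools the within-year variation, with J n - 2 error degrees of freedom for J years, so the t- and F-tests depend on the assumed n.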

SteveDenham · Posted 11-03-2014 10:20 AM

Thanks, Art. I should just put the hex on cut and paste...

Steve Denham

data_null__ · Posted 11-03-2014 01:02 PM

You may be able to adapt this one-way ANOVA example to your application.

25020 - One-way ANOVA on summary data
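As a rough sketch of the arithmetic behind a one-way ANOVA on summary data (not necessarily how that note does it), the steps below compute the between-year and within-year sums of squares directly from the means and SDs. The method needs the number of observations behind each average, which the thread never gives: n = 10 is a placeholder assumption, and the data set names SUMMARY and ANOVA_SUMMARY are illustrative.

```sas
/* Hypothetical input: n is an ASSUMED number of observations per year */
data summary;
   input year average sd n;
   datalines;
1996 200 10 10
1997 210 15 10
1998 220 15 10
;
run;

/* Grand mean, weighting each year by its sample size */
proc sql noprint;
   select sum(n*average)/sum(n) into :grand_mean trimmed
   from summary;
quit;

/* Accumulate sums of squares (sum statements retain across rows), then form F */
data anova_summary;
   set summary end=last;
   ss_between + n*(average - &grand_mean)**2;  /* between-year SS */
   ss_within  + (n - 1)*sd**2;                 /* (n-1)*s^2 is each year's within SS */
   k + 1;                                      /* number of years */
   n_total + n;                                /* total observations */
   if last then do;
      df_between = k - 1;
      df_within  = n_total - k;
      f_value = (ss_between/df_between) / (ss_within/df_within);
      p_value = 1 - probf(f_value, df_between, df_within);
      output;
   end;
   keep ss_between ss_within df_between df_within f_value p_value;
run;

proc print data=anova_summary noobs;
run;
```

This tests whether the yearly means differ at all; a test for a linear trend would instead use a linear contrast across years, with the same within-year mean square in the denominator.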

buekie · Posted 11-04-2014 02:51 AM

Thank you for the update. I'll play around with the data today!

Kind regards,

J

How to do a trend or regression analysis if mean and standard deviation values are known
