Dear,
I would like to do a trend or regression analysis. The following data are known:
year average standard deviation
1996 200 10
1997 210 15
1998 220 15
etc.
I'm familiar with the normal proc reg analysis.
All I want to do now is see whether there is a significant trend, but first I do not know how to read in these data (average, standard deviation).
Thank you in advance,
BJ
Ordinary least squares, as performed by PROC REG, will still provide you with the least squares estimate of the slope and intercept.
The t-tests and F-tests will not be correct, as they apply only to the case where the errors are i.i.d. normally distributed.
If you know the standard deviation of each data point, then you could use weighted least squares to obtain statistical tests.
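For reference, a sketch of the weighted least squares criterion mentioned here. It assumes each yearly average $\bar{y}_i$ (at year $x_i$) comes with a known standard deviation $s_i$ that reflects the error variance at that point; the symbols are illustrative and not taken from the thread.

```latex
\min_{\beta_0,\beta_1} \sum_i w_i \left(\bar{y}_i - \beta_0 - \beta_1 x_i\right)^2,
\qquad w_i = \frac{1}{s_i^2}

\hat{\beta}_1 = \frac{\sum_i w_i (x_i - \bar{x}_w)(\bar{y}_i - \bar{y}_w)}{\sum_i w_i (x_i - \bar{x}_w)^2},
\qquad
\hat{\beta}_0 = \bar{y}_w - \hat{\beta}_1 \bar{x}_w,
\qquad
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i},
\quad
\bar{y}_w = \frac{\sum_i w_i \bar{y}_i}{\sum_i w_i}
```

Points with a smaller standard deviation get a larger weight and so pull the fitted line more strongly toward themselves.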
By the way, when you say average of 200 and standard deviation of 10, what is this the average and standard deviation of?
Paige Miller
Thank you for the quick answer!
It was just a hypothetical example. The goal is to analyse the trend in air quality impact (expressed in disability-adjusted life years, or DALYs).
I'm familiar with the PROC REG procedure, which I used many years ago for my PhD, but at that time we always started from the raw data.
Now I only have the average and the standard deviation. How do you put these into SAS?
Thank you,
J
Try this (untested code) working from your example:
data have;
input year average standard_deviation;
cards;
1996 200 10
1997 210 15
1998 220 15
...
...
...
;
data want;
set have;
wt = 1/(standard_deviation*standard_deviation); /* Makes the weight proportional to the reciprocal of the variance, so the estimates are BLUE */
run;
proc reg data=want;
model average=year;
weight=wt;
run;
Steve Denham
Message was edited by: Steve Denham
It was just a hypothetical example. Goal is to analyse the trend analysis of air quality impact (expressed in disability adjusted life years or DALYs).
I'm familiar with the PROC REG procedure which I used many years ago for my PhD but in that time we always started from the raw data
I'm not convinced that this standard deviation that you are describing is meaningful here (or perhaps it is indeed meaningful, but in ways that are not what you are implying).
The condition needed to generate valid t-tests and F-tests is that the ERRORs around the regression line are independent and identically distributed as a normal distribution. The condition has NOTHING to do with the standard deviation of the air quality impact numbers during the year.
Thus, I'm also skeptical that the weighted regression is appropriate here, and so I would not (yet) recommend using the code above.
In fact, I would advise fitting the regression to the averages (ignoring the standard deviations), and then plotting the residuals in any one of a number of ways to see if they are normally distributed, and to see if they (somewhat) systematically get larger or smaller as the average increases.
Now, it may be true that a weighted regression is needed, because the ERRORs are not i.i.d. normal, but nowhere has that claim been made or implied.
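A minimal sketch of the check described above, assuming the data set `have` with variables `year` and `average` from the earlier post. The output data set `resid_out` and the variable names `pred` and `resid` are hypothetical names chosen for illustration.

```sas
/* Fit the unweighted regression to the yearly averages and save residuals */
proc reg data=have;
  model average=year;
  output out=resid_out p=pred r=resid;
run;
quit;

/* Normality tests and a normal Q-Q plot of the residuals */
proc univariate data=resid_out normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;

/* Residuals against fitted values, to see whether the spread changes with the average */
proc sgplot data=resid_out;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;
```

With only a handful of years these diagnostics carry little information, so they are mainly useful once the full series is read in.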
Paige Miller
One correction to Steve's suggested code: in the weight statement, change the '=' sign to a space.
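Applied to the PROC REG step in that code, the correction would presumably read as follows (a sketch only, using the same data set `want` and weight variable `wt`):

```sas
proc reg data=want;
  model average=year;
  weight wt;   /* WEIGHT statement takes the variable name, with no '=' sign */
run;
quit;
```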
Here is a more brute-force solution that will provide the same parameter estimates, but also provide possibly more meaningful numbers for the various other results output:
data have;
input year average sd;
cards;
1996 200 10
1997 210 15
1998 220 15
;
data base;
input score;
cards;
1
2
3
4
5
6
7
8
9
10
;
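filename doit temp; /* temporary file that will hold the generated SAS code */
data _null_;
file doit;
set have;
/* For each year, rescale the base scores to that year's mean and standard deviation */
stmt=catx(' ','proc standard data=base mean=',average,' std=',sd,
' out=stndized;run;');
put stmt;
/* Tag the rescaled scores with the year */
stmt=catx(' ','data stndized; retain year',year,'; set stndized;run;');
put stmt;
/* Stack this year's scores onto the analysis data set */
stmt=catt('proc append base=want data=stndized;run;');
put stmt;
run;
%include doit; /* run the generated code */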
proc reg data=want;
model score=year;
run;
Thanks, Art. I should just put the hex on cut and paste...
Steve Denham
You may be able to adapt this one-way ANOVA example to your application.
Thank you for the update. I'll play around with the data today!
Kind regards,
J