Hi,
I want to do a linear regression on every row and attach the slope as a new column: I have four measurements with unequal intervals, and I need to see the trend. How can I do this in SAS? My desired columns are as follows:
time_1 time_3 time_4 time_5 calculated_slope
Thank you!
Putting variable values in variable names is bad practice. Anyhow, here is a way to do this:
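/* Example data: one row per id, one column per measurement time */
data have;
input id time1 time3 time4 time5;
datalines;
1 1 2 3 4
2 3 2 1 1
;
/* Reshape to long format: one row per (id, measurement) */
proc transpose data=have out=temp;
by id;
var time:;
run;
/* Recover the numeric time from the variable name, e.g. time3 -> 3 */
data regr;
set temp;
x = input(compress(_name_,,"kd"), best.);
drop _name_;
run;
/* Fit col1 = intercept + slope*x separately for each id */
proc reg data=regr outest=estm noprint;
by id;
model col1 = x;
run;
quit;
/* In the OUTEST= data set, the variable x holds the coefficient of x */
data want;
merge have estm(keep=id x rename=x=slope);
by id;
run;
proc print data=want noobs; run;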
id time1 time3 time4 time5 slope
1 1 2 3 4 0.74286
2 3 2 1 1 -0.54286
"Slope" implies a change in y as x changes. You don't show any y value(s), or X values; one is missing.
If the values shown are the measurements at the given times, then we need to know what the actual intervals are, especially since you say the intervals are unequal.
Hi, the intervals are (1,3,4,5) as in the column names. Here is how it is done in R:
The exception is: I have 4 points, the days are not (1,2,3) but (1,3,4,5), and I only care about the slope. Does that make sense? Thank you.
The functions in R have no corresponding function in SAS. As I said above, you can convert the formula for slope to a SAS formula, using the USS() function, the SUM() function, and other functions as appropriate.
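For reference, the slope formula being alluded to can be written out as follows. This is the standard least-squares slope for points (x_i, y_i), specialized to the measurement times 1, 3, 4, 5 from this thread; the symbols b, n, x_i and y_i are notation introduced here, not taken from the posts.

```latex
% Least-squares slope for n points (x_i, y_i)
\[
  b = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}
           {n \sum_i x_i^2 - \bigl(\sum_i x_i\bigr)^2}
\]
% With x = (1, 3, 4, 5): n = 4, \sum x_i = 13, \sum x_i^2 = 51,
% so the denominator is 4 \cdot 51 - 13^2 = 35 for every row:
\[
  b = \frac{4\,(y_1 + 3y_3 + 4y_4 + 5y_5) - 13\,(y_1 + y_3 + y_4 + y_5)}{35}
\]
% Check with the row y = (1, 2, 3, 4):
% \sum x_i y_i = 39, \sum y_i = 10, so b = (156 - 130)/35 = 26/35 \approx 0.74286
```

The check value agrees with the PROC REG slope reported for id 1 earlier in the thread. The formula assumes all four measurements are present.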
Please show an example of inputs.
This is what I want:
time1 time3 time4 time5 slope
1 2 3 4 0.00000
3 2 1 1 4.00000
1 1 5 1 -1.66667
The slope variable is not currently in the data. The example above does not show correct slopes. Thank you.
You can rewrite the regression formula as functions of the variables on each row. For example, part of the slope formula in a linear regression is a sum of squares, which you could compute with the USS() function.
Or you can transpose the data so that each row becomes a column of four numbers, add a column of the x values, and then run PROC REG with a BY statement.
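The first option can be sketched roughly as below. This is a minimal illustration rather than a definitive implementation: it assumes the measurement times are 1, 3, 4, 5 (taken from the column names), that no measurement is missing, and it reuses the example rows posted elsewhere in the thread. The names want_formula, xs, ys, sx, sxx, sy and sxy are made up for the example.

```sas
/* Example rows from the thread */
data have;
  input id time1 time3 time4 time5;
  datalines;
1 1 2 3 4
2 3 2 1 1
;

/* Sketch: slope from raw sums; assumes no missing measurements */
data want_formula;
  set have;
  array xs(4) _temporary_ (1 3 4 5);    /* measurement times (assumed) */
  array ys(4) time1 time3 time4 time5;  /* measurements, same order    */
  n   = dim(xs);
  sx  = sum(of xs(*));                  /* sum of x                    */
  sxx = uss(of xs(*));                  /* uncorrected sum of squares  */
  sy  = sum(of ys(*));                  /* sum of y                    */
  sxy = 0;
  do i = 1 to n;
    sxy = sxy + xs(i) * ys(i);          /* sum of x*y                  */
  end;
  slope = (n*sxy - sx*sy) / (n*sxx - sx**2);
  drop i n sx sxx sy sxy;
run;

proc print data=want_formula noobs; run;
```

For these two rows the slopes work out to 26/35 (about 0.74286) and -19/35 (about -0.54286), the same values PROC REG reports for this data.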
I agree with everyone else that this isn't a good idea. That being said, it can be implemented.
data test;
input x1 x3 x4 x5 ;
datalines;
-0.069965723 0.492749371 0.955245597 1.346963522
;
run;
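data slope;
set test;
/* y values: the four measurements on this row */
array ys(4) x1 x3 x4 x5;
/* x values: the measurement times; _temporary_ keeps them out of the output */
array vals(4) _temporary_ (1 3 4 5);
xbar = mean(of vals(*));
ybar = mean(of ys(*));
/* least-squares slope = sum((x-xbar)*(y-ybar)) / sum((x-xbar)**2) */
do i=1 to dim(vals);
num=sum(num, (vals(i)-xbar)*(ys(i)-ybar));
den=sum(den, (vals(i)-xbar)**2);
end;
slope = num/den;
run;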
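/* Check against PROC REG: reshape the row into a column of y values */
proc transpose data=test out=test2(rename=col1=y);
run;
data test2;
set test2;
/* take x from the variable name (x1, x3, x4, x5 -> 1, 3, 4, 5), not the row number */
x = input(compress(_name_,,"kd"), best.);
run;
proc reg data=test2;
model y=x;
run;
quit;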
I point out that the PROC TRANSPOSE method will handle missing Y values properly. The method of writing your own formula for slope, that I and others have mentioned, will not handle missing Y values properly without additional attention.
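One possible form of that additional attention is sketched below: accumulate the sums over only the time points whose measurement is present, and leave the slope missing when fewer than two points remain. This is an illustration under assumptions, not a tested replacement for the PROC TRANSPOSE approach. The measurement times 1, 3, 4, 5 come from the column names, the missing value in the second row is invented for the example, and the names want_missing, xs, ys and the accumulator variables are hypothetical.

```sas
/* Example rows; the second row has a missing measurement (invented) */
data have;
  input id time1 time3 time4 time5;
  datalines;
1 1 2 3 4
2 3 . 1 1
;

/* Sketch: slope from the non-missing (x, y) pairs only */
data want_missing;
  set have;
  array xs(4) _temporary_ (1 3 4 5);    /* measurement times (assumed) */
  array ys(4) time1 time3 time4 time5;  /* measurements, same order    */
  n = 0; sx = 0; sy = 0; sxx = 0; sxy = 0;
  do i = 1 to dim(ys);
    if not missing(ys(i)) then do;      /* skip pairs with a missing y */
      n   = n + 1;
      sx  = sx  + xs(i);
      sy  = sy  + ys(i);
      sxx = sxx + xs(i)**2;
      sxy = sxy + xs(i) * ys(i);
    end;
  end;
  /* a slope needs at least two points; otherwise it stays missing */
  if n >= 2 then slope = (n*sxy - sx*sy) / (n*sxx - sx**2);
  drop i n sx sy sxx sxy;
run;

proc print data=want_missing noobs; run;
```

Dropping the x value together with the missing y is the key point: functions such as MEAN() or SUM() over the y array alone would skip the missing y but still use all four x values, which gives a wrong slope.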