Here are a few rows of the data (the actual dataset has over a million rows):
id  c1        c2       c3
a    0.87396  2.00827  2.81477
b    0.97002  2.00064  2.81468
c    0.68026  1.86006  2.81403
d    0.58230  1.84271  2.81402
e    0.73355  1.59606  2.81368
f    1.20452  2.07169  2.81365
g    0.91387  1.19560  2.81352
h    1.19418  2.10696  2.81306
i   -0.10435  1.34213  2.81296
j   -0.06286  1.50670  2.81225
k    1.73225  2.37420  2.81185
l    0.53130  1.87948  2.81164
m    0.86563  1.65322  2.81151
n    1.05598  2.40835  2.81065
o    0.15833  1.55971  2.81035
p    0.00912  0.61380  2.81033
The goal is to compute the least-squares slope for each row and store it in a fourth column.
Assume the X values are 1, 2, 3.
The Y values are the three values in each row (c1, c2, c3).
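Because X is fixed at 1, 2, 3 for every row, the least-squares slope reduces to a simple expression, which gives a quick way to sanity-check the computed values. Here y_1, y_2, y_3 stand for c1, c2, c3 of a single row:

```latex
\bar{x} = 2, \qquad \sum_{k=1}^{3} (x_k - \bar{x})^2 = (-1)^2 + 0^2 + 1^2 = 2

b = \frac{\sum_{k=1}^{3} (x_k - \bar{x})(y_k - \bar{y})}{\sum_{k=1}^{3} (x_k - \bar{x})^2}
  = \frac{\sum_{k=1}^{3} (x_k - \bar{x})\, y_k}{2}
  = \frac{-y_1 + 0 \cdot y_2 + y_3}{2}
  = \frac{y_3 - y_1}{2}

\text{Row a: } b = \frac{2.81477 - 0.87396}{2} = \frac{1.94081}{2} = 0.970405
```

The middle step uses the fact that the deviations (x_k - x̄) sum to zero, so the ȳ term drops out of the numerator.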
Below is the code I have been trying, but the resulting slopes don't look correct.
Could you take a look at the code and tell me if you see any problem with it?
Thanks much!
data nicholas.n_slope_means__7;
set nicholas.n_slope_means__7;
array ys(3) _50501 _50502 _50503;
array vals(3) (1 2 3);
xbar = mean(of vals(*));
ybar = mean(of ys(*));
do i=1 to dim(vals);
   s_xy=sum(s_xy, i*ys(i));
   s_y=sum(s_y, ys(i));
   s_x2=sum(s_x2, vals(i)**2);
   num=(vals(i)-xbar)*(ys(i)-ybar);
   den=(vals(i)-xbar)**2;
   num_tot=sum(num, num_tot);
   den_tot=sum(den, den_tot);
end;
s_x=sum(of vals(*));
n1=dim(vals);
slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
Slope_5050x_3 = num_tot/den_tot;
run;
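For comparison, here is a minimal sketch of a per-row slope calculation, not a definitive fix. It assumes the Y columns are named c1-c3 as in the sample rows (the code above uses _50501-_50503); work.have and work.want are hypothetical dataset names, and only three sample rows are included so the example runs on its own. The X values live in a _TEMPORARY_ array so they are not written to the output, the accumulators are reset to 0 on every row, and the result goes to a new dataset instead of overwriting the input, so variables created by an earlier run are never read back in by SET.

```sas
/* Hypothetical input: three of the sample rows, Y values in c1-c3 */
data work.have;
  input id $ c1 c2 c3;
  datalines;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
i -0.10435 1.34213 2.81296
;
run;

/* Write to a new dataset so the input is never overwritten */
data work.want;
  set work.have;
  array ys{3} c1-c3;                 /* Y values for the current row      */
  array xs{3} _temporary_ (1 2 3);   /* X values; not kept in the output  */
  xbar = mean(of xs{*});
  ybar = mean(of ys{*});
  sxy = 0;                           /* reset accumulators for every row  */
  sxx = 0;
  do k = 1 to dim(xs);
    sxy = sxy + (xs{k} - xbar) * (ys{k} - ybar);
    sxx = sxx + (xs{k} - xbar)**2;
  end;
  slope = sxy / sxx;                 /* least-squares slope               */
  check = (c3 - c1) / 2;             /* shortcut valid only for X = 1,2,3 */
  drop xbar ybar sxy sxx k;
run;
```

The check column is only there to cross-check the loop: with X exactly 1, 2, 3, slope and check should agree on every row (0.970405 for row a).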