Here are a few rows of data (actual dataset has over a million rows):
id c1 c2 c3
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
The objective is to compute the slope for each row, and place that slope computation into column 4.
Assume the X values are 1, 2, 3.
Y values are those given in each row.
The code below is what I have been trying, but the resulting slopes don't appear correct.
Please take a look at the code and tell me if you see any problem with it.
Thanks much!
data nicholas.n_slope_means__7;
  set nicholas.n_slope_means__7;
  array ys(3) _50501 _50502 _50503;
  array vals(3) (1 2 3);
  xbar = mean(of vals(*));
  ybar = mean(of ys(*));
  do i=1 to dim(vals);
    s_xy=sum(s_xy, i*ys(i));
    s_y=sum(s_y, ys(i));
    s_x2=sum(s_x2, vals(i)**2);
    num=(vals(i)-xbar)*(ys(i)-ybar);
    den=(vals(i)-xbar)**2;
    num_tot=sum(num, num_tot);
    den_tot=sum(den, den_tot);
  end;
  s_x=sum(of vals(*));
  n1=dim(vals);
  slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
  Slope_5050x_3 = num_tot/den_tot;
run;
OK. You have three variables, not two, so can you explain how you want to get a row slope? What is your logic for computing it? (Do you mean the slope parameter that PROC REG estimates?)
OK. Assuming you want the Beta (slope) parameter that PROC REG estimates, here are two ways to get it: one uses DATA steps with PROC REG, the other uses PROC IML. Note that the two differ: the PROC REG model includes an intercept, while the IML code fits a line with no intercept (the same as MODEL col1=x / NOINT).

data have;
  input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;

/* One row per (id, y value): COL1 holds c1, c2, c3 in turn */
proc transpose data=have out=temp;
  by id;
  var c:;
run;

/* Number the y values 1, 2, 3 within each id to get x */
data temp;
  set temp;
  by id;
  if first.id then x=0;
  x+1;
  drop _name_;
run;

proc reg data=temp outest=wantt noprint;
  by id;
  model col1=x ;
  /* model col1=x /noint; <-- Which is the same as IML code */
quit;

/* IML version, reading the same HAVE data set created above */
proc iml;
  use have;
  read all var _num_ into y[r=id c=vnames];
  close;
  x={1 ,2 ,3};
  beta=j(nrow(y),1,.);
  do i=1 to nrow(y);
    beta[i]=solve(x`*x,x`*y[i,]`);
  end;
  want=y||beta;
  create want from want[r=id c=(vnames||'Beta')];
  append from want[r=id];
  close;
quit;
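Since x is fixed at 1, 2, 3, both fits also reduce to closed forms, which may be convenient with over a million rows because no transpose or BY-group regression is needed. This is only a minimal sketch: it assumes the `have` data set from this reply and no missing values, and the names `closed_form`, `slope_int`, and `slope_noint` are made up for illustration.

```sas
data closed_form;
  set have;
  /* With intercept: xbar = 2, so sum((x-xbar)*(y-ybar)) = c3 - c1
     and sum((x-xbar)**2) = 1 + 0 + 1 = 2 */
  slope_int = (c3 - c1) / 2;
  /* No intercept: sum(x*y) / sum(x**2), with sum(x**2) = 1 + 4 + 9 = 14 */
  slope_noint = (c1 + 2*c2 + 3*c3) / 14;
run;
```

For row a this gives slope_int = (2.81477 - 0.87396)/2 = 0.970405 and slope_noint = 13.33481/14, roughly 0.95249, which shows how far apart the two definitions can be.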
Been having some computer problems. Thanks very much for your continued assistance with this slope matter. I'll try and use your code.
I see no problem with your code.
I tried it, and it works. The 2 types of slope calculations give the same results.
You probably have missing values in the _505xx variables. That can cause differences.
data have;
input _50501 _50502 _50503;
datalines;
3 4 5
7 7 7
3 2 3
;
run;
data want;
  set have;
  array ys(3) _50501 _50502 _50503;
  /* _temporary_ keeps the x values out of the output data set */
  array vals(3) _temporary_ (1 2 3);
  xbar = mean(of vals(*));
  ybar = mean(of ys(*));
  do i=1 to dim(vals);
    /* Running sums for the raw-sums slope formula */
    s_xy=sum(s_xy, i*ys(i));
    s_y=sum(s_y, ys(i));
    s_x2=sum(s_x2, vals(i)**2);
    /* Running sums for the deviations-from-mean slope formula */
    num=(vals(i)-xbar)*(ys(i)-ybar);
    den=(vals(i)-xbar)**2;
    num_tot=sum(num, num_tot);
    den_tot=sum(den, den_tot);
  end;
  s_x=sum(of vals(*));
  n1=dim(vals);
  slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
  Slope_5050x_3 = num_tot/den_tot;
run;
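To illustrate the missing-value point: SUM() and MEAN() skip missing arguments, but n1, s_x, s_x2 and den_tot still count all three x values, so with a missing y the two formulas can disagree with each other and with the true slope. For example, y = (., 4, 5) gives slope2 = 2.5 and Slope_5050x_3 = 0.25, while the line through the two remaining points has slope 1. Below is a sketch of one way to handle this, assuming you want the slope over only the non-missing (x, y) pairs. The names `want_nomiss`, `n`, and `slope` are illustrative, not from the original code.

```sas
data want_nomiss;
  set have;
  array ys(3) _50501 _50502 _50503;
  array vals(3) _temporary_ (1 2 3);
  /* Reset the accumulators explicitly for every row */
  n = 0; s_x = 0; s_y = 0; s_xy = 0; s_x2 = 0;
  do i = 1 to dim(vals);
    /* Use only pairs where y is present, so n and every sum cover the same points */
    if not missing(ys(i)) then do;
      n    = n + 1;
      s_x  = s_x  + vals(i);
      s_y  = s_y  + ys(i);
      s_xy = s_xy + vals(i)*ys(i);
      s_x2 = s_x2 + vals(i)**2;
    end;
  end;
  /* A slope needs at least two points; otherwise leave it missing */
  if n >= 2 then slope = (n*s_xy - s_x*s_y) / (n*s_x2 - s_x**2);
  else slope = .;
  drop i;
run;
```

On the three test rows above this returns slopes of 1, 0 and 0. The explicit resets and the separate output data set may also matter for the original step, which reads and overwrites nicholas.n_slope_means__7: if that step is run a second time, s_xy, s_y and s_x2 come in from the input with the previous run's totals and SUM() adds to them. Whether that actually happened in your case is only a guess.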