Here are a few rows of data (actual dataset has over a million rows):
id c1 c2 c3
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
The objective is to compute the slope for each row, and place that slope computation into column 4.
Assume the X values are 1, 2, 3.
Y values are those given in each row.
The code below is what I have been trying, but something seems incorrect, as the resulting slope computations don't appear correct.
Please take a look at the code and tell me if you see any problem with it.
Thanks much!
data nicholas.n_slope_means__7;
  set nicholas.n_slope_means__7;
  array ys(3) _50501 _50502 _50503;
  array vals(3) (1 2 3);
  xbar = mean(of vals(*));
  ybar = mean(of ys(*));
  do i=1 to dim(vals);
    s_xy=sum(s_xy, i*ys(i));
    s_y=sum(s_y, ys(i));
    s_x2=sum(s_x2, vals(i)**2);
    num=(vals(i)-xbar)*(ys(i)-ybar);
    den=(vals(i)-xbar)**2;
    num_tot=sum(num, num_tot);
    den_tot=sum(den, den_tot);
  end;
  s_x=sum(of vals(*));
  n1=dim(vals);
  slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
  Slope_5050x_3 = num_tot/den_tot;
run;
OK. You have three variables, not two, so can you explain how you get a slope for each row? What is your logic for computing it? Do you mean the slope parameter that PROC REG estimates?
OK. Assuming you want the Beta parameter (the slope) from PROC REG, here are two approaches: one uses a DATA step with PROC REG, the other uses IML. Note that they give different results as written: the PROC REG model includes an intercept, while the IML code fits a no-intercept model (the same as MODEL col1=x / NOINT).
data have;
  input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
/* Approach 1: one row per (id, x) pair, then PROC REG by id */
proc transpose data=have out=temp;
  by id;
  var c:;
run;
data temp;
  set temp;
  by id;
  if first.id then x=0;
  x+1;
  drop _name_;
run;
proc reg data=temp outest=wantt noprint;
  by id;
  model col1=x ;
  /* model col1=x /noint; <-- Which is the same as IML code*/
quit;
/* Approach 2: IML, reading the same HAVE data set created above */
proc iml;
  use have;
  read all var _num_ into y[r=id c=vnames];
  close;
  x={1 ,2 ,3};
  beta=j(nrow(y),1,.);
  do i=1 to nrow(y);
    beta[i]=solve(x`*x,x`*y[i,]`);
  end;
  want=y||beta;
  create want from want[r=id c=(vnames||'Beta')];
  append from want[r=id];
  close;
quit;
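To make the difference between the two fits concrete, here is a worked version of the formulas for X = 1, 2, 3. This is only a sketch: the symbols y_1, y_2, y_3 are my shorthand for the three values in a row (c1, c2, c3 in the sample data), not names from the code above.

```latex
\begin{aligned}
x &= (1, 2, 3), \qquad \bar{x} = 2, \qquad \bar{y} = \tfrac{1}{3}(y_1 + y_2 + y_3) \\[4pt]
\hat\beta_{\text{intercept}}
  &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
   = \frac{-(y_1 - \bar{y}) + (y_3 - \bar{y})}{2}
   = \frac{y_3 - y_1}{2} \\[4pt]
  &= \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}
   = \frac{3(y_1 + 2y_2 + 3y_3) - 6(y_1 + y_2 + y_3)}{3 \cdot 14 - 6^2}
   = \frac{3(y_3 - y_1)}{6} \\[4pt]
\hat\beta_{\text{noint}}
  &= \frac{\sum_i x_i y_i}{\sum_i x_i^2}
   = \frac{y_1 + 2y_2 + 3y_3}{14}
\end{aligned}
```

For row a (0.87396, 2.00827, 2.81477) the slope with an intercept is (2.81477 - 0.87396)/2 = 0.970405, while the no-intercept value is 13.33481/14, roughly 0.95249. The two forms of the with-intercept slope are algebraically identical when all three values are present, so with complete data both should reduce to (y_3 - y_1)/2.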
Been having some computer problems. Thanks very much for your continued assistance with this slope matter. I'll try and use your code.
I see no problem with your code.
I tried it, and it works. The 2 types of slope calculations give the same results.
You probably have missing values in the _505xx variables. That can cause differences.
data have;
input _50501 _50502 _50503;
datalines;
3 4 5
7 7 7
3 2 3
;
run;
data want;
  set have;
  array ys(3) _50501 _50502 _50503;
  array vals(3) _temporary_ (1 2 3);
  xbar = mean(of vals(*));
  ybar = mean(of ys(*));
  do i=1 to dim(vals);
    s_xy=sum(s_xy, i*ys(i));
    s_y=sum(s_y, ys(i));
    s_x2=sum(s_x2, vals(i)**2);
    num=(vals(i)-xbar)*(ys(i)-ybar);
    den=(vals(i)-xbar)**2;
    num_tot=sum(num, num_tot);
    den_tot=sum(den, den_tot);
  end;
  s_x=sum(of vals(*));
  n1=dim(vals);
  slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
  Slope_5050x_3 = num_tot/den_tot;
run;
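If missing values do turn out to be the cause, one way to handle them is to build every sum from only the non-missing (x, y) pairs, so that n, the X sums and the Y sums all describe the same points. The following is a minimal sketch under that assumption, not a tested replacement for your step; the accumulator names (n, s_x, s_y, s_xy, s_x2, den) and the output variable slope are ones I made up for illustration.

```sas
data want;
  set have;
  array ys(3) _50501 _50502 _50503;
  array xs(3) _temporary_ (1 2 3);

  /* Reset the accumulators on every row so nothing carries over */
  n = 0; s_x = 0; s_y = 0; s_xy = 0; s_x2 = 0;

  do i = 1 to dim(ys);
    /* Use a point only when its Y value is present */
    if not missing(ys(i)) then do;
      n    = n + 1;
      s_x  = s_x  + xs(i);
      s_y  = s_y  + ys(i);
      s_xy = s_xy + xs(i)*ys(i);
      s_x2 = s_x2 + xs(i)**2;
    end;
  end;

  /* A slope needs at least two points with distinct X values */
  den = n*s_x2 - s_x**2;
  if n >= 2 and den ne 0 then slope = (n*s_xy - s_x*s_y) / den;
  else slope = .;

  drop i n s_x s_y s_xy s_x2 den;
run;
```

With the three test rows above (3 4 5, 7 7 7, 3 2 3) this should give slopes of 1, 0 and 0, matching (y3 - y1)/2. Resetting the accumulators explicitly also guards against stale values when a step reads and overwrites the same data set and that data set already contains the accumulator variables from an earlier run.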