Solved: Once again, row slope revisited

NKormanik · Posted 08-11-2016 03:04 AM

Here are a few rows of data (actual dataset has over a million rows):

id c1 c2 c3
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033

The objective is to compute the slope for each row, and place that slope computation into column 4.

Assume the X values are 1, 2, 3.

Y values are those given in each row.

The code below is what I have been trying, but the slopes it produces don't look correct.

Please take a look at the code and tell me if you see any problem with it.

Thanks much!

data nicholas.n_slope_means__7;
  set nicholas.n_slope_means__7;
  array ys(3) _50501 _50502 _50503;
  array vals(3) (1 2 3);
  xbar = mean(of vals(*));
  ybar = mean(of ys(*));
  do i=1 to dim(vals);
    s_xy=sum(s_xy, i*ys(i));
    s_y=sum(s_y, ys(i));
    s_x2=sum(s_x2, vals(i)**2);
    num=(vals(i)-xbar)*(ys(i)-ybar);
    den=(vals(i)-xbar)**2;
    num_tot=sum(num, num_tot);
    den_tot=sum(den, den_tot);
  end;
  s_x=sum(of vals(*));
  n1=dim(vals);
  slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
  Slope_5050x_3 = num_tot/den_tot;
run;

Ksharp · Posted 08-11-2016 04:43 AM

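OK. Assuming you want the Beta (slope) parameter that PROC REG estimates.
Here are two versions of the code: one uses a data step with PROC REG, the other uses IML. Note that the results differ: the PROC REG model includes an intercept, while the IML code fits without one (the same as the NOINT option).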


data have;
input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
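/* Reshape to long form: one observation per id and value (COL1) */
proc transpose data=have out=temp;
by id;
var c:;
run;
/* Number the values within each id: x = 1, 2, 3 */
data temp;
 set temp;
 by id;
 if first.id then x=0;
 x+1;
 drop _name_;
run;
/* Regress COL1 on x separately for each id; estimates go to WANTT */
proc reg data=temp outest=wantt noprint;
 by id;
 model col1=x;
/* model col1=x / noint;  <-- no-intercept fit, which is the same as the IML code */
run;
quit;
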
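/* IML version: reads the HAVE dataset created above */
proc iml;
use have;
read all var _num_ into y[r=id c=vnames];  /* y: one row per id, columns c1-c3 */
close;
x={1 ,2 ,3};                               /* x values as a column; no intercept column */
beta=j(nrow(y),1,.);
do i=1 to nrow(y);
 /* least-squares slope without intercept: solve (x`x)*b = x`y for row i */
 beta[i]=solve(x`*x,x`*y[i,]`);
end;
want=y||beta;
create want from want[r=id c=(vnames||'Beta')];
append from want[r=id];
close;
quit;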
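
For reference, the difference between the two versions can be worked out by hand. This is a sketch under the thread's assumption that the x values are 1, 2, 3; the symbols y_1, y_2, y_3 (standing for c1, c2, c3) and the two beta labels are introduced here for illustration and do not appear in the original posts.

```latex
% With intercept (PROC REG, MODEL col1 = x), where \bar{x} = 2:
\hat{\beta}_{\text{int}}
  = \frac{\sum_{i=1}^{3}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{3}(x_i-\bar{x})^2}
  = \frac{-(y_1-\bar{y}) + (y_3-\bar{y})}{2}
  = \frac{y_3 - y_1}{2}

% Without intercept (IML code, or MODEL col1 = x / NOINT):
\hat{\beta}_{\text{noint}}
  = \frac{\sum_{i=1}^{3} x_i y_i}{\sum_{i=1}^{3} x_i^2}
  = \frac{y_1 + 2y_2 + 3y_3}{14}
```

For row a this gives (2.81477 - 0.87396)/2 = 0.970405 with an intercept, and (0.87396 + 2(2.00827) + 3(2.81477))/14 ≈ 0.95249 without one.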

Ksharp · Posted 08-11-2016 04:12 AM

OK. You have three variables, not two, so can you explain how you want to get a row slope?
What is your logic for that slope? Do you mean the slope parameter that PROC REG estimates?



NKormanik · Posted 08-14-2016 06:21 AM

Been having some computer problems. Thanks very much for your continued assistance with this slope matter. I'll try and use your code.

gergely_batho · Posted 08-11-2016 05:24 AM

I see no problem with your code.

I tried it, and it works. The 2 types of slope calculations give the same results.

You probably have missing values in the _505xx variables. That can cause differences.

data have;
input _50501 _50502 _50503;
datalines;
3 4 5
7 7 7
3 2 3
;
run;
data want;
  set have;
  array ys(3) _50501 _50502 _50503;
  array vals(3) _temporary_ (1 2 3);
  xbar = mean(of vals(*));
  ybar = mean(of ys(*));
  do i=1 to dim(vals);
    s_xy=sum(s_xy, i*ys(i));
    s_y=sum(s_y, ys(i));
    s_x2=sum(s_x2, vals(i)**2);
    num=(vals(i)-xbar)*(ys(i)-ybar);
    den=(vals(i)-xbar)**2;
    num_tot=sum(num, num_tot);
    den_tot=sum(den, den_tot);
  end;
  s_x=sum(of vals(*));
  n1=dim(vals);
  slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
  Slope_5050x_3 = num_tot/den_tot;
run;
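
The point about missing values can be made concrete. In the step above, SUM and MEAN silently skip missing arguments, while n1 and the x-based sums still count all three points, so the two formulas can disagree on an incomplete row. With x fixed at 1, 2, 3, the slope of a fit with an intercept reduces to (y3 - y1)/2, so a minimal sketch needs no accumulators at all. The dataset name want2 and the variable name slope are hypothetical, and leaving the slope missing when any y is missing is an assumption, not something stated in the thread.

```sas
data want2;                      /* hypothetical output name; the input is not overwritten */
  set have;
  array ys(3) _50501 _50502 _50503;
  /* With x = 1, 2, 3 the least-squares slope (with intercept) is (y3 - y1)/2 */
  if nmiss(of ys(*)) = 0 then slope = (ys(3) - ys(1)) / 2;
  else slope = .;                /* assumption: any missing y gives a missing slope */
run;
```

On the three test rows above this should give slopes of 1, 0 and 0. Because the step writes to a new dataset and keeps no running sums, re-running it cannot add onto totals stored by an earlier run, which can happen when a step that reads and overwrites the same dataset is run more than once.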
