Once again, row slope revisited

Regular Contributor
Posts: 223
Here are a few rows of data (actual dataset has over a million rows):

 

 

id c1 c2 c3
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033

The objective is to compute the slope for each row, and place that slope computation into column 4.

 

Assume the X values are 1, 2, 3.

 

Y values are those given in each row.

 

The code below is what I have been trying, but the slopes it produces don't look correct.

 

Please take a look at the code and tell me if you see any problem with it.

 

Thanks much!

 

 

data nicholas.n_slope_means__7;
set nicholas.n_slope_means__7;
array ys(3) _50501 _50502 _50503;
array vals(3) (1 2 3);
xbar = mean(of vals(*));
ybar = mean(of ys(*));
do i=1 to dim(vals);
s_xy=sum(s_xy, i*ys(i));
s_y=sum(s_y, ys(i));
s_x2=sum(s_x2, vals(i)**2);
num=(vals(i)-xbar)*(ys(i)-ybar);
den=(vals(i)-xbar)**2;
num_tot=sum(num, num_tot);
den_tot=sum(den, den_tot);
end;
s_x=sum(of vals(*));
n1=dim(vals);
slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
Slope_5050x_3 = num_tot/den_tot;
run;

 


Accepted Solutions
Solution
‎08-14-2016 06:19 AM
Super User
Posts: 10,020

Re: Once again, row slope revisited

Posted in reply to NicholasKormanik
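OK. Assuming you want the Beta parameter (the slope) from PROC REG.
Here are two versions of the code: one uses a DATA step with PROC REG, the other uses IML. Note that they differ: the IML code matches the NOINT model (see the comment in the PROC REG step).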


data have;
input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
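/* Reshape to long form: one row per id and y value (COL1) */
proc transpose data=have out=temp;
by id;
var c:;
run;
/* Add x = 1, 2, 3 within each id */
data temp;
 set temp;
 by id;
 if first.id then x=0;
 x+1;
 drop _name_;
run;
/* One regression per id; estimates go to the OUTEST= data set WANTT */
proc reg data=temp outest=wantt noprint;
 by id;
 model col1=x ; 
/* model col1=x /noint;  <-- which is the same as the IML code */
quit;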
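
To get the slope into a fourth column, as the question asks, the OUTEST= data set can be merged back onto HAVE. This is only a minimal sketch and not part of the original reply: WANT and SLOPE are names chosen here for illustration. It relies on PROC REG storing the coefficient of x in a variable named x in the OUTEST= data set, and on both data sets being sorted by id (true for the sample data).

```sas
/* Sketch: attach the per-id slope from PROC REG as a fourth column.   */
/* WANT and SLOPE are illustrative names, not from the original post.  */
data want;
  merge have
        wantt(keep=id x rename=(x=slope)); /* coefficient of x is stored in variable x */
  by id;
run;
```

With over a million ids, running one regression per BY group may be slow, so this is probably best treated as a cross-check on a sample of rows.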

 








data have;
input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
proc iml;
use have;
read all var _num_ into y[r=id c=vnames];   /* y: one row per id, columns c1-c3 */
close;
x={1 ,2 ,3};                                /* X values only, no intercept column */
beta=j(nrow(y),1,.);
do i=1 to nrow(y);
 beta[i]=solve(x`*x,x`*y[i,]`);             /* least squares through the origin */
end;
want=y||beta;
create want from want[r=id c=(vnames||'Beta')];
append from want[r=id];
close;
quit;
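
To make the difference between the two versions concrete, here is a short worked derivation. It assumes X = 1, 2, 3 as stated in the question and writes the three Y values of a row as c1, c2, c3. The subscripts "int" and "noint" are labels introduced here for the model with and without an intercept.

```latex
\begin{align*}
\bar{x} &= 2, \qquad \sum_{i=1}^{3} (x_i - \bar{x})^2 = 2 \\
b_{\text{int}} &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = \frac{-(c_1 - \bar{y}) + (c_3 - \bar{y})}{2}
  = \frac{c_3 - c_1}{2} \\
b_{\text{noint}} &= \frac{\sum_i x_i y_i}{\sum_i x_i^2}
  = \frac{c_1 + 2c_2 + 3c_3}{14}
\end{align*}
```

For row a this gives (2.81477 - 0.87396)/2 = 0.970405 with an intercept, and (0.87396 + 2(2.00827) + 3(2.81477))/14 = 13.33481/14, about 0.95249, without one. To make the IML version match the default PROC REG model, x would need an intercept column, for example `x = j(3,1,1) || {1,2,3}`, and the slope is then the second element of the solution.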

 





All Replies
Super User
Posts: 10,020

Re: Once again, row slope revisited

Posted in reply to NicholasKormanik
OK. You have three variables, not two, so can you explain how you get a row slope?
What is your logic for that slope (do you mean the slope parameter from PROC REG)?


 



 



Regular Contributor
Posts: 223

Re: Once again, row slope revisited

I've been having some computer problems. Thanks very much for your continued assistance with this slope matter. I'll try your code.

 

 

SAS Employee
Posts: 340

Re: Once again, row slope revisited

Posted in reply to NicholasKormanik

I see no problem with your code.

I tried it, and it works. The 2 types of slope calculations give the same results.

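You probably have missing values in the _505xx variables. That can cause differences, because SUM and MEAN skip missing Y values while the X terms for those positions are still counted.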

 

 

data have;
input _50501 _50502 _50503;
datalines;
3 4 5
7 7 7
3 2 3
;
run;
data want;
set have;
array ys(3) _50501 _50502 _50503;
array vals(3) _temporary_ (1 2 3);
xbar = mean(of vals(*));
ybar = mean(of ys(*));
do i=1 to dim(vals);
s_xy=sum(s_xy, i*ys(i));
s_y=sum(s_y, ys(i));
s_x2=sum(s_x2, vals(i)**2);
num=(vals(i)-xbar)*(ys(i)-ybar);
den=(vals(i)-xbar)**2;
num_tot=sum(num, num_tot);
den_tot=sum(den, den_tot);
end;
s_x=sum(of vals(*));
n1=dim(vals);
slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
Slope_5050x_3 = num_tot/den_tot;
run;
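
As a rough illustration of the missing-value point: for a row with Y = (., 4, 5), working the step above by hand gives Slope_5050x_3 = 0.25 and slope2 = 2.5, while the line through the two observed points has slope 1. A second thing that may be worth checking in the original post is that its step reads and overwrites the same data set. If it is run more than once, s_xy, s_y, s_x2, num_tot and den_tot are read back in by SET and sum() adds to the stored values, which changes slope2. The sketch below is one possible way to guard against both. It is not the original poster's code: HAVE, WANT and SLOPE are illustrative names, and using only the non-missing pairs is an assumption about what is wanted.

```sas
/* Sketch only: HAVE, WANT and SLOPE are illustrative names. */
data want;                      /* write to a new data set, do not overwrite the input */
  set have;
  array ys(3) _50501 _50502 _50503;
  array xs(3) _temporary_ (1 2 3);
  /* Reset the accumulators explicitly on every row */
  n = 0; s_x = 0; s_y = 0; s_xy = 0; s_x2 = 0;
  do i = 1 to dim(ys);
    if not missing(ys(i)) then do;   /* use only complete (x, y) pairs */
      n    = n + 1;
      s_x  = s_x  + xs(i);
      s_y  = s_y  + ys(i);
      s_xy = s_xy + xs(i)*ys(i);
      s_x2 = s_x2 + xs(i)**2;
    end;
  end;
  /* A slope needs at least two points; otherwise leave it missing */
  if n >= 2 then slope = (n*s_xy - s_x*s_y) / (n*s_x2 - s_x**2);
  else slope = .;
  drop i n s_x s_y s_xy s_x2;
run;
```

When all three values are present and X is 1, 2, 3, this reduces to (_50503 - _50501)/2, so the three test rows above should give slopes of 1, 0 and 0.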

☑ This topic is solved.
