BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NKormanik
Barite | Level 11

Here are a few rows of data (actual dataset has over a million rows):

 

 

id c1 c2 c3
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033

The objective is to compute the slope for each row, and place that slope computation into column 4.

 

Assume the X values are 1, 2, 3.

 

Y values are those given in each row.

 

The code below is what I have been trying, but something seems incorrect, as the resulting slope computations don't appear correct.

 

Please take a look at the code and tell me if you see any problem with it.

 

Thanks much!

 

 

data nicholas.n_slope_means__7;
set nicholas.n_slope_means__7;
array ys(3) _50501 _50502 _50503;
array vals(3) (1 2 3);
xbar = mean(of vals(*));
ybar = mean(of ys(*));
do i=1 to dim(vals);
s_xy=sum(s_xy, i*ys(i));
s_y=sum(s_y, ys(i));
s_x2=sum(s_x2, vals(i)**2);
num=(vals(i)-xbar)*(ys(i)-ybar);
den=(vals(i)-xbar)**2;
num_tot=sum(num, num_tot);
den_tot=sum(den, den_tot);
end;
s_x=sum(of vals(*));
n1=dim(vals);
slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
Slope_5050x_3 = num_tot/den_tot;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
OK. Assuming you want that Beta parameter of PROC REG. 
The following is two kind of code , one is data step, another is IML. notice there is some difference.


data have;
input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
proc transpose data=have out=temp;
by id;
var c:;
run;
data temp;
 set temp;
 by id;
 if first.id then x=0;
 x+1;
 drop _name_;
run;
proc reg data=temp outest=wantt noprint;
 by id;
 model col1=x ; 
/* model col1=x /noint;  <-- Which  is the same as IML code*/
quit;

 








data have;
input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
proc iml;
use have;
read all var _num_ into y[r=id c=vnames];
close;
x={1 ,2 ,3};
beta=j(nrow(y),1,.);
do i=1 to nrow(y);
 beta[i]=solve(x`*x,x`*y[i,]`);
end;
want=y||beta;
create want from want[r=id c=(vnames||'Beta')];
append from want[r=id];
close;
quit;

 



View solution in original post

4 REPLIES 4
Ksharp
Super User
OK. You have three variables not two , So can you explain how to get row slope ?
What is your logic to get that slope (you mean PROC REG  to get that parameter(slope)?).


data nicholas.n_slope_means__7;
set nicholas.n_slope_means__7;
array ys(3) _50501 _50502 _50503;
array vals(3) (1 2 3);
xbar = mean(of vals(*));
ybar = mean(of ys(*));
do i=1 to dim(vals);
s_xy=sum(s_xy, i*ys(i));
s_y=sum(s_y, ys(i));
s_x2=sum(s_x2, vals(i)**2);
num=(vals(i)-xbar)*(ys(i)-ybar);
den=(vals(i)-xbar)**2;
num_tot=sum(num, num_tot);
den_tot=sum(den, den_tot);
end;
s_x=sum(of vals(*));
n1=dim(vals);
slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
Slope_5050x_3 = num_tot/den_tot;
run;
 


Ksharp
Super User
OK. Assuming you want that Beta parameter of PROC REG. 
The following is two kind of code , one is data step, another is IML. notice there is some difference.


data have;
input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
proc transpose data=have out=temp;
by id;
var c:;
run;
data temp;
 set temp;
 by id;
 if first.id then x=0;
 x+1;
 drop _name_;
run;
proc reg data=temp outest=wantt noprint;
 by id;
 model col1=x ; 
/* model col1=x /noint;  <-- Which  is the same as IML code*/
quit;

 








data have;
input id $ c1 c2 c3;
cards;
a 0.87396 2.00827 2.81477
b 0.97002 2.00064 2.81468
c 0.68026 1.86006 2.81403
d 0.58230 1.84271 2.81402
e 0.73355 1.59606 2.81368
f 1.20452 2.07169 2.81365
g 0.91387 1.19560 2.81352
h 1.19418 2.10696 2.81306
i -0.10435 1.34213 2.81296
j -0.06286 1.50670 2.81225
k 1.73225 2.37420 2.81185
l 0.53130 1.87948 2.81164
m 0.86563 1.65322 2.81151
n 1.05598 2.40835 2.81065
o 0.15833 1.55971 2.81035
p 0.00912 0.61380 2.81033
;
run;
proc iml;
use have;
read all var _num_ into y[r=id c=vnames];
close;
x={1 ,2 ,3};
beta=j(nrow(y),1,.);
do i=1 to nrow(y);
 beta[i]=solve(x`*x,x`*y[i,]`);
end;
want=y||beta;
create want from want[r=id c=(vnames||'Beta')];
append from want[r=id];
close;
quit;

 



NKormanik
Barite | Level 11

Been having some computer problems.  Thanks very much for your continued assistance with this slope matter.  I'll try and use your code.

 

 

gergely_batho
SAS Employee

I see no problem with your code.

I tried it, and it works. The 2 types of slope calculations give the same results.

You probably have missing values in the _505xx variables. That can cause differences.

 

 

data have;
input _50501 _50502 _50503;
datalines;
3 4 5
7 7 7
3 2 3
;
run;
data want;
set have;
array ys(3) _50501 _50502 _50503;
array vals(3) _temporary_ (1 2 3);
xbar = mean(of vals(*));
ybar = mean(of ys(*));
do i=1 to dim(vals);
s_xy=sum(s_xy, i*ys(i));
s_y=sum(s_y, ys(i));
s_x2=sum(s_x2, vals(i)**2);
num=(vals(i)-xbar)*(ys(i)-ybar);
den=(vals(i)-xbar)**2;
num_tot=sum(num, num_tot);
den_tot=sum(den, den_tot);
end;
s_x=sum(of vals(*));
n1=dim(vals);
slope2 = (n1*s_xy - s_x*s_y)/(n1*s_x2 - s_x**2);
Slope_5050x_3 = num_tot/den_tot;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1226 views
  • 1 like
  • 3 in conversation