Hello,

I am working to obtain OLS parameter estimates (with no intercept) for each possible pair of observations within subgroups. Say the year 1991 represents a group, and there are two subgroups within 1991. The first subgroup contains 3 observations, which produces 3 possible pairs (1-2, 1-3 and 2-3); the second subgroup contains 4 observations, which produces 6 possible pairs (1-2, 1-3, 1-4, 2-3, 2-4 and 3-4). The goal is to generate OLS parameter estimates for a model such as Y = B1*X1 + B2*X2 + e (i.e., two independent variables and no intercept).

If this sounds like the Theil-Sen method, you are spot on. However, note that this is the "multivariate version", which requires more than computing the slope for y = mx + b. A fish much larger than I from the academic world has determined that OLS with no intercept and two observations will generate the Theil-Sen slopes for B1 and B2 mentioned earlier (a modified r-square is necessary for OLS with no intercept, and will typically be between 99.5% and 100% when using the big fish methodology).

I have the "data step" SAS code from the big fish, but having witnessed the power and efficiency of IML, I set out to improve the lengthy processing time of the "data step" code. The IML code I developed below uses the OLS code from one of Rick Wicklin's blog posts, along with a loop developed by a SAS Community member. It works at the group level, but I lack the IML knowledge to make it work at the subgroup level. I believe the key is to insert a DO loop within the existing DO loop to capture the dynamics of the subgroups. This will require constructing a new X matrix for each possible pair within a subgroup in order to compute the two slope values.

If it would be helpful, I can supply the code from the big fish. Thank you for any ideas or suggestions!

Rick
data untrimmed;
input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts lag2acr_ts;
cards;
1991 0 155 175 165 35
1991 0 200 225 250 75
1991 0 75 125 135 65
1991 1 350 375 400 55
1991 1 155 175 165 85
1991 1 200 225 250 100
1991 1 75 125 135 125
;
run;
/* find unique BY-group combinations */
proc freq data=untrimmed;
tables group*lag2cvrank / out=FreqOut;
run;
proc iml;
/* OLS without an intercept for one pair of observations.
   XY columns: 1-2 = regressors, 3 = response, 4 = group, 5 = lag2cvrank.
   pair is a 1x2 row vector of row indices into XY. */
start regress(XY, pair);
   X = XY[pair, {1 2}];            /* 2x2 design matrix for this pair */
   Y = XY[pair, 3];                /* 2x1 response for this pair */
   xpx = X`*X;                     /* cross-products */
   xpy = X`*Y;
   /* SOLVE is more efficient than forming INV(xpx) explicitly */
   b = (solve(xpx, xpy))`;         /* 1x2 row of parameter estimates */
   group = XY[1,4];                /* BY variables, constant within XY */
   lag2cvrank = XY[1,5];
   return (b || group || lag2cvrank);
finish;
/* read the BY groups */
use FreqOut nobs NumGroups;
read all var {group lag2cvrank};
close FreqOut;
use work.untrimmed;
create ts_fcst1_IML var {m m1 group_col lag2cvrank_col};
setin work.untrimmed;
setout ts_fcst1_IML;
inVarNames = {"lag2cfo_ts" "lag2acr_ts" "lagcfo_ts" "group" "lag2cvrank"};
do i = 1 to NumGroups;             /* for each BY group */
   read all var inVarNames into XY
      where(group=(group[i]) & lag2cvrank=(lag2cvrank[i]));
   print XY;
   /* XY contains the data for the i_th subgroup; analyze it */
   c = allcomb(nrow(XY), 2);       /* all "N choose 2" pairs of rows */
   c_rows = nrow(c);
   /* my attempt at the new DO loop: one regression per pair.
      It uses its own index j so the outer index i is not overwritten. */
   do j = 1 to c_rows;
      G = regress(XY, c[j,]);
      /* extract the columns of the result */
      m = G[,1]; m1 = G[,2]; group_col = G[,3]; lag2cvrank_col = G[,4];
      append;
   end;
end;
close work.untrimmed;
close ts_fcst1_IML;
quit;
data ts_fcst1_IML;
set ts_fcst1_IML;
proc print;run;
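
P.S. As a side note on why two observations are enough to pin down both slopes, here is my own working, offered as a sketch rather than as part of the big fish methodology. The symbols X_p and y_p are my notation for the 2x2 regressor matrix and the 2x1 response of a single pair; the numbers are the first two observations of the lag2cvrank = 0 subgroup above, with lag2cfo_ts and lag2acr_ts as regressors and lagcfo_ts as the response. The matrices assume the amsmath package.

```latex
% No-intercept OLS on one pair: with a square, invertible X_p the
% normal equations collapse to an exact 2x2 linear solve.
\[
\hat{b}_p = (X_p^{\top} X_p)^{-1} X_p^{\top} y_p = X_p^{-1} y_p
\qquad \text{if } \det X_p \neq 0 .
\]
% Worked example: observations 1 and 2 of group 1991, lag2cvrank = 0.
\[
X_p = \begin{pmatrix} 165 & 35 \\ 250 & 75 \end{pmatrix},\quad
y_p = \begin{pmatrix} 155 \\ 200 \end{pmatrix},\quad
\det X_p = 165 \cdot 75 - 35 \cdot 250 = 3625 ,
\]
\[
\hat{b}_p = \frac{1}{3625}
\begin{pmatrix} 75 & -35 \\ -250 & 165 \end{pmatrix}
\begin{pmatrix} 155 \\ 200 \end{pmatrix}
= \frac{1}{3625} \begin{pmatrix} 4625 \\ -5750 \end{pmatrix}
\approx \begin{pmatrix} 1.2759 \\ -1.5862 \end{pmatrix} .
\]
```

If this is right, a pair whose two rows of X are proportional (determinant zero) has no unique solution, so such pairs would presumably need to be skipped.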