Hello, I am using IML (inside of a macro) which begins by reading the contents of a work file containing five variables into a matrix (XY). I then create new variables, which I eventually wish to merge (append) as columns (vectors) to the original work file. With the CREATE statement, I am able to write the new variables to a file. However, this overwrites the contents of the original four variables (i.e., now have missing values).. Omitting the new variables from the CREATE statement allows the original variables to retain the proper values. Other posts in this forum suggest merging the original dataset with a dataset containing the new variables. The problem here is creating a BY variable for the new dataset. Guess the question how do I create a BY variable that maintains its integrity? Code is below, with a brief amount of sample data. Thank you for any suggestions. Rick
data untrimmed;
input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts;
cards;
1991 0 155 175 165
1991 0 200 225 250
1991 0 75 125 135
1991 0 350 375 400
;
%MACRO combination;
proc iml;
use work.untrimmed where (group = &I & lag2cvrank = &J);
read all var {lag2cfo_ts lagcfo_ts group lag2cvrank} into XY;
close work.untrimmed;
c = allcomb(nrow(XY), 2); /* all "N choose 2" combinations of pairs */
group = XY[1,3]; /* extract group from XY, to be used as a BY variable later */
lag2cvrank = XY[1,4]; /* extract lag2cvrank from XY, to be used as a BY variable later */
Pt1 = XY[c[,1],]; /* extract first point of line segments */
Pt2 = XY[c[,2],]; /* extract second point of line segments */
slope = (Pt1[,2] - Pt2[,2]) / (Pt1[,1] - Pt2[,1]); /* (y1 - y2)/(x1 - x2) */
m = median(slope); /* scalar */
b = median( XY[,2] - m*XY[,1] ); /* median(y-mx) */ /* Scalar */
mxy = m*XY[,2]; /* N x 1 Vector */
t = nrow(XY); /* number of rows in XY */
int = J(t, 1, b); /* nrow x 1 column vector */
p = int + mxy; /* nrow x 1 column vector */
print (b||m)[c={'Intercept' 'Slope'} L="Method=Theil Combs=All"];
create ts_1_&I&J var {int mxy p group lag2cvrank}; /** create data set **/
append; /** write new variables in vectors for each combo of group and lag2cvrank **/
close ts_1_&I&J; /** close the data set **/
quit;
/*
proc sql;
create table perm1.ts_1_*I&J as
select * from
untrimmed as a inner join ts_1&I&J as b on
*/
%MEND combination;
%MACRO LOOP;
%DO I=1991 %TO 1991;
%DO J=0 %TO 0;
%combination;
%END;
%END;
%MEND LOOP;
%LOOP;
run;
proc print data=ts_1_19910;run;
Thank you. I did not know the IML forum existed. My bad. Thank you again. Rick
Why not post it at IML forum, and calling @Rick_SAS
Thank you. I did not know the IML forum existed. My bad. Thank you again. Rick
I'm confused. I was going to move this thread to the SAS/IML Community, but you have marked it as Solved. Please let me know if you want the question moved and if you still need an answer (if so, mark as unsolved).
I apologize, just trying to stay out of the way. I will re-post in the IML forum. Thank you for you offer. And I apologize for the inconvenience.
It's no problem. I moved the thread to the IML forum.
I would recommend BY-group processing by using the WHERE clause on the READ statement, as shown in this blog post:
In addition, you can use the "SETIN" and "SETOUT" statements to have two data sets open simultaneously and read and write to them in a loop. Refer to this blog post for a review of the available options:
Using the above techniques, you can remove the macro and the macro loop and instead use a pure IML loop over the number of BY-groups, as determined by PROC FREQ.
I have rewritten your program and modified your dataset to cover a more general case:
data untrimmed;
input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts;
cards;
1991 0 155 175 165
1991 0 200 225 250
1991 0 75 125 135
1991 0 350 375 400
1991 1 155 175 165
1991 1 200 225 250
1991 1 75 125 135
1992 0 155 175 165
1992 0 200 225 250
1992 0 75 125 135
1992 0 350 375 400
1992 2 155 175 165
1992 2 200 225 250
1992 2 75 125 135
;
/* find unique BY-group combinations */
proc freq data=untrimmed;
tables group*lag2cvrank / out=FreqOut;
run;
proc iml;
/* Suggest using the Theil-Sen function
https://blogs.sas.com/content/iml/2019/05/28/theil-sen-robust-regression.html
because it handles the case of infinite slope.*/
start compute_slopes(XY);
c = allcomb(nrow(XY), 2); /* all "N choose 2" combinations of pairs */
group = XY[1,3]; /* extract group from XY, to be used as a BY variable later */
lag2cvrank = XY[1,4]; /* extract lag2cvrank from XY, to be used as a BY variable later */
Pt1 = XY[c[,1],]; /* extract first point of line segments */
Pt2 = XY[c[,2],]; /* extract second point of line segments */
slope = (Pt1[,2] - Pt2[,2]) / (Pt1[,1] - Pt2[,1]); /* (y1 - y2)/(x1 - x2) */
m = median(slope); /* scalar */
b = median( XY[,2] - m*XY[,1] ); /* median(y-mx) */ /* Scalar */
mxy = m*XY[,2]; /* N x 1 Vector */
t = nrow(XY); /* number of rows in XY */
int = J(t, 1, b); /* nrow x 1 column vector */
p = int + mxy; /* nrow x 1 column vector */
print (b||m)[c={'Intercept' 'Slope'} L="Method=Theil Combs=All"];
group_col = J(t, 1, group);
lag2cvrank_col = J(t, 1, lag2cvrank);
return (int || mxy || p || group_col || lag2cvrank_col);
finish;
/* read the BY groups */
use FreqOut nobs NumGroups;
read all var {group lag2cvrank};
close FreqOut;
print NumGroups, group lag2cvrank;
use work.untrimmed;
create ts_1 var {int mxy p group_col lag2cvrank_col};
setin work.untrimmed;
setout ts_1;
inVarNames = {"lag2cfo_ts" "lagcfo_ts" "group" "lag2cvrank"};
do i = 1 to NumGroups; /* for each BY group */
read all var inVarNames into XY
where(group=(group[i]) & lag2cvrank=(lag2cvrank[i]));
/* X contains data for i_th group; analyze it */
G = compute_slopes(XY);
/* extract the columns of the matrix */
int=G[,1]; mxy=G[,2]; p=G[,3]; group_col=G[,4]; lag2cvrank_col=G[,5];
append;
end;
close work.untrimmed;
close ts_1;
Thank you so much for your effort! I really appreciate it! Trying to pick up a basic knowledge of IML, so this really helpful!
Hi ardegh, two quick questions (BTW, your IML code works very well!) ... 1) the DO LOOP refers to NumGroups, which first appears in the USE statement; however, I'm not seeing where NumGroups obtains the values; logically PROC FREQ is the source, but I'm not seeing any assignment between PROC FREQ (or the related output dataset and NumGroups; what am I missing? and 2) the goal is to create a sas dataset with the results, which appear in the dataset ts_1; however, this dataset contains 16 obs but should only be 14 (from the original input data); the difference is two missing obs, which occur at the 8th and 16th observations; something about the loop is looking for data where no data exists maybe; so what is the source of the missing obs? Thank you for any ideas or suggestions! Rick
Hi @rfrancis, the following are answers to your questions:
1- This line use FreqOut nobs NumGroups;
means:
open the FreqOut
dataset and store its number of observations in an IML variable named NumGroups
. So NumGroups
is assigned on this line.
2- Please change the following line in your program from:
create ts_1 var {int mxy p group lag2cvrank};
to:
create ts_1 var {int mxy p group_col lag2cvrank_col};
This change should eliminate those missing values. I have corrected my answer earlier in this thread to reflect this change.
Wow! So easy ... thank you for what you do on this site! I am grateful! Rick
You are very welcome! Thanks for the follow-up questions.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.