Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 05-07-2020 11:15 AM
(824 views)

Hello, I am using IML (inside of a macro) which begins by reading the contents of a work file containing five variables into a matrix (XY). I then create new variables, which I eventually wish to merge (append) as columns (vectors) to the original work file. With the CREATE statement, I am able to write the new variables to a file. However, this overwrites the contents of the original four variables (i.e., now have missing values).. Omitting the new variables from the CREATE statement allows the original variables to retain the proper values. Other posts in this forum suggest merging the original dataset with a dataset containing the new variables. The problem here is creating a BY variable for the new dataset. Guess the question how do I create a BY variable that maintains its integrity? Code is below, with a brief amount of sample data. Thank you for any suggestions. Rick

```
data untrimmed;
input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts;
cards;
1991 0 155 175 165
1991 0 200 225 250
1991 0 75 125 135
1991 0 350 375 400
;
%MACRO combination;
proc iml;
use work.untrimmed where (group = &I & lag2cvrank = &J);
read all var {lag2cfo_ts lagcfo_ts group lag2cvrank} into XY;
close work.untrimmed;
c = allcomb(nrow(XY), 2); /* all "N choose 2" combinations of pairs */
group = XY[1,3]; /* extract group from XY, to be used as a BY variable later */
lag2cvrank = XY[1,4]; /* extract lag2cvrank from XY, to be used as a BY variable later */
Pt1 = XY[c[,1],]; /* extract first point of line segments */
Pt2 = XY[c[,2],]; /* extract second point of line segments */
slope = (Pt1[,2] - Pt2[,2]) / (Pt1[,1] - Pt2[,1]); /* (y1 - y2)/(x1 - x2) */
m = median(slope); /* scalar */
b = median( XY[,2] - m*XY[,1] ); /* median(y-mx) */ /* Scalar */
mxy = m*XY[,2]; /* N x 1 Vector */
t = nrow(XY); /* number of rows in XY */
int = J(t, 1, b); /* nrow x 1 column vector */
p = int + mxy; /* nrow x 1 column vector */
print (b||m)[c={'Intercept' 'Slope'} L="Method=Theil Combs=All"];
create ts_1_&I&J var {int mxy p group lag2cvrank}; /** create data set **/
append; /** write new variables in vectors for each combo of group and lag2cvrank **/
close ts_1_&I&J; /** close the data set **/
quit;
/*
proc sql;
create table perm1.ts_1_*I&J as
select * from
untrimmed as a inner join ts_1&I&J as b on
*/
%MEND combination;
%MACRO LOOP;
%DO I=1991 %TO 1991;
%DO J=0 %TO 0;
%combination;
%END;
%END;
%MEND LOOP;
%LOOP;
run;
proc print data=ts_1_19910;run;
```

```
```

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you. I did not know the IML forum existed. My bad. Thank you again. Rick

12 REPLIES 12

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Why not post it at IML forum, and calling @Rick_SAS

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you. I did not know the IML forum existed. My bad. Thank you again. Rick

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It's no problem. I moved the thread to the IML forum.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I would recommend BY-group processing by using the WHERE clause on the READ statement, as shown in this blog post:

In addition, you can use the "SETIN" and "SETOUT" statements to have two data sets open simultaneously and read and write to them in a loop. Refer to this blog post for a review of the available options:

Using the above techniques, you can remove the macro and the macro loop and instead use a pure IML loop over the number of BY-groups, as determined by PROC FREQ.

I have rewritten your program and modified your dataset to cover a more general case:

```
data untrimmed;
input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts;
cards;
1991 0 155 175 165
1991 0 200 225 250
1991 0 75 125 135
1991 0 350 375 400
1991 1 155 175 165
1991 1 200 225 250
1991 1 75 125 135
1992 0 155 175 165
1992 0 200 225 250
1992 0 75 125 135
1992 0 350 375 400
1992 2 155 175 165
1992 2 200 225 250
1992 2 75 125 135
;
/* find unique BY-group combinations */
proc freq data=untrimmed;
tables group*lag2cvrank / out=FreqOut;
run;
proc iml;
/* Suggest using the Theil-Sen function
https://blogs.sas.com/content/iml/2019/05/28/theil-sen-robust-regression.html
because it handles the case of infinite slope.*/
start compute_slopes(XY);
c = allcomb(nrow(XY), 2); /* all "N choose 2" combinations of pairs */
group = XY[1,3]; /* extract group from XY, to be used as a BY variable later */
lag2cvrank = XY[1,4]; /* extract lag2cvrank from XY, to be used as a BY variable later */
Pt1 = XY[c[,1],]; /* extract first point of line segments */
Pt2 = XY[c[,2],]; /* extract second point of line segments */
slope = (Pt1[,2] - Pt2[,2]) / (Pt1[,1] - Pt2[,1]); /* (y1 - y2)/(x1 - x2) */
m = median(slope); /* scalar */
b = median( XY[,2] - m*XY[,1] ); /* median(y-mx) */ /* Scalar */
mxy = m*XY[,2]; /* N x 1 Vector */
t = nrow(XY); /* number of rows in XY */
int = J(t, 1, b); /* nrow x 1 column vector */
p = int + mxy; /* nrow x 1 column vector */
print (b||m)[c={'Intercept' 'Slope'} L="Method=Theil Combs=All"];
group_col = J(t, 1, group);
lag2cvrank_col = J(t, 1, lag2cvrank);
return (int || mxy || p || group_col || lag2cvrank_col);
finish;
/* read the BY groups */
use FreqOut nobs NumGroups;
read all var {group lag2cvrank};
close FreqOut;
print NumGroups, group lag2cvrank;
use work.untrimmed;
create ts_1 var {int mxy p group_col lag2cvrank_col};
setin work.untrimmed;
setout ts_1;
inVarNames = {"lag2cfo_ts" "lagcfo_ts" "group" "lag2cvrank"};
do i = 1 to NumGroups; /* for each BY group */
read all var inVarNames into XY
where(group=(group[i]) & lag2cvrank=(lag2cvrank[i]));
/* X contains data for i_th group; analyze it */
G = compute_slopes(XY);
/* extract the columns of the matrix */
int=G[,1]; mxy=G[,2]; p=G[,3]; group_col=G[,4]; lag2cvrank_col=G[,5];
append;
end;
close work.untrimmed;
close ts_1;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Best of luck to you.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

__but I'm not seeing any assignment between PROC FREQ (or the related output dataset and NumGroups__; what am I missing? and 2) the goal is to create a sas dataset with the results, which appear in the dataset ts_1; __however, this dataset contains 16 obs but should only be 14 (from the original input data); the difference is two missing obs, which occur at the 8th and 16th observations__; something about the loop is looking for data where no data exists maybe; so what is the source of the missing obs? Thank you for any ideas or suggestions! Rick

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @rfrancis, the following are answers to your questions:

1- This line `use FreqOut nobs NumGroups;`

means:

open the `FreqOut`

dataset and store its number of observations in an IML variable named `NumGroups`

. So `NumGroups`

is assigned on this line.

2- Please change the following line in your program from:

`create ts_1 var {int mxy p group lag2cvrank};`

to:

`create ts_1 var {int mxy p group_col lag2cvrank_col};`

This change should eliminate those missing values. I have corrected my answer earlier in this thread to reflect this change.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Wow! So easy ... thank you for what you do on this site! I am grateful! Rick

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You are very welcome! Thanks for the follow-up questions.

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.