Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 09-08-2020 03:57 AM
(795 views)

Hello,

I'm trying to loop though a large dataset in sas to perform analysis in R (poisson with robust SE on very large datasets) using proc iml. However, when running the macro-code I'm getting the following error which I cannot conqueror:

Submit block cannot be directly placed in a macro. Instead, place the submit block into a file first and then use %include to include the file within a macro definition

```
%macro robust_se_r(catnum);
submit;
proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;;
quit;
endrsubmit;
proc iml;
%robust_se_r(8);
quit;
call ExportDataSetToR("r", "data1");
submit / R;
library(sandwich)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
call ImportDataSetFromR("r_done", "full");
data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
catnum=&catnum;
run;
data wk.cat_&catnum;
set r_done_com;
run;
%mend();
proc iml;
%robust_se_r(8);
quit;
```

I've kind of tried all combinations including %inc of the iml part in a .sas file called from within proc iml.

What I'm I doing wrong?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

So you need to place this

```
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
```

in a separate file (e.g. /somepath/include.sas), and use

```
proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;
```

in the macro.

12 REPLIES 12

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Since you call the macro unconditionally within itself, this would lead to an infinite recursion.

Start with *working* non-macro code first that solves a single instance, identify the parts that need to be made dynamic, and THEN start with macro coding.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks for the reply!

```
proc sql;
create table r as select
* from splinebasis
where catnum eq 1;
quit;
proc iml;
call ExportDataSetToR("r", "data");
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
call ImportDataSetFromR("r_done", "full");
quit;
data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
drop V1;
catnum=1;
run;
data cat_1;
set r_done_com;
run;
```

This is my working single instance example!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

And you want the condition

`where catnum eq 1`

the assignment

`catnum=1;`

and the dataset name

`data cat_1;`

to be dynamic, and the whole thing run repeatedly for different values? If yes, how are those different values determined (stored in a dataset, do loop, ...)?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes,

And the values are stored in a dataset column and I call it using

```
data _null_;
set all_catnums;
call execute(cats('%nrstr(%r_analysis(',catnum,'))'));
run;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

So you need to place this

```
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
```

in a separate file (e.g. /somepath/include.sas), and use

```
proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;
```

in the macro.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks alot!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Mind that I am NO expert for PROC IML at all. In fact we have not even licensed SAS/IML where I work. I only did a quick study of the IML SUBMIT statement documentation (see Maxim 1), and did take note of the message you got and posted in your original question.

@Rick_SAS will be able to provide expert guidance for improving your IML performance, believe me!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Actually this was my first code!

```
%macro r_analysis(catnum);
proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;
quit;
proc iml;
call ExportDataSetToR("r", "data");
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
call ImportDataSetFromR("r_done", "full");
quit;
data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
drop V1;
catnum=&catnum;
run;
data wk.cat_&catnum;
set r_done_com;
run;
mend;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Kurt is doing a good job guiding you.

I will add one important point: There is no need for a macro here, and eliminating the macro would simplify the code. The IML language can read the data using a WHERE clause and can write the output data set(s) directly. There is no need for the PROC SQL or DATA steps. IML also has a looping structure, in case the purpose of the macro is to loop over one or more values of catnum.

Here is a simplified example that shows how to use the %INCLUDE statement to wrap the SUBMIT/R block:

```
/* 1. FIRST: Get code working without macro */
%let catnum=8;
proc sql;
create table r as select * from splinebasis
where catnum eq &catnum;;
quit;
proc iml;
call ExportDataSetToR("r", "data1");
/* PUT THIS SUBMIT BLOCK INTO A FILE such as C:/temp/RCode.sas */
/***************************/
submit / R;
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
endsubmit;
/***************************/
quit;
/* 2. NEXT: CREATE THE FILE */
/*3. LAST: Include file with %INC */
%macro robust_se_r(catnum);
proc sql;
create table r as select * from splinebasis
where catnum eq &catnum;;
quit;
proc iml;
call ExportDataSetToR("r", "data1");
/* SUBMIT BLOCK IS IN THIS FILE */
%inc "C:/temp/RCode.sas";
quit;
%mend;
%robust_se_r(8);
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks for excellent guidance!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I have 'macro' version working but I guess its way slower as it read-writes a large input file for every catnum. I did try to run it using a do-step but I have not succedded yet. Could you point me in the right direction?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you are sending many large data sets to R, that is probably one reason it is slow. The other is that you are starting/quitting IML and R each time you run the macro, as well as reloading the package.

I guess you are trying to read/analyze/write all subgroups of the data? If PROC GENMOD can perform the Poisson regression analysis that you want, then the fastest way is to perform all the Poisson regressions in SAS by using BY-group analysis and bypass R. That way you don't have to read/write the data between SAS and R.

If you want to use R, the fastest way might be to use the DATA step to create the input data sets, transfer each one to R, and then export them out.

And here's another efficiency tip: Your post-IML DATA steps are unnecessary. Use R to add the exponentiated variables and then import the results directly to the data set name that you want. The following program should help you get started.

```
data inCat1 inCat2 inCat3 inCat4 inCat5 inCat6 inCat7 inCat8;
set splinebasis;
select (inCatnum);
when (1) output inCat1;
when (2) output inCat2;
when (3) output inCat3;
when (4) output inCat4;
when (5) output inCat5;
when (6) output inCat6;
when (7) output inCat7;
when (8) output inCat8;
otherwise;
end;
run;
proc iml;
do catnum = 1 to 8;
inDSName = cats("inCat", char(catnum));
outDSName = cats("outCat", char(catnum));
print catnum inDSName outDSName;
/*
call ExportDataSetToR("r", dsname);
SUBMIT R statements HERE
call ImportDataSetFromR(outDSName, "full");
*/
end;
```

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.