Hello,
I'm trying to loop though a large dataset in sas to perform analysis in R (poisson with robust SE on very large datasets) using proc iml. However, when running the macro-code I'm getting the following error which I cannot conqueror:
Submit block cannot be directly placed in a macro. Instead, place the submit block into a file first and then use %include to include the file within a macro definition
%macro robust_se_r(catnum);
submit;
proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;;
quit;
endrsubmit;
proc iml;
%robust_se_r(8);
quit;
call ExportDataSetToR("r", "data1");
submit / R;
library(sandwich)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
call ImportDataSetFromR("r_done", "full");
data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
catnum=&catnum;
run;
data wk.cat_&catnum;
set r_done_com;
run;
%mend();
proc iml;
%robust_se_r(8);
quit;
I've kind of tried all combinations including %inc of the iml part in a .sas file called from within proc iml.
What I'm I doing wrong?
Thanks
So you need to place this
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
in a separate file (e.g. /somepath/include.sas), and use
proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;
in the macro.
Since you call the macro unconditionally within itself, this would lead to an infinite recursion.
Start with working non-macro code first that solves a single instance, identify the parts that need to be made dynamic, and THEN start with macro coding.
Thanks for the reply!
proc sql;
create table r as select
* from splinebasis
where catnum eq 1;
quit;
proc iml;
call ExportDataSetToR("r", "data");
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
call ImportDataSetFromR("r_done", "full");
quit;
data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
drop V1;
catnum=1;
run;
data cat_1;
set r_done_com;
run;
This is my working single instance example!
And you want the condition
where catnum eq 1
the assignment
catnum=1;
and the dataset name
data cat_1;
to be dynamic, and the whole thing run repeatedly for different values? If yes, how are those different values determined (stored in a dataset, do loop, ...)?
Yes,
And the values are stored in a dataset column and I call it using
data _null_;
set all_catnums;
call execute(cats('%nrstr(%r_analysis(',catnum,'))'));
run;
quit;
So you need to place this
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
in a separate file (e.g. /somepath/include.sas), and use
proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;
in the macro.
Mind that I am NO expert for PROC IML at all. In fact we have not even licensed SAS/IML where I work. I only did a quick study of the IML SUBMIT statement documentation (see Maxim 1), and did take note of the message you got and posted in your original question.
@Rick_SAS will be able to provide expert guidance for improving your IML performance, believe me!
Actually this was my first code!
%macro r_analysis(catnum);
proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;
quit;
proc iml;
call ExportDataSetToR("r", "data");
submit / R;
library(sandwich)
data1<-subset(data, data$catnum==x,)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)
endsubmit;
call ImportDataSetFromR("r_done", "full");
quit;
data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
drop V1;
catnum=&catnum;
run;
data wk.cat_&catnum;
set r_done_com;
run;
mend;
Kurt is doing a good job guiding you.
I will add one important point: There is no need for a macro here, and eliminating the macro would simplify the code. The IML language can read the data using a WHERE clause and can write the output data set(s) directly. There is no need for the PROC SQL or DATA steps. IML also has a looping structure, in case the purpose of the macro is to loop over one or more values of catnum.
Here is a simplified example that shows how to use the %INCLUDE statement to wrap the SUBMIT/R block:
/* 1. FIRST: Get code working without macro */
%let catnum=8;
proc sql;
create table r as select * from splinebasis
where catnum eq &catnum;;
quit;
proc iml;
call ExportDataSetToR("r", "data1");
/* PUT THIS SUBMIT BLOCK INTO A FILE such as C:/temp/RCode.sas */
/***************************/
submit / R;
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
endsubmit;
/***************************/
quit;
/* 2. NEXT: CREATE THE FILE */
/*3. LAST: Include file with %INC */
%macro robust_se_r(catnum);
proc sql;
create table r as select * from splinebasis
where catnum eq &catnum;;
quit;
proc iml;
call ExportDataSetToR("r", "data1");
/* SUBMIT BLOCK IS IN THIS FILE */
%inc "C:/temp/RCode.sas";
quit;
%mend;
%robust_se_r(8);
Thanks for excellent guidance!
If you are sending many large data sets to R, that is probably one reason it is slow. The other is that you are starting/quitting IML and R each time you run the macro, as well as reloading the package.
I guess you are trying to read/analyze/write all subgroups of the data? If PROC GENMOD can perform the Poisson regression analysis that you want, then the fastest way is to perform all the Poisson regressions in SAS by using BY-group analysis and bypass R. That way you don't have to read/write the data between SAS and R.
If you want to use R, the fastest way might be to use the DATA step to create the input data sets, transfer each one to R, and then export them out.
And here's another efficiency tip: Your post-IML DATA steps are unnecessary. Use R to add the exponentiated variables and then import the results directly to the data set name that you want. The following program should help you get started.
data inCat1 inCat2 inCat3 inCat4 inCat5 inCat6 inCat7 inCat8;
set splinebasis;
select (inCatnum);
when (1) output inCat1;
when (2) output inCat2;
when (3) output inCat3;
when (4) output inCat4;
when (5) output inCat5;
when (6) output inCat6;
when (7) output inCat7;
when (8) output inCat8;
otherwise;
end;
run;
proc iml;
do catnum = 1 to 8;
inDSName = cats("inCat", char(catnum));
outDSName = cats("outCat", char(catnum));
print catnum inDSName outDSName;
/*
call ExportDataSetToR("r", dsname);
SUBMIT R statements HERE
call ImportDataSetFromR(outDSName, "full");
*/
end;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.