BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
td1345
Fluorite | Level 6

Hello,

 

I'm trying to loop though a large dataset in sas to perform analysis in R (poisson with robust SE on very large datasets) using proc iml. However, when running the macro-code I'm getting the following error which I cannot conqueror:

Submit block cannot be directly placed in a macro. Instead, place the submit block into a file first and then use %include to include the file within a macro definition
%macro robust_se_r(catnum);

submit;
proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;;
quit;
endrsubmit;


proc iml;
%robust_se_r(8);
quit;

call ExportDataSetToR("r", "data1");
submit / R;
library(sandwich)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)

endsubmit;
call ImportDataSetFromR("r_done", "full");

data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
catnum=&catnum;
run;

data wk.cat_&catnum;
set r_done_com;
run;




%mend();

proc iml;
%robust_se_r(8);
quit;

I've kind of tried all combinations including %inc of the iml part in a .sas file called from within proc iml.

 

What I'm I doing wrong?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

So you need to place this

	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;

in a separate file (e.g. /somepath/include.sas), and use

proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;

in the macro.

View solution in original post

12 REPLIES 12
Kurt_Bremser
Super User

Since you call the macro unconditionally within itself, this would lead to an infinite recursion.

 

Start with working non-macro code first that solves a single instance, identify the parts that need to be made dynamic, and THEN start with macro coding.

td1345
Fluorite | Level 6

Thanks for the reply!

 


proc sql;
create table r as select
* from splinebasis
where catnum eq 1;
quit;


proc iml;
call ExportDataSetToR("r", "data");
	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;
	call ImportDataSetFromR("r_done", "full");
quit;

	data r_done_com;
	set r_done;
	irr=exp(Estimate);
	Lower=exp(LL);
	Upper=exp(UL);
	drop V1;
	catnum=1;
	run;

	data cat_1;
	set r_done_com;
	run;

This is my working single instance example!

 

 

Kurt_Bremser
Super User

And you want the condition

where catnum eq 1

the assignment

catnum=1;

and the dataset name

data cat_1;

to be dynamic, and the whole thing run repeatedly for different values? If yes, how are those different values determined (stored in a dataset, do loop, ...)?

td1345
Fluorite | Level 6

Yes,

 

And the values are stored in a dataset column and I call it using

data _null_;
set all_catnums;
call execute(cats('%nrstr(%r_analysis(',catnum,'))'));
run;
quit;
Kurt_Bremser
Super User

So you need to place this

	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;

in a separate file (e.g. /somepath/include.sas), and use

proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;

in the macro.

td1345
Fluorite | Level 6
Thanks alot!
Kurt_Bremser
Super User

Mind that I am NO expert for PROC IML at all. In fact we have not even licensed SAS/IML where I work. I only did a quick study of the IML SUBMIT statement documentation (see Maxim 1), and did take note of the message you got and posted in your original question.

@Rick_SAS will be able to provide expert guidance for improving your IML performance, believe me!

td1345
Fluorite | Level 6

Actually this was my first code!

%macro r_analysis(catnum);

proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;
quit;


proc iml;
call ExportDataSetToR("r", "data");
	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;
	call ImportDataSetFromR("r_done", "full");
quit;

	data r_done_com;
	set r_done;
	irr=exp(Estimate);
	Lower=exp(LL);
	Upper=exp(UL);
	drop V1;
	catnum=&catnum;
	run;

	data wk.cat_&catnum;
	set r_done_com;
	run;
mend;

Rick_SAS
SAS Super FREQ

Kurt is doing a good job guiding you.

 

I will add one important point: There is no need for a macro here, and eliminating the macro would simplify the code. The IML language can read the data using a WHERE clause and can write the output data set(s) directly. There is no need for the PROC SQL or DATA steps. IML also has a looping structure, in case the purpose of the macro is to loop over one or more values of catnum.

 

Here is a simplified example that shows how to use the %INCLUDE statement to wrap the SUBMIT/R block:

 

/* 1. FIRST: Get code working without macro */
%let catnum=8;
proc sql;
create table r as select * from splinebasis
where catnum eq &catnum;;
quit;

proc iml;
call ExportDataSetToR("r", "data1");
/* PUT THIS SUBMIT BLOCK INTO A FILE such as C:/temp/RCode.sas */
/***************************/
submit / R;
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
endsubmit;
/***************************/
quit;

/* 2. NEXT: CREATE THE FILE */

/*3.  LAST: Include file with %INC */
%macro robust_se_r(catnum);
   proc sql;
   create table r as select * from splinebasis
   where catnum eq &catnum;;
   quit;

   proc iml;
   call ExportDataSetToR("r", "data1");

   /* SUBMIT BLOCK IS IN THIS FILE */
   %inc "C:/temp/RCode.sas";
   quit;
%mend;

%robust_se_r(8);
td1345
Fluorite | Level 6

Thanks for excellent guidance!

 

 

td1345
Fluorite | Level 6
I have 'macro' version working but I guess its way slower as it read-writes a large input file for every catnum. I did try to run it using a do-step but I have not succedded yet. Could you point me in the right direction?

Rick_SAS
SAS Super FREQ

If you are sending many large data sets to R, that is probably one reason it is slow. The other is that you are starting/quitting IML and R each time you run the macro, as well as reloading the package.

 

I guess you are trying to read/analyze/write all subgroups of the data? If PROC GENMOD can perform the Poisson regression analysis that you want, then the fastest way is to perform all the Poisson regressions in SAS by using BY-group analysis and bypass R. That way you don't have to read/write the data between SAS and R.

 

If you want to use R, the fastest way might be to use the DATA step to create the input data sets, transfer each one to R, and then export them out.

 

And here's another efficiency tip: Your post-IML DATA steps are unnecessary. Use R to add the exponentiated variables and then import the results directly to the data set name that you want. The following program should help you get started.

 

data inCat1 inCat2 inCat3 inCat4 inCat5 inCat6 inCat7 inCat8;
set splinebasis;
select (inCatnum);
   when (1)  output inCat1;
   when (2)  output inCat2;
   when (3)  output inCat3;
   when (4)  output inCat4;
   when (5)  output inCat5;
   when (6)  output inCat6;
   when (7)  output inCat7;
   when (8)  output inCat8;
   otherwise;
end;
run;

proc iml;
do catnum = 1 to 8;
   inDSName = cats("inCat", char(catnum));
   outDSName = cats("outCat", char(catnum));
   print catnum inDSName outDSName;
   /*
   call ExportDataSetToR("r", dsname);

   SUBMIT R statements HERE

	call ImportDataSetFromR(outDSName, "full");
   */
end;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 12 replies
  • 1253 views
  • 6 likes
  • 3 in conversation