Solved: Re: Proc iml within a macro that submits to R

td1345 · Posted 09-08-2020 03:57 AM

Hello,

I'm trying to loop though a large dataset in sas to perform analysis in R (poisson with robust SE on very large datasets) using proc iml. However, when running the macro-code I'm getting the following error which I cannot conqueror:

Submit block cannot be directly placed in a macro. Instead, place the submit block into a file first and then use %include to include the file within a macro definition

%macro robust_se_r(catnum);

submit;
proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;;
quit;
endrsubmit;


proc iml;
%robust_se_r(8);
quit;

call ExportDataSetToR("r", "data1");
submit / R;
library(sandwich)
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
cov.poimod <- vcovHC(poimod, type="HC0")
std.err <- sqrt(diag(cov.poimod))
estimate=coef(poimod)[!is.na(coef(poimod))]
se=std.err[!is.na(std.err)]
ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
AIC=AIC(poimod)
gof=1-pchisq(poimod$deviance, poimod$df.residual)
full=cbind(estimate, se, ll, ul,p, AIC, gof)
full<-cbind(rownames(full),full)

endsubmit;
call ImportDataSetFromR("r_done", "full");

data r_done_com;
set r_done;
irr=exp(Estimate);
Lower=exp(LL);
Upper=exp(UL);
catnum=&catnum;
run;

data wk.cat_&catnum;
set r_done_com;
run;




%mend();

proc iml;
%robust_se_r(8);
quit;

I've kind of tried all combinations including %inc of the iml part in a .sas file called from within proc iml.

What I'm I doing wrong?

Thanks

Kurt_Bremser · Posted 09-08-2020 05:34 AM

So you need to place this

	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;

in a separate file (e.g. /somepath/include.sas), and use

proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;

in the macro.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 09-08-2020 05:02 AM

Since you call the macro unconditionally within itself, this would lead to an infinite recursion.

Start with working non-macro code first that solves a single instance, identify the parts that need to be made dynamic, and THEN start with macro coding.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

td1345 · Posted 09-08-2020 05:09 AM

Thanks for the reply!


proc sql;
create table r as select
* from splinebasis
where catnum eq 1;
quit;


proc iml;
call ExportDataSetToR("r", "data");
	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;
	call ImportDataSetFromR("r_done", "full");
quit;

	data r_done_com;
	set r_done;
	irr=exp(Estimate);
	Lower=exp(LL);
	Upper=exp(UL);
	drop V1;
	catnum=1;
	run;

	data cat_1;
	set r_done_com;
	run;

This is my working single instance example!

Kurt_Bremser · Posted 09-08-2020 05:18 AM

And you want the condition

where catnum eq 1

the assignment

catnum=1;

and the dataset name

data cat_1;

to be dynamic, and the whole thing run repeatedly for different values? If yes, how are those different values determined (stored in a dataset, do loop, ...)?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

td1345 · Posted 09-08-2020 05:24 AM

Yes,

And the values are stored in a dataset column and I call it using

data _null_;
set all_catnums;
call execute(cats('%nrstr(%r_analysis(',catnum,'))'));
run;
quit;

Kurt_Bremser · Posted 09-08-2020 05:34 AM

So you need to place this

	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;

in a separate file (e.g. /somepath/include.sas), and use

proc iml;
call ExportDataSetToR("r", "data");
%include "/somepath/include.sas";
call ImportDataSetFromR("r_done", "full");
quit;

in the macro.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

td1345 · Posted 09-08-2020 07:25 AM

Thanks alot!

Kurt_Bremser · Posted 09-08-2020 09:38 AM

Mind that I am NO expert for PROC IML at all. In fact we have not even licensed SAS/IML where I work. I only did a quick study of the IML SUBMIT statement documentation (see Maxim 1), and did take note of the message you got and posted in your original question.

@Rick_SAS will be able to provide expert guidance for improving your IML performance, believe me!

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

td1345 · Posted 09-08-2020 05:12 AM

Actually this was my first code!

%macro r_analysis(catnum);

proc sql;
create table r as select
* from splinebasis
where catnum eq &catnum;
quit;


proc iml;
call ExportDataSetToR("r", "data");
	submit / R;
	library(sandwich)
	data1<-subset(data, data$catnum==x,)
	df=data1[c(7,9:27)]
	poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
	cov.poimod <- vcovHC(poimod, type="HC0")
	std.err <- sqrt(diag(cov.poimod))
	estimate=coef(poimod)[!is.na(coef(poimod))]
	se=std.err[!is.na(std.err)]
	ll=coef(poimod)[!is.na(coef(poimod))] - 1.96 * std.err[!is.na(std.err)]
	ul = coef(poimod)[!is.na(coef(poimod))] + 1.96 * std.err[!is.na(std.err)]
	p=2*pnorm(abs(estimate)/se, lower.tail=FALSE)
	AIC=AIC(poimod)
	gof=1-pchisq(poimod$deviance, poimod$df.residual)
	full=cbind(estimate, se, ll, ul,p, AIC, gof)
	full<-cbind(rownames(full),full)

	endsubmit;
	call ImportDataSetFromR("r_done", "full");
quit;

	data r_done_com;
	set r_done;
	irr=exp(Estimate);
	Lower=exp(LL);
	Upper=exp(UL);
	drop V1;
	catnum=&catnum;
	run;

	data wk.cat_&catnum;
	set r_done_com;
	run;
mend;

Rick_SAS · Posted 09-08-2020 06:54 AM

Kurt is doing a good job guiding you.

I will add one important point: There is no need for a macro here, and eliminating the macro would simplify the code. The IML language can read the data using a WHERE clause and can write the output data set(s) directly. There is no need for the PROC SQL or DATA steps. IML also has a looping structure, in case the purpose of the macro is to loop over one or more values of catnum.

Here is a simplified example that shows how to use the %INCLUDE statement to wrap the SUBMIT/R block:

/* 1. FIRST: Get code working without macro */
%let catnum=8;
proc sql;
create table r as select * from splinebasis
where catnum eq &catnum;;
quit;

proc iml;
call ExportDataSetToR("r", "data1");
/* PUT THIS SUBMIT BLOCK INTO A FILE such as C:/temp/RCode.sas */
/***************************/
submit / R;
df=data1[c(7,9:27)]
poimod=glm(formula=df$events ~ . +offset(log(data1$pyrs)), family=poisson, data=df)
endsubmit;
/***************************/
quit;

/* 2. NEXT: CREATE THE FILE */

/*3.  LAST: Include file with %INC */
%macro robust_se_r(catnum);
   proc sql;
   create table r as select * from splinebasis
   where catnum eq &catnum;;
   quit;

   proc iml;
   call ExportDataSetToR("r", "data1");

   /* SUBMIT BLOCK IS IN THIS FILE */
   %inc "C:/temp/RCode.sas";
   quit;
%mend;

%robust_se_r(8);

td1345 · Posted 09-08-2020 07:24 AM

Thanks for excellent guidance!

td1345 · Posted 09-08-2020 08:59 AM

I have 'macro' version working but I guess its way slower as it read-writes a large input file for every catnum. I did try to run it using a do-step but I have not succedded yet. Could you point me in the right direction?

Rick_SAS · Posted 09-08-2020 09:34 AM

If you are sending many large data sets to R, that is probably one reason it is slow. The other is that you are starting/quitting IML and R each time you run the macro, as well as reloading the package.

I guess you are trying to read/analyze/write all subgroups of the data? If PROC GENMOD can perform the Poisson regression analysis that you want, then the fastest way is to perform all the Poisson regressions in SAS by using BY-group analysis and bypass R. That way you don't have to read/write the data between SAS and R.

If you want to use R, the fastest way might be to use the DATA step to create the input data sets, transfer each one to R, and then export them out.

And here's another efficiency tip: Your post-IML DATA steps are unnecessary. Use R to add the exponentiated variables and then import the results directly to the data set name that you want. The following program should help you get started.

data inCat1 inCat2 inCat3 inCat4 inCat5 inCat6 inCat7 inCat8;
set splinebasis;
select (inCatnum);
   when (1)  output inCat1;
   when (2)  output inCat2;
   when (3)  output inCat3;
   when (4)  output inCat4;
   when (5)  output inCat5;
   when (6)  output inCat6;
   when (7)  output inCat7;
   when (8)  output inCat8;
   otherwise;
end;
run;

proc iml;
do catnum = 1 to 8;
   inDSName = cats("inCat", char(catnum));
   outDSName = cats("outCat", char(catnum));
   print catnum inDSName outDSName;
   /*
   call ExportDataSetToR("r", dsname);

   SUBMIT R statements HERE

	call ImportDataSetFromR(outDSName, "full");
   */
end;

Registration is open