Hello - again
Attached is a script where I:
1) generate all conceivable covariance matrices for a regression model A = B + C + B*C
2) strip out any matrix that does not have a positive determinant
3) re-configure the data into Type = COV form to submit to PROC SIMNORM
4) run the regressions on the dataset created by PROC SIMNORM
The reason for this is that I would like to know whether there are parts of the covariance space that might give rise to significant (p < .05) interaction terms, even though the data are derived from normal distributions.
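Steps 1 and 2 might be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached script: it assumes unit variances (so each matrix is a correlation matrix), a hypothetical grid of correlations from -0.9 to 0.9 in steps of 0.3, and made-up names (`cov_grid`, `r_ab`, `r_ac`, `r_bc`, `det`). The grid in the actual script evidently differs, since it yields 651 matrices.

```sas
/* Hypothetical grid of 3x3 correlation matrices for A, B and C.        */
/* Integer loop counters avoid floating-point drift in the DO bounds.   */
data cov_grid;
  do i = -3 to 3;
    do j = -3 to 3;
      do k = -3 to 3;
        r_ab = i * 0.3;
        r_ac = j * 0.3;
        r_bc = k * 0.3;
        /* Determinant of a 3x3 matrix with unit diagonal */
        det = 1 + 2*r_ab*r_ac*r_bc - r_ab**2 - r_ac**2 - r_bc**2;
        /* Keep only matrices with a positive determinant */
        if det > 0 then output;
      end;
    end;
  end;
  drop i j k;
run;
```

With unit variances and every correlation strictly between -1 and 1, a positive determinant is enough for the matrix to be positive definite. With arbitrary variances and covariances, the smaller leading minors would need checking as well.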
Experienced programmers will be giggling by now because they will have predicted that I am generating huge datasets, and that PROC REG runs out of memory in the attempt to store the output as a file.
Therefore, I am seeking help with recoding this script so that the very large datasets can be avoided and PROC REG does not run out of memory. Ideally I would like to have 10k replications of each of the 651 covariance matrices, and to run each regression on 10k data points.
Any help would be very gratefully received.
Piers C
Accepted Solutions
SAS says it's looking for &REP_. Note that the underscore is read as part of the name, so SAS treats it as a different macro variable.
Change
data est_&rep_&mtx;
to
data est_&rep._&mtx.;
Basically add periods after the macro variables.
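A minimal way to see the difference in the log, assuming two macro variables set by hand rather than passed in as macro parameters:

```sas
%let rep = 1;
%let mtx = 5;

/* Without a period, the name runs on through the underscore, so SAS looks for REP_ */
%put est_&rep_&mtx;     /* log: WARNING: Apparent symbolic reference REP_ not resolved. */

/* The period marks the end of each macro variable name and is consumed on resolution */
%put est_&rep._&mtx.;   /* writes est_1_5 to the log */
```

The trailing period after `&mtx` is optional here because the semicolon already ends the name, but adding it consistently does no harm.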
proc sql;
  create table macro_param_list as
  select distinct rep, mtx
  from reg;
quit;

%macro reg_data(rep=, mtx=);
  proc reg data=reg edf tableout outest=est;
    where rep = &rep and mtx = &mtx;
    model a = b c b_c;
  run;

  data est_&rep_&mtx;
    set est;
    if _TYPE_ ^= 'PVALUE' then delete;
    if b_c < 0.05 then INT = 1;
    else INT = 0;
    keep mtx b_c INT;
  run;
%mend;

data run_macros;
  set macro_param_list;
  str = catt('%reg_data(rep=',
             rep,
             ', mtx=',
             mtx,
             ');');
  *str should contain values that look like the following;
  *%reg_data(rep=1, mtx=5);
  *call execute(str);
run;

data est1;
  length source sim_run $45.;
  set est_: indsname=source;
  sim_run = source;
run;
Follows the principles illustrated here:
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
It doesn't create the text files; I'll leave that for you to add. You may need to test it: SAS OnDemand for Academics errored out due to the size (and for other reasons when a smaller size was used), but I wouldn't expect major issues, just likely bugs around parentheses or commas. Unfortunately I don't have further time, but it should get you started.
Hi Reeza
Very many thanks for this. But as a complete newbie to these kinds of approaches, I am struggling to find what I assume is a bug.
I have attached a modified version of my original script with only 10 replications per matrix, and only 10 datapoints per iteration of proc simnorm. It includes your code as well, but runs much more quickly as the dataset is so much smaller.
I succeed apparently in creating both the Macro_param_list and the Run_macros steps if I keep the call execute(str); commented out. I show a short screen shot of the latter:
However, when I include call execute(str); by uncommenting it, I get the following sets of errors. First:
So, somehow the rep part is problematic - or the reference to it, although it does get to the end of the list, where 6510 observations are read, which is correct.
Then I get:
It is very clear that the regressions are being run, but the datasets created for each regression are empty. Also, the names of the datasets only contain REP_1, REP_2 etc.; they do not contain any information about the MTX.
If you have time to take a second look I would be very grateful. After about 3 hours or so, I just can't spot what the error is.
Best
Piers
I have been reading Rick Wicklin's advice, and he is all for avoiding macros, preferring if possible data steps with BY statements. So I have tried to revert to this approach.
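A BY-group version of the regression step might look something like the sketch below. It assumes the same `reg` dataset and variables (`a`, `b`, `c`, `b_c`, `rep`, `mtx`) used earlier in the thread. The output names `est_all`, `est_p` and `sig_rate` are made up for illustration, and this is not necessarily the code that was finally run.

```sas
/* BY-group processing needs the data sorted by the BY variables */
proc sort data=reg;
  by mtx rep;
run;

/* One PROC REG call fits every mtx/rep combination; NOPRINT suppresses displayed output */
proc reg data=reg edf tableout outest=est_all noprint;
  by mtx rep;
  model a = b c b_c;
run;
quit;

/* Keep the p-value rows and flag significant interaction terms.          */
/* The missing() check matters: a missing value compares as less than 0.05 */
data est_p;
  set est_all;
  where _TYPE_ = 'PVALUE';
  INT = (not missing(b_c) and b_c < 0.05);
  keep mtx rep b_c INT;
run;

/* Proportion of significant interaction terms per covariance matrix */
proc means data=est_p noprint;
  by mtx;
  var INT;
  output out=sig_rate mean=prop_sig;
run;
```

Because the BY variables are written to the OUTEST= dataset, there is no need to encode `rep` and `mtx` in dataset names or to stack many small datasets afterwards.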
Very many thanks.
I found that once I could access a stable disk space area, this simulation works fine using data steps and BY statements only. It's also quite efficient, even though it generates datasets with 3,255,000,000 rows to be fed into the regression model. This only takes about 30 minutes.