Piers
Obsidian | Level 7

Hello - again

Attached is a script where I:

 

1) generate all conceivable covariance matrices for a regression model A = B + C + B*C

2) strip out any matrix that does not have a positive determinant

3) re-configure the data into Type = COV form to submit to PROC SIMNORM

4) run the regressions on the dataset created by PROC SIMNORM

 

The reason for this is that I would like to know whether there are parts of the covariance space that might give rise to significant (p < .05) interaction terms, even though the data are derived from normal distributions.

 

Experienced programmers will be giggling by now because they will have predicted that I am generating huge datasets, and that PROC REG runs out of memory in the attempt to store the output as a file.

 

Therefore, I am seeking help on how to recode this script so that the very large datasets can be avoided and PROC REG does not run out of memory. Ideally I would like to have 10k replications of each of the 651 covariance matrices, and to run each regression on 10k data points.

 

Any help would be very gratefully received.

 

Piers C
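
For readers without the attachment, step 3 might look roughly like the sketch below. It is not the attached script: the data set names cov1 and sim1, the matrix values, and the seed are all made up for illustration, and it covers a single covariance matrix for a, b and c only.

```sas
/* Hypothetical example: one 3x3 covariance matrix in TYPE=COV form. */
/* The values are illustrative, not taken from the attached script.  */
data cov1(type=cov);
   input _TYPE_ $ _NAME_ $ a b c;
   datalines;
COV  a  1.0  0.3  0.2
COV  b  0.3  1.0  0.1
COV  c  0.2  0.1  1.0
MEAN .  0    0    0
;
run;

/* Draw 10 multivariate-normal realizations from that matrix. */
proc simnorm data=cov1 outsim=sim1 numreal=10 seed=12345;
   var a b c;
run;
```

Stacking the 651 matrices and the replications on top of this is what produces the very large data set described above.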

Reeza
Super User
proc sql;
    create table macro_param_list as
    select distinct rep, mtx
    from reg;
quit;

%macro reg_data(rep=, mtx=);

proc reg data=reg edf tableout outest=est;
    where rep = &rep and mtx = &mtx;
    model a = b c b_c;
run;

data est_&rep_&mtx;
    set est;
    if _TYPE_ ^= 'PVALUE' then delete;
    if b_c < 0.05 then INT = 1;
    else INT = 0;
    keep mtx b_c INT;
run;

%mend;

data run_macros;
    set macro_param_list;
    str = catt('%reg_data(rep=', rep, ', mtx=', mtx, ');');
    /* str should contain values that look like the following:
       reg_data(rep=1, mtx=5), preceded by a percent sign */

    /* call execute(str); */
run;

data est1;
    length source sim_run $45.;
    set est_: indsname=source;
    sim_run = source;
run;

This follows the principles illustrated in my tutorial on converting a working program to a macro:

https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

The method is pretty robust, helps prevent errors, and makes your code much easier to debug. Obviously I'm biased, because I wrote it 🙂

 

It doesn't create the text files; I'll leave that for you to add. You will need to test it: SAS Academics on Demand errored out due to the size (and for other reasons when a smaller size was used). I wouldn't expect major issues, though, just likely bugs around parentheses or commas. Unfortunately I don't have further time, but it should get you started.
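
One refinement that is often suggested when driving a macro from CALL EXECUTE is to wrap the generated call in %NRSTR, so that the macro runs after the data step finishes rather than while it is still executing. The sketch below assumes the macro_param_list data set and the reg_data macro defined above; the use of a `data _null_` step and CATS instead of CATT are choices made for this illustration, not part of the posted code.

```sas
/* Sketch only: generate one deferred macro call per (rep, mtx) pair. */
data _null_;
   set macro_param_list;
   /* The %nrstr wrapper delays execution of the macro until the */
   /* generated code runs, after this data step has ended.       */
   call execute(cats('%nrstr(%reg_data(rep=', rep, ', mtx=', mtx, '));'));
run;
```

The single quotes keep the macro processor from resolving the call while the data step is being compiled.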

 

 

 

 

Piers
Obsidian | Level 7

Hi Reeza

 

Very many thanks for this. But as a complete newbie to these kinds of approaches, I am struggling to find what I assume is a bug.

 

I have attached a modified version of my original script with only 10 replications per matrix, and only 10 datapoints per iteration of proc simnorm. It includes your code as well, but runs much more quickly as the dataset is so much smaller.

 

I apparently succeed in creating both the Macro_param_list and the Run_macros data sets if I keep call execute(str); commented out. Here is a short screenshot of the latter:

 

[Screenshot: Piers_0-1619500569512.png]

 

However, when I uncomment call execute(str); I get the following sets of errors. First:

[Screenshot: IMG1.PNG]

So the rep parameter, or the reference to it, is somehow problematic. The step does get to the end of the list, though, where 6510 observations are read, which is correct.

 

Then I get:

[Screenshot: IMG2.PNG]

It is very clear that the regressions are being run, but the data sets created for each regression are empty. Also, the data set names only contain REP_1, REP_2, etc.; they contain no information about the MTX.

 

If you have time to take a second look, I would be very grateful. After about 3 hours or so, I just can't spot what the error is.

 

Best

 

Piers

 

 

 

 

Reeza
Super User
Accepted solution
A period is missing after the reference to &rep. FYI, as mentioned, your code will not run on SAS UE, so what I posted was untested.

SAS says it's looking for &REP_. Because an underscore is a valid character in a macro variable name, &rep_ is read as a different macro variable from &rep.
Change

data est_&rep_&mtx;

to

data est_&rep._&mtx.;

Basically, add periods after the macro variable references.
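
A quick way to see the difference is to write both forms to the log with %PUT. This is a minimal illustration; the values 1 and 5 are arbitrary.

```sas
%let rep = 1;
%let mtx = 5;

/* No period: SAS reads the name as REP_ and cannot resolve it, */
/* so the log shows a warning about an unresolved reference.    */
%put est_&rep_&mtx;

/* A period ends each macro variable name, so the underscore */
/* is treated as ordinary text. Writes est_1_5 to the log.   */
%put est_&rep._&mtx.;
```

The trailing period after &mtx is not strictly needed here, since a semicolon follows it, but using it consistently avoids this class of error.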
Ksharp
Super User

Or could you try SAS/IML code? @Rick_SAS might give you a hand.

Piers
Obsidian | Level 7

I have been reading Rick Wicklin's advice, and he is all for avoiding macros, preferring, where possible, data steps and procedures with BY statements. So I have tried to revert to this approach.
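
A BY-group version of the regression step might look like the sketch below. It is not the script that was actually run: it assumes the reg data set with variables a, b, c, b_c, rep and mtx used earlier in the thread, and the names pvals, sig_rate and prop_sig are made up for illustration.

```sas
/* BY-group processing needs the data sorted by the BY variables. */
proc sort data=reg;
   by mtx rep;
run;

/* One PROC REG call fits every (mtx, rep) combination; */
/* NOPRINT suppresses the per-group printed output.     */
proc reg data=reg outest=est tableout noprint;
   by mtx rep;
   model a = b c b_c;
run;
quit;

/* Keep the p-value rows and flag significant interactions. */
data pvals;
   set est;
   where _TYPE_ = 'PVALUE';
   INT = (b_c < 0.05 and not missing(b_c));
   keep mtx rep b_c INT;
run;

/* Proportion of significant interaction terms per covariance matrix. */
proc means data=pvals noprint nway;
   class mtx;
   var INT;
   output out=sig_rate mean=prop_sig;
run;
```

Because all results land in a single OUTEST= data set, there is no need to create and then concatenate one data set per regression.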

 

 

 

Piers
Obsidian | Level 7

Very many thanks.

 

I found that once I could access a stable disc space area, this simulation works fine using data steps and BY statements only. It's also quite efficient, even though it generates datasets with 3,255,000,000 rows to be fed into the regression model. This only takes ~30 minutes.
