SAS_user_n
Calcite | Level 5
I have a block of data, which I call A.
A is the actual data, and it is large.

Then I have A_1, A_2, A_3, ..., A_n, which are simulated, smaller blocks of data that have the same number of columns as A but far fewer rows (say n = 1000).

I want to regress using the dataset that has A_1 appended to A.
Then, repeat with A_2 appended to A, until A_n appended to A.

Is there a good way to do this as efficiently as possible, without appending each of A_1, ..., A_n to A in a looped DATA step and running the regression each time?

Thank you.
WaltSmith
Fluorite | Level 6
Not that I know of (using SAS for 27 years). I would just write a macro loop to append the data for each case and do any analysis. Something like:
[pre]
%macro loopy;
  %do i=1 %to 1000;
    /* append the i-th simulated block to A */
    data Aplus;
      set A A_&i;
    run;

    proc reg data=Aplus;
      *--- etc. ----;
    run;
    quit;   /* PROC REG is interactive; QUIT ends it */
  %end;
%mend;

%loopy
[/pre]
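If you do end up looping, one way to soften the cost is to make Aplus a DATA step view, so the large A is not rewritten to disk on every pass, and to keep only the estimates from each PROC REG with OUTEST= instead of printing 1000 sets of output. This is a sketch, not tested against the OP's data: it assumes the data set names A and A_1 to A_1000, and the macro name LOOPY2, the data sets EST_ONE and EST_LOOP, and the model variables Y, X1, X2 are all hypothetical. It still reads A once per pass.

[pre]
/* Hypothetical variant of the loop: view instead of a physical copy */
%macro loopy2(n);
  %local i;
  %do i = 1 %to &n;
    /* View: stacks A and the i-th simulated block without copying A */
    data Aplus / view=Aplus;
      set A A_&i;
    run;

    /* Y, X1, X2 are hypothetical variable names */
    proc reg data=Aplus noprint outest=est_one;
      model y = x1 x2;
    run;
    quit;

    /* Tag the estimates with the simulation number and collect them */
    data est_one;
      set est_one;
      sim = &i;
    run;

    /* Delete EST_LOOP before a rerun, or new rows are added to old ones */
    proc append base=est_loop data=est_one;
    run;
  %end;
%mend loopy2;

%loopy2(1000)
[/pre]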
data_null__
Jade | Level 19
I would try to avoid 1000 data steps followed by 1000 calls to PROC REG.

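A DATA step view seems like a good choice: with a bit of help from the macro language to write it, you can run one DATA step and one call to PROC REG. The view stacks every simulated data set with A and tags the rows with a SIMGROUP variable, so a BY statement fits each combination separately.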

[pre]
data A;
set sashelp.class;
run;

*** You do not need this step because you already have 1000 simulated data sets;
%macro simdata(arg);
data
%do i = 1 %to &arg;
A&i
%end;
;
set sashelp.class(obs=5);
run;
%mend simdata;
options mprint;

%simdata(1000);

*** Stack each simulated data set with A (if your data sets are named A_1, A_2, ..., add the underscore to the names in the SET statement);
%macro combine(arg);
data all / view=all;
set
%do i = 1 %to &arg;
A&i A
%end;
indsname=indsname open=defer;

retain simgroup;
from = indsname;
if indsname ne 'WORK.A' then simgroup = indsname;
run;
%mend combine;
%combine(1000);

proc print data=all(obs=100);
run;


proc reg data=all noprint outest=est;
by NOTSORTED simgroup;
model weight = age height;
run;
quit;
[/pre]
WaltSmith
Fluorite | Level 6
I absolutely agree with data _null_: I would want to avoid 1000 data steps and 1000 PROC REGs. There is almost always a better solution by restructuring the problem, and the one suggested is one way. However, there are rare times when you just gotta bite the bullet and muscle through the 1000 or more steps.
data_null__
Jade | Level 19
I agree too.

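I think the program will scale OK. Using OPEN=DEFER should allow the DATA step to concatenate the data sets efficiently, because each data set is opened only when the SET statement is ready to read from it, rather than all of them at the start of the step.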

With my 1000 simulated data sets, I was surprised that I could create all 1000 in a single step. If they had more variables, it might be a problem.

Even if you could not concatenate 1000 data sets in one step, the problem could probably be broken up into smaller groups of, say, 100 data sets.
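A minimal sketch of that batching idea, not tested against the OP's data: it reuses the sashelp.class stand-in from the program above (data sets A and A1 to A1000, model weight = age height) and builds one view per group of 100 simulations, runs PROC REG with BY NOTSORTED SIMGROUP on each, and appends each group's OUTEST= estimates to one data set. The macro names COMBINE_BATCH and RUN_BATCHES and the data sets BATCH, EST_BATCH and EST_ALL are hypothetical; with the OP's naming, A&i would become A_&i.

[pre]
/* Hypothetical macro: stack A<first> to A<last>, each followed by A,
   into a DATA step view named BATCH */
%macro combine_batch(first, last);
  %local i;
  data batch / view=batch;
    length simgroup $ 41;
    set
    %do i = &first %to &last;
      A&i A
    %end;
    indsname=indsname open=defer;
    /* keep the simulated set name on the A rows that follow it */
    retain simgroup;
    if indsname ne 'WORK.A' then simgroup = indsname;
  run;
%mend combine_batch;

/* Hypothetical driver: one PROC REG per batch of SIZE simulations */
%macro run_batches(n, size);
  %local start stop;
  %do start = 1 %to &n %by &size;
    %let stop = %sysfunc(min(%eval(&start + &size - 1), &n));
    %combine_batch(&start, &stop)

    proc reg data=batch noprint outest=est_batch;
      by notsorted simgroup;
      model weight = age height;
    run;
    quit;

    /* Delete EST_ALL before a rerun, or new rows are added to old ones */
    proc append base=est_all data=est_batch;
    run;
  %end;
%mend run_batches;

%run_batches(1000, 100)
[/pre]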

I'd like to see the OP's program that created the 1000 simulated data sets.
