## Dataset structure for regression

Occasional Contributor
Posts: 15

I have a block of data that I call A.
A is the actual data, and it is large.

Then I have A_1, A_2, ..., A_n, which are simulated, smaller blocks of data that have the same number of columns as A but far fewer rows (n=1000, say).

I want to run the regression on the dataset formed by appending A_1 to A, then repeat with A_2 appended to A, and so on through A_n appended to A.

Is there an efficient way to do this that avoids looping over a data step that appends each of A_1, ..., A_n to A and then running the regression each time?

Thank you.
Contributor
Posts: 22

## Re: Dataset structure for regression

Not that I know of (using SAS for 27 years). I would just write a macro loop to append the data for each case and do any analysis. Something like:
[pre]
%macro loopy;
%local i;
%do i=1 %to 1000;
/* Stack the real data and the i-th simulated block */
data Aplus;
set A A_&i;
run;
proc reg data=Aplus;
*--- etc. ----;
run;
quit;
%end;
%mend;
%loopy
[/pre]
Posts: 3,852

## Re: Dataset structure for regression

I would try to avoid 1000 data steps followed by 1000 calls to PROC REG.

A data step view seems like a good choice. With a bit of help from the macro language to write it, you can run one data step and one call to PROC REG.

[pre]
data A;
set sashelp.class;
run;

*** You do not need this step because you already have the 1000 simulated data sets;
%macro simdata(arg);
data
%do i = 1 %to &arg;
A&i
%end;
;
set sashelp.class(obs=5);
run;
%mend simdata;
options mprint=1;

%simdata(1000);

*** Combine each SIM data with a copy of A;
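%macro combine(arg);
%local i;
/* A view stores no data; the rows are read when a procedure uses the view */
data all / view=all;
set
%do i = 1 %to &arg;
A&i A
%end;
indsname=indsname open=defer;

/* The INDSNAME= variable is not written out, so copy its value */
retain simgroup;
from = indsname;
/* Rows read from A keep the name of the simulated data set before them */
if indsname ne 'WORK.A' then simgroup = indsname;
run;
%mend combine;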
%combine(1000);

proc print data=all(obs=100);
run;

proc reg data=all noprint outest=est;
by NOTSORTED simgroup;
model weight = age height;
run;
quit;
[/pre]
Contributor
Posts: 22

## Re: Dataset structure for regression

I absolutely agree with data _null_: I would want to avoid 1000 data steps and 1000 PROC REG calls. There is almost always a better solution that comes from restructuring the problem, and the one suggested is one way to do it. However, there are rare times when you just have to bite the bullet and muscle through the 1000 or more steps.
Posts: 3,852

## Re: Dataset structure for regression

I agree too.

I think the program will scale OK. Using OPEN=DEFER should allow the data step to concatenate the data sets efficiently.
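
As a minimal sketch of what OPEN=DEFER does on its own: as I understand it, the SET statement opens only the first data set at compile time and opens each later one when the previous one has been read, instead of opening them all at once. It assumes every data set in the list has the same variables. The names T1, T2 and BOTH are hypothetical and used only for illustration.

```sas
/* Two small test data sets with identical variables (hypothetical names) */
data T1 T2;
   set sashelp.class(obs=5);
run;

/* Concatenate them; T2 is opened only after T1 has been read completely */
data both;
   set T1 T2 open=defer;
run;

proc print data=both;
run;
```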

For my simulated data, I was surprised that I could create 1000 data sets in a single step. If they had more variables it might be a problem.

Even if you could not concatenate 1000 data sets in one step, the problem could probably be broken up into smaller groups of, say, 100 data sets.
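
A rough sketch of that grouping idea, untested against the real data. It assumes the simulated data sets are named A1 to A1000 in WORK and reuses the model from my example above. The macro name combine_chunks, its parameters n= and size=, and the data set names chunk1, est1 and est_all are all made up for illustration.

```sas
%macro combine_chunks(n=1000, size=100);
   %local first last i chunk;
   %let chunk = 0;

   /* Start clean so a rerun does not append to old results */
   proc datasets lib=work nolist nowarn;
      delete est_all;
   quit;

   %do first = 1 %to &n %by &size;
      %let chunk = %eval(&chunk + 1);
      %let last  = %sysfunc(min(&n, %eval(&first + &size - 1)));

      /* One view per group: each simulated data set followed by a copy of A */
      data chunk&chunk / view=chunk&chunk;
         length simgroup $41;
         retain simgroup;
         set
            %do i = &first %to &last;
               A&i A
            %end;
            indsname=indsname open=defer;
         /* Rows read from A keep the name of the simulated data set before them */
         if indsname ne 'WORK.A' then simgroup = indsname;
      run;

      /* One PROC REG call per group, one regression per simgroup value */
      proc reg data=chunk&chunk noprint outest=est&chunk;
         by notsorted simgroup;
         model weight = age height;
      run;
      quit;

      /* Collect the estimates from every group in one data set */
      proc append base=est_all data=est&chunk;
      run;
   %end;
%mend combine_chunks;

%combine_chunks(n=1000, size=100);
```

With size=100 this runs 10 data steps and 10 PROC REG calls instead of 1000 of each, and est_all ends up with one row of estimates per simulated data set.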

I'd like to see the OP's program that created the 1000 simulated data sets.