DATA Step, Macro, Functions and more

Merging a do loop dataset with an existing dataset

Reply
Occasional Contributor
Posts: 16

Merging a do loop dataset with an existing dataset

[ Edited ]

To many seasoned SAS programmers, this may look like a simple problem, I am sure, but this one has been boggling my mind. I do not have access to PROC IML so I have keep myself busy with sas/stat.

 

I have created a nested design with PROC PLAN and various datasteps. The dataset is called final. 

 

 

proc plan seed=6457149;
factors dep=5 pen=4 ordered subpen=2 ordered sex=8 random / noprint;
output out=first sex nvals=(1 1 1 1 2 2 2 2) random;
run;
data first(keep=AnimalID dep pen subpen sex);
set first;
AnimalID=_n_;
run;
proc sort data=first;
by dep pen subpen animalID;
run;
data first;
set first;
Block+1;
by dep pen subpen;
if first.subpen then Block=1;
run;
proc rank
     groups=20
     out=first;
var AnimalID;
ranks PenID2;
run;
data first(drop=PenID2 Pen);
set first;
PenID=PenID2+1;
run;
proc freq data=first nlevels;
table PenID;
run;
proc plan seed=6457149;
factors frequency=1 random comparison=20 random;
output out=randomtreat  
comparison CVALS=(	'1vs2'
					'1vs3'
					'1vs4'
					'1vs5'
					'2vs3'
					'2vs4'
					'2vs5'
					'3vs4'
					'3vs5'
					'4vs5'
					'1vs2'
					'1vs3'
					'1vs4'
					'1vs5'
					'2vs3'
					'2vs4'
					'2vs5'
					'3vs4'
					'3vs5'
					'4vs5');
run;
data randomtreat(drop=frequency);
set randomtreat;
PenID=_N_;
run;
proc sort data=first;
by PenID;
run;
data combined; 
merge first randomtreat;
by PenID;
run;
data final (drop=comparison);
set combined;
if subpen=1 then Trt=scan(comparison,1,'vs');
if subpen=2 then Trt=scan(comparison,2,'vs');
run;

This dataset shows for each animal their specific characteristics (which treatment, sex, dep, pen, block) .

 

Now, to simulate response variables, I want to create columns, for each level of a random factor (dep1-dep5 block1-block8 pen1-pen20) via an array, and I would look that array to be copied 320 times, for each animal the same array. This way, I can then create response variables by just going to the array. However, how to create 320 copied versions of an array I know, but not how to merge with an existing dataset via a do loop. 

 

This is what I have tried so far, but any kind of deviation has failed miserably. 

 

%let nBlock=8;
%let nDep=5;
%let nPen=20; 
%let PigVariance=0.001; 
%let PenVariance=0.0016;
%let BlockVariance=0.001;
%let DepVariance=0.0005;
%let N=320;
data combinedrandom (drop=j k l);
set final;
array b{&nBlock} b1-b&nBlock;
array d{&nDep} d1-d&nDep; 
array p{&nPen} p1-p&nPen;
array t[5] _temporary_ (0.670 0.660 0.650 0.640 0.630);
array s[2] _temporary_ (0 -0.10);
	do j=1 to &nBlock; /* j=1 to 8 */
	b{j} = rand("Normal", 0, sqrt(&BlockVariance)); /* create 8 columns of block */
	end;
		do k=1 to &nDep; 
		d{k} = rand("Normal", 0, sqrt(&DepVariance)); /* create 5 columns of department */
		end;
			do l=1 to &nPen; /* j=1 to 8 */
			p{l} = rand("Normal", 0, sqrt(&PenVariance)); /* create 20 columns of pen */
			end;
				do i=1 to &N; /* if N=320, it builds 320 copies for each row in the dataset final, however, if N=1 it builds 1 'copy' for each row of final --> i want it to copy first row 320 times*/
				if i=1 then output;
		y=t[trt]+s[sex]+ b[block] + d[dep] + p[penID] + rand("Normal", 0, sqrt(&PigVariance));
end;
run;

I hope this all make sense. Thanks in advance

 

Super User
Posts: 6,921

Re: Merging a do loop dataset with an existing dataset

What's the need for 320 sets of columns?  Wouldn't it be relatively easy to create 320 rows instead?

Occasional Contributor
Posts: 16

Re: Merging a do loop dataset with an existing dataset

Posted in reply to Astounding

the need for columns, in my mind, is so i can do this iteratively.... open to any other kind of suggestion

y=t[trt]+s[sex]+ b[block] + d[dep] + p[penID] + rand("Normal", 0, sqrt(&PigVariance))
Super User
Posts: 6,921

Re: Merging a do loop dataset with an existing dataset

[ Edited ]

While I'm not sure how you will be using the data later, this strikes me as a likely suitable approach to output 320 variations.  I didn't make loops for every piece (such as TRT and SEX) but those can easily be added:

 

do j=1 to &nBlock; /* j=1 to 8 */
   block = rand("Normal", 0, sqrt(&BlockVariance));
   do k=1 to &nDep;
      dep = rand("Normal", 0, sqrt(&DepVariance));
      do l=1 to &nPen; /* j=1 to 8 */
         pen = rand("Normal", 0, sqrt(&PenVariance));
         y=t[trt]+s[sex]+ block + dep + pen + rand("Normal", 0, sqrt(&PigVariance));

         output;

      end;

   end;
end;

Super User
Super User
Posts: 8,267

Re: Merging a do loop dataset with an existing dataset


MJ1985 wrote:

the need for columns, in my mind, is so i can do this iteratively.... open to any other kind of suggestion

y=t[trt]+s[sex]+ b[block] + d[dep] + p[penID] + rand("Normal", 0, sqrt(&PigVariance))

If you want to calculate Y using the values of TRT, SEX etc then just do that.

No need for the array indexing.

y=trt+sex+ block + dep + penID + rand("Normal", 0, sqrt(&PigVariance)) ;

If you want to generate data with specific combinations of TRT and SEX for example the DO statement makes that easy. So to generate 3*2*5 observations you could nested do loops like this.

do trt=10,100,1000;
  do sex=0,1 ;
    do dep=1 to 5 ;
      ....
      output;
    end;
  end;
end;
Super User
Super User
Posts: 8,267

Re: Merging a do loop dataset with an existing dataset

Not much sense in storing the data into "array".  Just store it as multiple observations in a dataset.

If your "array" has three dimensions (BLOCK,DEP,PEN) then your dataset needs four variables. One each for the dimensions and one to store the actual data value.

Occasional Contributor
Posts: 16

Re: Merging a do loop dataset with an existing dataset

Dear all, 

 

i am not sure if i explained my request well previously, or perhaps you all think i am doing something strange, but if you use the codes i hope you can follow my reasoning. 

 

the dataset final has, as a nested design using proc plan, for each animal stated which treatment, sex, department, block, and pen it belongs. Now, I want to use a do loop to create the random variables needed following the same nesting structture. It is like connecing a design matrix with an array of coefficients to estimate the response values. However, connecting the design matrix (proc plan) with the set of random variables is hard and in general i have no idea. 

 

None of the suggestions posed now help me in this regard, because the do loops do not create the experimental design i want and only produce the random variables i already now how to. I hope that running the code provided helps to explain what I am tryin to do. 

Super User
Super User
Posts: 8,267

Re: Merging a do loop dataset with an existing dataset

[ Edited ]

MJ1985 wrote:

Dear all, 

 

i am not sure if i explained my request well previously, or perhaps you all think i am doing something strange, but if you use the codes i hope you can follow my reasoning. 

 

the dataset final has, as a nested design using proc plan, for each animal stated which treatment, sex, department, block, and pen it belongs. Now, I want to use a do loop to create the random variables needed following the same nesting structture. It is like connecing a design matrix with an array of coefficients to estimate the response values. However, connecting the design matrix (proc plan) with the set of random variables is hard and in general i have no idea. 

 

None of the suggestions posed now help me in this regard, because the do loops do not create the experimental design i want and only produce the random variables i already now how to. I hope that running the code provided helps to explain what I am tryin to do. 


So the output of your PROC PLAN code is 320 observations with four variables.  

Some how you are also generating N sample (random?) animals and you want to combine them with the design matrix?

So if you had a dataset named ANIMALS with N observations (one per animal perhpas?) you could combine it with the FIRST dataset using something simple like this.

data want ; 
  set animal;
  do p=1 to nobs ;
     set first point=p nobs=nobs; 
      < logic to generate something>
      output; 
  end; 
run;

 And create a dataset with 320*N observations.

 

 

 

Occasional Contributor
Posts: 16

Re: Merging a do loop dataset with an existing dataset

Hi Tom, 

 

thanks for the code --> combining the data i can pull off, but the trick was to do in a do loop so i can use the array and start multiplying columns in order to get a value y. 

 

For instance, a given row has trt=5 sex=1 block=5 pen=2 animal=1 --> then i can get it to look at the coefficients for treatment, sex, and the randomly generated values of block pen and animal (this is a mixed model). By just combining the data, i have a lot of columns, but i cannot do the multiplation because the array is gone.. that is what i am trying to solve....

 

it is probably a vary tedious way, but fro me, using datasetp the only i know to create design matrices and and fixed/random coefficients in the same dataset whilst looping trhough every row to get the desired response variable..... just combing them will delete that option the way i see it...

Occasional Contributor
Posts: 16

Re: Merging a do loop dataset with an existing dataset

What I have now is this, a fixed dataset final which is my design matrix and a file combinedrandom which has all the random coefficients. I can combine the two and use the arrays, do the multiplcations by which i get response variables, and produce a mixed model. however, i want to do this a 1000 times and i cannot figure out how to implement that outer loop when i have so much datasteps reading and setting files. Much help appreciated here. 

 

%let nBlock=8;
%let nDep=5;
%let nPen=20; 
%let N=1;
%let PigVariance=0.006; 
%let PenVariance=0.006;
%let BlockVariance=0.003;
%let DepVariance=0.0005;
data combinedrandom(drop=i j k l);
	array b{&nBlock} b1-b&nBlock;
	array d{&nDep} d1-d&nDep; 
	array p{&nPen} p1-p&nPen;
do j=1 to &nBlock; /* j=1 to 8 */
	b{j} = rand("Normal", 0, &BlockVariance);
	end;
do k=1 to &nDep; 
	d{k} = rand("Normal", 0, &DepVariance);
	end;
do l=1 to &nPen; /* j=1 to 8 */
	p{l} = rand("Normal", 0, &PenVariance);
	end;
do i=1 to &N;
output;
end;	
run;
data want;
set final;
do z=1 to nobs;
set combinedrandom point=z nobs=nobs;
array b{8} b1-b8;
array d{5} d1-d5; 
array p{20} p1-p20;
array t[5] _temporary_ (0.670 0.660 0.650 0.640 0.630);
array s[2] _temporary_ (0 -0.10);
y=t[trt]+s[sex]+b[block]+d[dep]+p[pen]+rand("Normal", 0, sqrt(&PigVariance));
output;
end;
run;
proc mixed data=want covtest plots=all;
class dep block pen sex trt;
model y=trt sex dep / s cl ddfm=kenwardroger2;
random block pen / s cl;
lsmeans trt / pdiff=all adjdfe=row adjust=simulate;
run;
Super User
Super User
Posts: 8,267

Re: Merging a do loop dataset with an existing dataset


MJ1985 wrote:

What I have now is this, a fixed dataset final which is my design matrix and a file combinedrandom which has all the random coefficients. I can combine the two and use the arrays, do the multiplcations by which i get response variables, and produce a mixed model. however, i want to do this a 1000 times and i cannot figure out how to implement that outer loop when i have so much datasteps reading and setting files. Much help appreciated here. 

 

%let nBlock=8;
%let nDep=5;
%let nPen=20; 
%let N=1;
%let PigVariance=0.006; 
%let PenVariance=0.006;
%let BlockVariance=0.003;
%let DepVariance=0.0005;
data combinedrandom(drop=i j k l);
	array b{&nBlock} b1-b&nBlock;
	array d{&nDep} d1-d&nDep; 
	array p{&nPen} p1-p&nPen;
do j=1 to &nBlock; /* j=1 to 8 */
	b{j} = rand("Normal", 0, &BlockVariance);
	end;
do k=1 to &nDep; 
	d{k} = rand("Normal", 0, &DepVariance);
	end;
do l=1 to &nPen; /* j=1 to 8 */
	p{l} = rand("Normal", 0, &PenVariance);
	end;
do i=1 to &N;
output;
end;	
run;
data want;
set final;
do z=1 to nobs;
set combinedrandom point=z nobs=nobs;
array b{8} b1-b8;
array d{5} d1-d5; 
array p{20} p1-p20;
array t[5] _temporary_ (0.670 0.660 0.650 0.640 0.630);
array s[2] _temporary_ (0 -0.10);
y=t[trt]+s[sex]+b[block]+d[dep]+p[pen]+rand("Normal", 0, sqrt(&PigVariance));
output;
end;
run;
proc mixed data=want covtest plots=all;
class dep block pen sex trt;
model y=trt sex dep / s cl ddfm=kenwardroger2;
random block pen / s cl;
lsmeans trt / pdiff=all adjdfe=row adjust=simulate;
run;

What is the outer loop?  Are you doing repitiions?  

If so you could try adding a REP variable and using that. 

data combinedrandom(drop=i j k l);
  do REPNO=1 to 1000;
    ....
  end;
run;
....
proc sort data=want ;
  by REPNO ;
run;

proc mixed data=want covtest plots=all;
  by REPNO;
  ...
run;

You probable do NOT want to print all 1000 repetitions so add some OUTPUT options to your PROC MIXED and store the results you want into datasets and review those.

Super User
Super User
Posts: 8,267

Re: Merging a do loop dataset with an existing dataset

For instance, a given row has trt=5 sex=1 block=5 pen=2 animal=1 --> then i can get it to look at the coefficients for treatment, sex, and the randomly generated values of block pen and animal (this is a mixed model). By just combining the data, i have a lot of columns, but i cannot do the multiplation because the array is gone.. that is what i am trying to solve....

 

Where are the coefficients coming from? What variables are you using to store them.  Let's use CO_ prefix to distinguish between the coefficient and the actual value.  How do the coeffients vary? Is CO_TRT fixed?  Is it a function of TRT?  Of other variables?   That will determine how you store them and how you can combine those stored values back with your fact tables so you can use CO_TRT and CO_SEX, etc in your formulas.

 

If CO_TRT is fixed you could just initialize it using a retain statement.

RETAIN CO_TRT 0.123 ;

If it depends on the value of TRT then you could store it in a table.

TRT CO_TRT
1 0.123
2 0.234 
....

And then join the table on the value of TRT to get the right value of CO_TRT onto the right observations.

Or you could possible pass the value into an array in a data step if you really just had a series of constants that you wanted to paste into your program.

array co_trt_array (5) _temporary_ (0.123 0.234 0.345 0.456 0.567) ;
...
co_trt = co_trt_array(trt);
...

If it depends on values of multiple variables then you will need a bigger table perhaps with multiple index variables (TRT, SEX) and mutliple coefficient variables (CO_TRT, CO_SEX) and then join that table with your fact table.

 

 

Ask a Question
Discussion stats
  • 11 replies
  • 227 views
  • 0 likes
  • 3 in conversation