Obsidian | Level 7

## Computing the Approximate Sample Distribution(ASD) and other Statistics by groups in SAS IML

Hi,

I wrote a simulation in SAS/IML but I am having trouble getting the ASD (Approximation Sample Distribution) by group within SAS IML using some of the [,:] function. Any help would be greatly appreciated. My other questions as referenced in the codes are as follow:

Question 1) Is there a better to write this loop. I am looping over time and there are two replicate samples. The code looks funny and I wondering if there is a better way to write it so that it.

Question 2) I have used an example from the Simulating data with SAS book and used it create my IDs. I just wonder if this is the write way to do it.

Question 3) How to create an ordered matrix that one can readily write into a SAS dataset? In my example, I have had to create the dataset then read it as a matrix, then sort it before creating the final dataset.

Question 4). This is the main question. I wanted to know how to get the ASD by groups (i.e by replicate samples, by time and by replicate samples and by time) from the matrix directly within SAS IML without having to do it using proc means and proc univariate.

Data param;
input replicate pr_pa0 pr_pa0_ith or_pa_ith;
datalines;
1 0.30 0.40 2.0
2 0.20 0.33 1.4
;
run;

proc iml;

/*Functions*/
Start logit(Pr);
n = log(Pr/(1-Pr));
return(n);
finish;

/*parameters*/
use param;
close param;

rep=2; *number of replications;
N =5;  *size each the sample;
time_length=10; *number of timepoints;
time = T( do(1, time_length, 1));

/*Baseline*/
pa0=j(rep,N);
call randgen(pa0, "Bernoulli", pr_pa0); /*initialize the obeservations at baseline*/
pa = pa0 // j(rep*(time_length-1), N,.); /*create a vector of empty cells to be filled in follow-up*/

/*Follow-up*/
pa_ith = j(rep, N); /*temporary variable to store ith value*/;

do i = 2 to time_length;                                      /*Question 1*/
r=rep;
p=i-1;
call randgen(pa_ith, "Bernoulli", logistic(logit(pr_pa0_ith) + or_pa_ith#pa[(r*(p-1) + 1): r*p, ]));
pa[(r*p + 1): r*(p+1), ] =  pa_ith;
end;

print pa;

/*creating an id variable*/                                    /*Question 2*/
id=repeat(1:N, rep*time_length,1);
timeid=repeat(T(1:time_length), 1, rep*N);
sampid=repeat(T(1:rep), time_length, N);

/*creating a dataset or sorting the dataset by sampid id timeid*/
create simreg var {sampid id timeid pa};
append;
close simreg;

use simreg;
read all var _NUM_ into m[colname=VarNames];                     /*Question 3*/
close simreg;

call sort(m, {sampid id timeid });

create simreg2 from m [colname=varNames];
append from m;
close simreg2;
quit;

/*Getting the approximate simulation distribution*/

proc sort data=simreg2; by sampid timeid;run;                     /*Question 4*/
proc means data=simreg2 noprint;
by sampid timeid;
var pa;
output out=Outstats mean=Sample_mean;
run;

proc sort data=outstats; by timeid;
proc univariate data=outstats noprint;
by timeid;
var sample_mean;
output out=results
mean=Sample_mean
pctlpts=2.5 97.5
pctlpre	=pctl_mean
std = sd_Mean;
run;

proc print data= results;
run;

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
SAS Super FREQ

## Re: Computing the Approximate Sample Distribution(ASD) and other Statistics by groups in SAS IML

> Is there a better to write this loop.

I don't know what "better" or "funny" means. I think the code would be clearer if you define variables that contains the rows:
rows = (r*(p-1) + 1) : r*p;
nextRows = rows + r;

> I have used an example from the Simulating data with SAS book and used it create my IDs. I just wonder if this is the right way to do it.

Run a small example and look at the output. Is it correct? The nested structure looks correct to me, but I'm not certain of the design of your simulation.

> How to create an ordered matrix that one can readily write into a SAS dataset?

m = colvec(sampid) || colvec(id) || colvec(timeid) || colvec(pa);
call sort(m, 1:3);

Although it's not clear you'll need to do this is you can do (4).

> This is the main question. I wanted to know how to get the ASD by groups (i.e by replicate samples, by time and by replicate samples and by time) from the matrix directly within SAS IML without having to do it using proc means and proc univariate.

Use the SHAPE function to reshape the 0/1 data into the appropriate dimensions, then use the subscript reduction operators such as [,:]. For example, to compute the means of the data sorted by SampID and TimeID:

call sort(m, {1 3}); /* sort by sampid and timeID */
sample = shape(m[,4], rep*time_length);  /* extract data only */
mean = sample[,:];
print mean;

2 REPLIES 2
SAS Super FREQ

## Re: Computing the Approximate Sample Distribution(ASD) and other Statistics by groups in SAS IML

> Is there a better to write this loop.

I don't know what "better" or "funny" means. I think the code would be clearer if you define variables that contains the rows:
rows = (r*(p-1) + 1) : r*p;
nextRows = rows + r;

> I have used an example from the Simulating data with SAS book and used it create my IDs. I just wonder if this is the right way to do it.

Run a small example and look at the output. Is it correct? The nested structure looks correct to me, but I'm not certain of the design of your simulation.

> How to create an ordered matrix that one can readily write into a SAS dataset?

m = colvec(sampid) || colvec(id) || colvec(timeid) || colvec(pa);
call sort(m, 1:3);

Although it's not clear you'll need to do this is you can do (4).

> This is the main question. I wanted to know how to get the ASD by groups (i.e by replicate samples, by time and by replicate samples and by time) from the matrix directly within SAS IML without having to do it using proc means and proc univariate.

Use the SHAPE function to reshape the 0/1 data into the appropriate dimensions, then use the subscript reduction operators such as [,:]. For example, to compute the means of the data sorted by SampID and TimeID:

call sort(m, {1 3}); /* sort by sampid and timeID */
sample = shape(m[,4], rep*time_length);  /* extract data only */
mean = sample[,:];
print mean;

Obsidian | Level 7

## Re: Computing the Approximate Sample Distribution(ASD) and other Statistics by groups in SAS IML

Thank you so much, Rick. It worked like a charm

From The DO Loop