## Do loop for multiple variables with different dimensions

Hi everyone,

My question may already have been addressed somewhere here. I need to run a DO loop over multiple sets of variables with different dimensions. How can I do it within one DATA step?

Specifically, I have survey data with two measurements, A and B. A includes items a1-a5, and B includes items B1-B3. If a respondent is missing fewer than half of the items in a measurement, the missing items are replaced by the mean of that respondent's non-missing items in the same measurement.

I started my code like this:

data want;set data have;

do i=1 to vardim;

if nmiss(of a1-a5)<3 then do;

if a&i=. then a&i=mean(of a1-a5);

end;

if nmiss(of b1-b3)<2 then do;

if b&i=.  then b&i=mean(of b1-b3);

end;

run;

Could anyone help me correct it?

Thanks a lot!

## Re: Do loop for multiple variables with different dimensions

```sas
data want;
  set have;

  array a_arr a1-a5;
  array b_arr b1-b3;

  /* Fill missing A items with the row mean when fewer than 3 are missing */
  do i=1 to dim(a_arr);
    if nmiss(of a1-a5)<3 then do;
      if a_arr[i]=. then a_arr[i]=mean(of a1-a5);
    end;
  end;

  /* Fill missing B items with the row mean when fewer than 2 are missing */
  do i=1 to dim(b_arr);
    if nmiss(of b1-b3)<2 then do;
      if b_arr[i]=. then b_arr[i]=mean(of b1-b3);
    end;
  end;

run;
```

You may want to swap the DO loop and the IF statement, so that the NMISS check runs once per row instead of once per array element.

```sas
...
if nmiss(of b1-b3)<2 then do;
  do i=1 to dim(b_arr);
    if b_arr[i]=. then b_arr[i]=mean(of b1-b3);
  end;
end;
...
```

Hope this helps,

- Jan.

## Re: Do loop for multiple variables with different dimensions

When you want to process several variables in the same way, the usual tool for the job is an ARRAY.  Here is one way:

data want;

set have;

array a {5};

array b {3};

a_replacement = mean(of a1-a5);

b_replacement = mean(of b1-b3);

if nmiss(of a1-a5) < 3 then do _n_=1 to 5;

if a{_n_} = . then a{_n_} = a_replacement;

end;

if nmiss(of b1-b3)=1 then do _n_=1 to 3;

if b{_n_}=. then b{_n_} = b_replacement;

end;

drop a_replacement b_replacement;

run;

## Re: Do loop for multiple variables with different dimensions

For this operation you don't need to iterate over the array to fill the missing values. You can use CALL STDIZE to replace the missing values with the mean. My example uses two OUTPUT statements to show the before and after values of array A.

```sas
data stdize;
  array a[5];
  input a[*];
  output;                          /* row as read (before) */
  if nmiss(of a[*]) lt 3 then do;
    mean=mean(of a[*]);
    call stdize('none','missing=',mean,of a[*]);
    output;                        /* row after filling (after) */
  end;
  cards;
1 . 3 4 5
1 3 4 . .
10 . . . 20
;;;;
run;
proc print;
run;
```

## Re: Do loop for multiple variables with different dimensions

liziwu wrote:

Specifically, I have the survey data with two measurements A and B. A includes items a1-a5, and B includes items B1-B3. If one respondent has less than half items per measurement are missing, the missing data is replaced by the mean values.

That is a strange requirement. If you have few variables missing, you just leave them missing, but if you have a lot missing, you use the little that is available to impute all the missing values?

I would resort to imputation when missing values are scarce.

PG

## Re: Do loop for multiple variables with different dimensions

Thanks, PG.

I thought about imputation, but my team decided not to do anything complicated.

