SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Do loop for multiple variables with different dimensions

Frequent Contributor
Posts: 93

Hi everyone,

My question may already have been answered somewhere here. I need to run a DO loop over several groups of variables that have different dimensions. How can I do it within one DATA step?

Specifically, I have survey data with two measures, A and B. A includes items a1-a5, and B includes items b1-b3. If fewer than half of a respondent's items on a measure are missing, the missing items are replaced by the mean of that respondent's non-missing items on that measure.

I started my code like this:

data want;set data have;

    do i=1 to vardim;

   if nmiss(of a1-a5)<3 then do;

     if a&i=. then a&i=mean(of a1-a5);

  end;

   if nmiss(of b1-b3)<2 then do;

    if b&i=.  then b&i=mean(of b1-b3);

  end;

run;

Could anyone help me correct it?

Thanks a lot!
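Written out, the rule "fewer than half of the items are missing" gives the following NMISS cutoffs for the two measures (n is the number of items in a measure); these agree with the cutoffs in the code above:

```latex
\begin{aligned}
\text{replace missing items} &\iff n_{\text{miss}} < \tfrac{n}{2} \\
A\ (n = 5):\quad n_{\text{miss}} < 2.5 &\iff n_{\text{miss}} \le 2 \iff \texttt{nmiss(of a1-a5)} < 3 \\
B\ (n = 3):\quad n_{\text{miss}} < 1.5 &\iff n_{\text{miss}} \le 1 \iff \texttt{nmiss(of b1-b3)} < 2
\end{aligned}
```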


Accepted Solutions
Solution
06-22-2016 05:36 PM
Super Contributor
Posts: 408

Re: Do loop for multiple variables with different dimensions

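Seems like you're looking for the ARRAY statement. How about this (somewhat untested):

data want;
    set have;   /* SET HAVE, not SET DATA HAVE, which would read two data sets */

    array a_arr a1-a5;
    array b_arr b1-b3;

    /* measure A: fill missing items if fewer than 3 of the 5 are missing */
    do i=1 to dim(a_arr);
        if nmiss(of a1-a5)<3 then do;
            if a_arr[i]=. then a_arr[i]=mean(of a1-a5);
        end;
    end;

    /* measure B: fill missing items if fewer than 2 of the 3 are missing */
    do i=1 to dim(b_arr);
        if nmiss(of b1-b3)<2 then do;
            if b_arr[i]=.  then b_arr[i]=mean(of b1-b3);
        end;
    end;

    drop i;   /* keep the loop index out of WANT */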
run;

You may want to swap the DO loop and the IF statement, so that NMISS is evaluated once per group instead of once per array element:

...
    if nmiss(of b1-b3)<2 then do;
        do i=1 to dim(b_arr);
            if b_arr[i]=.  then b_arr[i]=mean(of b1-b3);
        end;
    end;
...
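For reference, here is a complete, runnable version of the swapped form for both groups, with a few rows of made-up test data. The data set values and the ID variable are hypothetical, not from the original post. Note that filling a gap with the row mean does not change that mean, so recomputing MEAN inside the loop is harmless.

```sas
/* hypothetical sample data: ID plus items a1-a5 and b1-b3 */
data have;
   input id a1-a5 b1-b3;
   datalines;
1 1 . 3 4 5 2 . 4
2 1 3 4 . . . . 6
3 10 . . . 20 1 2 3
;

data want;
   set have;
   array a_arr a1-a5;
   array b_arr b1-b3;
   drop i;

   /* check the threshold once per group, then fill each missing item */
   if nmiss(of a1-a5) < 3 then do;
      do i = 1 to dim(a_arr);
         if a_arr[i] = . then a_arr[i] = mean(of a1-a5);
      end;
   end;

   if nmiss(of b1-b3) < 2 then do;
      do i = 1 to dim(b_arr);
         if b_arr[i] = . then b_arr[i] = mean(of b1-b3);
      end;
   end;
run;

proc print data=want;
run;
```

With this data, respondent 1 gets a2 = 3.25 and b2 = 3; respondent 2 gets a4 = a5 = 8/3 (about 2.667) while B is left alone because two of its three items are missing; respondent 3 is unchanged because three of the five A items are missing.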


Hope this helps,

- Jan.



All Replies

Super User
Posts: 5,079

Re: Do loop for multiple variables with different dimensions

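When you want to process several variables in the same way, the usual tool for the job is an ARRAY. Here is one way:

data want;
   set have;

   /* with no variable list, ARRAY A {5} refers to a1-a5 and ARRAY B {3} to b1-b3 */
   array a {5};
   array b {3};

   /* compute each replacement value once, before anything is filled in */
   a_replacement = mean(of a1-a5);
   b_replacement = mean(of b1-b3);

   /* _N_ serves as the loop index, so no extra index variable ends up in WANT */
   if nmiss(of a1-a5) < 3 then do _n_=1 to dim(a);
      if a{_n_} = . then a{_n_} = a_replacement;
   end;

   /* loop over the 3 elements of B only; a bound of 5 would exceed the array */
   if nmiss(of b1-b3) = 1 then do _n_=1 to dim(b);
      if b{_n_} = . then b{_n_} = b_replacement;
   end;

   drop a_replacement b_replacement;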
run;
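If there are more than two groups, a macro can generate the same array-and-loop pattern for each one. This is a hypothetical sketch: the macro name FILL_GROUP, the array names it builds, and the sample data are illustrations, not from the thread, and it assumes each group's items are named prefix1-prefixN. It writes the rule as NMISS < DIM/2, which reproduces the cutoffs of 3 for A and 2 for B.

```sas
/* hypothetical sample data: ID plus items a1-a5 and b1-b3 */
data have;
   input id a1-a5 b1-b3;
   datalines;
1 1 . 3 4 5 2 . 4
2 1 3 4 . . . . 6
3 10 . . . 20 1 2 3
;

/* hypothetical helper: for items prefix1-prefixN, replace missing values */
/* with the row mean when fewer than half of the N items are missing     */
%macro fill_group(prefix, n);
   array &prefix._g {&n} &prefix.1-&prefix.&n;
   if nmiss(of &prefix._g[*]) < dim(&prefix._g)/2 then do _n_=1 to dim(&prefix._g);
      if missing(&prefix._g[_n_]) then &prefix._g[_n_] = mean(of &prefix._g[*]);
   end;
%mend fill_group;

data want;
   set have;
   %fill_group(a,5)   /* a1-a5: fill if 0-2 items are missing */
   %fill_group(b,3)   /* b1-b3: fill if 0-1 items are missing */
run;

proc print data=want;
run;
```

Adding a third group, say c1-c4, would then take one more line, %fill_group(c,4), inside the DATA step.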

Respected Advisor
Posts: 3,777

Re: Do loop for multiple variables with different dimensions

For this operation you don't need to iterate over the array to fill in the missing values. You can use CALL STDIZE to replace the missing values with the mean. My example uses two OUTPUT statements to show the before and after values of array A.

data stdize;
   array a{5};
   input a[*];
   output;                            /* before: the row as read */
   if nmiss(of a[*]) lt 3 then do;
      mean=mean(of a[*]);
      /* MISSING= gives the value used to fill the missing elements of A */
      call stdize('none','missing=',mean,of a[*]);
      output;                         /* after: missing elements filled */
      end;
   cards;
1 . 3 4 5
1 3 4 . .
10 . . . 20
;;;;
   run;
proc print;
   run;


Respected Advisor
Posts: 4,641

Re: Do loop for multiple variables with different dimensions


liziwu wrote:

Specifically, I have the survey data with two measurements A and B. A includes items a1-a5, and B includes items B1-B3. If one respondent has less than half items per measurement are missing, the missing data is replaced by the mean values.

That is a strange requirement. If only a few variables are missing, you just leave them missing. But if a lot are missing, you use the little that is available to impute all the missing values?

I would resort to imputation when missing values are scarce.

PG
Frequent Contributor
Posts: 93

Re: Do loop for multiple variables with different dimensions

Thanks, PG.

I thought about imputation, but my team decided not to do anything complicated.
