Hi everyone,
My question may already have been answered somewhere here. I need to run a DO loop over several groups of variables that have different dimensions. How can I do that within one DATA step?
Specifically, I have survey data with two measurements, A and B. A includes items a1-a5, and B includes items b1-b3. If fewer than half of a respondent's items for a measurement are missing, the missing items are replaced by the mean of that respondent's non-missing items for that measurement.
I started my code like this:
data want;set data have;
do i=1 to vardim;
if nmiss(of a1-a5)<3 then do;
if a&i=. then a&i=mean(of a1-a5);
end;
if nmiss(of b1-b3)<2 then do;
if b&i=. then b&i=mean(of b1-b3);
end;
run;
Could anyone help me correct it?
Thanks a lot!
Seems like you're looking for the ARRAY statement. How about this (somewhat untested):
data want;
set have;
array a_arr a1-a5;
array b_arr b1-b3;
do i=1 to dim(a_arr);
if nmiss(of a1-a5)<3 then do;
if a_arr[i]=. then a_arr[i]=mean(of a1-a5);
end;
end;
do i=1 to dim(b_arr);
if nmiss(of b1-b3)<2 then do;
if b_arr[i]=. then b_arr[i]=mean(of b1-b3);
end;
end;
run;
You may want to swap the DO loop and the IF statement, so that the NMISS check is evaluated once per row instead of on every iteration:
...
if nmiss(of b1-b3)<2 then do;
do i=1 to dim(b_arr);
if b_arr[i]=. then b_arr[i]=mean(of b1-b3);
end;
end;
...
Hope this helps,
- Jan.
When you want to process several variables in the same way, the usual tool for the job is an ARRAY. Here is one way:
data want;
set have;
array a {5};
array b {3};
a_replacement = mean(of a1-a5);
b_replacement = mean(of b1-b3);
if nmiss(of a1-a5) < 3 then do _n_=1 to 5;
if a{_n_} = . then a{_n_} = a_replacement;
end;
if nmiss(of b1-b3)=1 then do _n_=1 to dim(b);
if b{_n_}=. then b{_n_} = b_replacement;
end;
drop a_replacement b_replacement;
run;
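To try the array approach above end to end, here is a minimal sketch with a few made-up rows. The HAVE values are hypothetical test data, not from the original post. The loops use DIM() so each one stays within its own array (5 elements for A, 3 for B), and the B check is written as `< 2`, which has the same effect as `= 1` because a row with no missing B items has nothing to fill.

```sas
/* Hypothetical test data: items a1-a5 and b1-b3 for three respondents */
data have;
input a1-a5 b1-b3;
datalines;
1 . 3 4 5 2 . 4
1 3 4 . . . . 6
10 . . . 20 1 2 3
;
run;

data want;
set have;
array a {5};   /* refers to a1-a5 */
array b {3};   /* refers to b1-b3 */
/* row means over the non-missing items of each measurement */
a_replacement = mean(of a1-a5);
b_replacement = mean(of b1-b3);
/* A: fill only when fewer than half (at most 2 of 5) items are missing */
if nmiss(of a1-a5) < 3 then do _n_=1 to dim(a);
if a{_n_} = . then a{_n_} = a_replacement;
end;
/* B: fill only when fewer than half (at most 1 of 3) items are missing */
if nmiss(of b1-b3) < 2 then do _n_=1 to dim(b);
if b{_n_} = . then b{_n_} = b_replacement;
end;
drop a_replacement b_replacement;
run;

proc print data=want;
run;
```

With these rows, the first respondent gets a2 = 3.25 and b2 = 3. The second gets a4 and a5 filled with the mean of 1, 3 and 4, while B is left alone because 2 of its 3 items are missing. The third keeps its missing A items, because 3 of 5 are missing.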
For this operation you don't need to iterate over the array to fill the missing values. You can use CALL STDIZE to replace the missing values with the mean. My example uses two OUTPUT statements to show the before and after values of array A.
data stdize;
array a[5];
input a[*];
output;
if nmiss(of a[*]) lt 3 then do;
mean=mean(of a[*]);
call stdize('none','missing=',mean,of a[*]);
output;
end;
cards;
1 . 3 4 5
1 3 4 . .
10 . . . 20
;;;;
run;
proc print;
run;
@lizzy28 wrote:
Specifically, I have the survey data with two measurements A and B. A includes items a1-a5, and B includes items B1-B3. If one respondent has less than half items per measurement are missing, the missing data is replaced by the mean values.
That is a strange requirement. If you have few variables missing, you just leave them missing. But if you have a lot missing, you use the little that is available to impute all the missing values?
I would resort to imputation when missing values are scarce.
Thanks, PG.
I thought about imputation, but my team decided against doing anything complicated.