good day,
can i write a do loop for below script so that i can easily repeat the script itself?
data part1v4;
set part1v3;
format percent percent10.;
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;
if column7^="" then last_work=column7;
else if column6^="" and last_work="" then last_work=column6;
else if column5^="" and last_work="" then last_work=column5;
else if column4^="" and last_work="" then last_work=column4;
else if column3^="" and last_work="" then last_work=column3;
else if column2^="" and last_work="" then last_work=column2;
else if column1^="" and last_work="" then last_work=column1;
if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;
clean_by_word2=lag(CLEANED_BY_WORD);
if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;
drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
run;
thanks in advance,
Harry
Yes, you can - just define two arrays and use two loops, have a look at DO Statement: Iterative for details on how to implement the loops.
you can try arrays as below
data part1v4;
set part1v3;
format percent percent10.;
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;
array vars1(7)$ column1-column7;
array vars2(7)$ checking1-checking7;
do i = 1 to 7 ;
if vars1(i)^="" then last_work=vars1(i);
if vars2(i)="" then wrong_first_appear=i;
end;
clean_by_word2=lag(CLEANED_BY_WORD);
if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;
drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
run;
hi
thanks for your help.
here is my new modified code.
can i repeat this data part1v4 step for three to four times by do looping
as you may notice, i will drop the new added array in this program last phrase and return this data into original form.
i am working on some data cleansing step so my idea is to repeatedly run
part1v4 X4.
data part1v4;
set part1v4;
array column(7) $ 50;
do i=1 to countw(CLEANED_BY_WORD);
column(i) = scan(CLEANED_BY_WORD,i," ");
end;
drop i;
array lag_column(7) $ 50;
do i= 1 to 7;
lag_column(i)=lag(column(i));
end;
if lag_column1=column1 and column1 ^="" and lag_column1^="" then checking1 =1;
if lag_column2=column2 and column2 ^="" and lag_column2^="" then checking2 =1;
if lag_column3=column3 and column3 ^="" and lag_column3^="" then checking3 =1;
if lag_column4=column4 and column4 ^="" and lag_column4^="" then checking4 =1;
if lag_column5=column5 and column5 ^="" and lag_column5^="" then checking5 =1;
if lag_column6=column6 and column6 ^="" and lag_column6^="" then checking6 =1;
if lag_column7=column7 and column7 ^="" and lag_column7^="" then checking7 =1;
;
format percent percent10.;
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;
if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;
clean_by_word2=lag(CLEANED_BY_WORD);
if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD not in ("A AND A GEMS")then modifed_name=clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
drop
column1 column2 column3 column4 column5 column6 column7
checking1 checking2 checking3 checking4 checking5 checking6 checking7
lag_column1 lag_column2 lag_column3 lag_column4 lag_column5 lag_column6 lag_column7
sum_checking count_word percent clean_by_word2 modifed_name wrong_first_appear last_work
;
run;
A sum of elements divided by the number of elements is a MEAN.
So this code:
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7); count_word=countw(CLEANED_BY_WORD); percent=sum_checking/count_word;
Assuming that "cleaned_by_word" matches the number of checking values then perhaps could be replaced with
percent_sum = mean(checking1, checking2, checking3, checking4,checking5, checking6, checking7);
I am a bit leery of this code:
if percent=0.8 then modifed_name=clean_by_word2; if percent=0.75
because with 7 elements I would not expect to get 0.8 of 0.75 for any division by 7. 7* 0.8 = 5.6, 7*0.75 =5.25.
There are similar issues with most of the possible counts. I suspect that you maybe should be checking a range of values for percent but without any actual data and desired outcome this is a bit difficult to verify.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.