good day,
can i write a do loop for below script so that i can easily repeat the script itself?
data part1v4;
set part1v3;
format percent percent10.;
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;
if column7^="" then last_work=column7;
else if column6^="" and last_work="" then last_work=column6;
else if column5^="" and last_work="" then last_work=column5;
else if column4^="" and last_work="" then last_work=column4;
else if column3^="" and last_work="" then last_work=column3;
else if column2^="" and last_work="" then last_work=column2;
else if column1^="" and last_work="" then last_work=column1;
if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;
clean_by_word2=lag(CLEANED_BY_WORD);
if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;
drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
run;
thanks in advance,
Harry
Yes, you can - just define two arrays and use two loops, have a look at DO Statement: Iterative for details on how to implement the loops.
you can try arrays as below
data part1v4;
set part1v3;
format percent percent10.;
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;
array vars1(7)$ column1-column7;
array vars2(7)$ checking1-checking7;
do i = 1 to 7 ;
if vars1(i)^="" then last_work=vars1(i);
if vars2(i)="" then wrong_first_appear=i;
end;
clean_by_word2=lag(CLEANED_BY_WORD);
if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;
drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
run;
hi
thanks for your help.
here is my new modified code.
can i repeat this data part1v4 step for three to four times by do looping
as you may notice, i will drop the new added array in this program last phrase and return this data into original form.
i am working on some data cleansing step so my idea is to repeatedly run
part1v4 X4.
data part1v4;
set part1v4;
array column(7) $ 50;
do i=1 to countw(CLEANED_BY_WORD);
column(i) = scan(CLEANED_BY_WORD,i," ");
end;
drop i;
array lag_column(7) $ 50;
do i= 1 to 7;
lag_column(i)=lag(column(i));
end;
if lag_column1=column1 and column1 ^="" and lag_column1^="" then checking1 =1;
if lag_column2=column2 and column2 ^="" and lag_column2^="" then checking2 =1;
if lag_column3=column3 and column3 ^="" and lag_column3^="" then checking3 =1;
if lag_column4=column4 and column4 ^="" and lag_column4^="" then checking4 =1;
if lag_column5=column5 and column5 ^="" and lag_column5^="" then checking5 =1;
if lag_column6=column6 and column6 ^="" and lag_column6^="" then checking6 =1;
if lag_column7=column7 and column7 ^="" and lag_column7^="" then checking7 =1;
;
format percent percent10.;
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;
if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;
clean_by_word2=lag(CLEANED_BY_WORD);
if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD not in ("A AND A GEMS")then modifed_name=clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
drop
column1 column2 column3 column4 column5 column6 column7
checking1 checking2 checking3 checking4 checking5 checking6 checking7
lag_column1 lag_column2 lag_column3 lag_column4 lag_column5 lag_column6 lag_column7
sum_checking count_word percent clean_by_word2 modifed_name wrong_first_appear last_work
;
run;
A sum of elements divided by the number of elements is a MEAN.
So this code:
sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7); count_word=countw(CLEANED_BY_WORD); percent=sum_checking/count_word;
Assuming that "cleaned_by_word" matches the number of checking values then perhaps could be replaced with
percent_sum = mean(checking1, checking2, checking3, checking4,checking5, checking6, checking7);
I am a bit leery of this code:
if percent=0.8 then modifed_name=clean_by_word2; if percent=0.75
because with 7 elements I would not expect to get 0.8 of 0.75 for any division by 7. 7* 0.8 = 5.6, 7*0.75 =5.25.
There are similar issues with most of the possible counts. I suspect that you maybe should be checking a range of values for percent but without any actual data and desired outcome this is a bit difficult to verify.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.