do loop coding

harrylui · Posted 04-23-2020 01:29 AM

good day,

can i write a do loop for below script so that i can easily repeat the script itself?

data part1v4;
set part1v3;
format percent percent10.;

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

if column7^="" then last_work=column7;
else if column6^="" and last_work="" then last_work=column6;
else if column5^="" and last_work="" then last_work=column5;
else if column4^="" and last_work="" then last_work=column4;
else if column3^="" and last_work="" then last_work=column3;
else if column2^="" and last_work="" then last_work=column2;
else if column1^="" and last_work="" then last_work=column1;

if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;

clean_by_word2=lag(CLEANED_BY_WORD);

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;

drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;

run;

thanks in advance,

Harry

andreas_lds · Posted 04-23-2020 01:43 AM

Yes, you can - just define two arrays and use two loops, have a look at DO Statement: Iterative for details on how to implement the loops.

Jagadishkatam · Posted 04-23-2020 03:04 AM

you can try arrays as below

data part1v4;
set part1v3;
format percent percent10.;

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

array vars1(7)$ column1-column7;
array vars2(7)$ checking1-checking7;

do i = 1 to 7 ;
if vars1(i)^="" then last_work=vars1(i);

if vars2(i)="" then wrong_first_appear=i;
end;
clean_by_word2=lag(CLEANED_BY_WORD);

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;

drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
run;

Thanks,
Jag

harrylui · Posted 04-23-2020 04:32 AM

hi

thanks for your help.

here is my new modified code.

can i repeat this data part1v4 step for three to four times by do looping

as you may notice, i will drop the new added array in this program last phrase and return this data into original form.

i am working on some data cleansing step so my idea is to repeatedly run

part1v4 X4.

data part1v4;
set part1v4;

array column(7) $ 50;

do i=1 to countw(CLEANED_BY_WORD);
column(i) = scan(CLEANED_BY_WORD,i," ");
end;
drop i;
array lag_column(7) $ 50;
do i= 1 to 7;
lag_column(i)=lag(column(i));
end;
if lag_column1=column1 and column1 ^="" and lag_column1^="" then checking1 =1;
if lag_column2=column2 and column2 ^="" and lag_column2^="" then checking2 =1;
if lag_column3=column3 and column3 ^="" and lag_column3^="" then checking3 =1;
if lag_column4=column4 and column4 ^="" and lag_column4^="" then checking4 =1;
if lag_column5=column5 and column5 ^="" and lag_column5^="" then checking5 =1;
if lag_column6=column6 and column6 ^="" and lag_column6^="" then checking6 =1;
if lag_column7=column7 and column7 ^="" and lag_column7^="" then checking7 =1;
;
format percent percent10.;

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;

clean_by_word2=lag(CLEANED_BY_WORD);

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD not in ("A AND A GEMS")then modifed_name=clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;

drop
column1 column2 column3 column4 column5 column6 column7
checking1 checking2 checking3 checking4 checking5 checking6 checking7
lag_column1 lag_column2 lag_column3 lag_column4 lag_column5 lag_column6 lag_column7
sum_checking count_word percent clean_by_word2 modifed_name wrong_first_appear last_work
;
run;

Jagadishkatam · Posted 04-23-2020 04:52 AM

It is always a good practice to use a different dataset name in set statement and create a new dataset with a new name.

Thanks,
Jag

ballardw · Posted 04-23-2020 10:23 AM

A sum of elements divided by the number of elements is a MEAN.

So this code:

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

Assuming that "cleaned_by_word" matches the number of checking values then perhaps could be replaced with

percent_sum = mean(checking1, checking2, checking3, checking4,checking5, checking6, checking7);

I am a bit leery of this code:

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75

because with 7 elements I would not expect to get 0.8 of 0.75 for any division by 7. 7* 0.8 = 5.6, 7*0.75 =5.25.

There are similar issues with most of the possible counts. I suspect that you maybe should be checking a range of values for percent but without any actual data and desired outcome this is a bit difficult to verify.

do loop coding

Re: do loop coding

Re: do loop coding

Re: do loop coding

Re: do loop coding

Re: do loop coding

SAS Innovate 2026 Registration is Open

do loop coding

Re: do loop coding

Re: do loop coding

Re: do loop coding

Re: do loop coding

Re: do loop coding

SAS Innovate 2026 Registration is Open

SAS Training: Just a Click Away