BookmarkSubscribeRSS Feed
harrylui
Obsidian | Level 7

good day,

 

can i write a do loop for below script so that i can easily repeat the script itself?

 


data part1v4;
set part1v3;
format percent percent10.;

 

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

 

if column7^="" then last_work=column7;
else if column6^="" and last_work="" then last_work=column6;
else if column5^="" and last_work="" then last_work=column5;
else if column4^="" and last_work="" then last_work=column4;
else if column3^="" and last_work="" then last_work=column3;
else if column2^="" and last_work="" then last_work=column2;
else if column1^="" and last_work="" then last_work=column1;


if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;

clean_by_word2=lag(CLEANED_BY_WORD);


if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;

drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;

 run;

 

thanks in advance,

Harry

5 REPLIES 5
andreas_lds
Jade | Level 19

Yes, you can - just define two arrays and use two loops, have a look at DO Statement: Iterative for details on how to implement the loops.

Jagadishkatam
Amethyst | Level 16

you can try arrays as below

 

data part1v4;
set part1v3;
format percent percent10.;

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

array vars1(7)$ column1-column7;
array vars2(7)$ checking1-checking7;

do i = 1 to 7 ;
if vars1(i)^="" then last_work=vars1(i);

if vars2(i)="" then wrong_first_appear=i;
end;
clean_by_word2=lag(CLEANED_BY_WORD);

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;

drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
run;
Thanks,
Jag
harrylui
Obsidian | Level 7

hi

thanks for your help.

here is my new modified code.

can i repeat this data part1v4 step for three to four times by do looping

as you may notice, i will drop the new added array in this program last phrase and return this data into original form.

i am working on some data cleansing step so my idea is to repeatedly run 

part1v4 X4.

 

data part1v4;
set part1v4;

array column(7) $ 50;

do i=1 to countw(CLEANED_BY_WORD);
column(i) = scan(CLEANED_BY_WORD,i," ");
end;
drop i;
array lag_column(7) $ 50;
do i= 1 to 7;
lag_column(i)=lag(column(i));
end;
if lag_column1=column1 and column1 ^="" and lag_column1^="" then checking1 =1;
if lag_column2=column2 and column2 ^="" and lag_column2^="" then checking2 =1;
if lag_column3=column3 and column3 ^="" and lag_column3^="" then checking3 =1;
if lag_column4=column4 and column4 ^="" and lag_column4^="" then checking4 =1;
if lag_column5=column5 and column5 ^="" and lag_column5^="" then checking5 =1;
if lag_column6=column6 and column6 ^="" and lag_column6^="" then checking6 =1;
if lag_column7=column7 and column7 ^="" and lag_column7^="" then checking7 =1;
;
format percent percent10.;

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;

clean_by_word2=lag(CLEANED_BY_WORD);

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD not in ("A AND A GEMS")then modifed_name=clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;

drop
column1 column2 column3 column4 column5 column6 column7
checking1 checking2 checking3 checking4 checking5 checking6 checking7
lag_column1 lag_column2 lag_column3 lag_column4 lag_column5 lag_column6 lag_column7
sum_checking count_word percent clean_by_word2 modifed_name wrong_first_appear last_work
;
run;

Jagadishkatam
Amethyst | Level 16
It is always a good practice to use a different dataset name in set statement and create a new dataset with a new name.
Thanks,
Jag
ballardw
Super User

A sum of elements divided by the number of elements is a MEAN.

So this code:

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

Assuming that "cleaned_by_word" matches the number of checking values then perhaps could be replaced with

percent_sum = mean(checking1, checking2, checking3, checking4,checking5, checking6, checking7);

I am a bit leery of this code:

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 

because with 7 elements I would not expect to get 0.8 of 0.75 for any division by 7.  7* 0.8 = 5.6, 7*0.75 =5.25.

There are similar issues with most of the possible counts. I suspect that you maybe should be checking a range of values for percent but without any actual data and desired outcome this is a bit difficult to verify.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 782 views
  • 0 likes
  • 4 in conversation