BookmarkSubscribeRSS Feed
harrylui
Obsidian | Level 7

good day,

 

can i write a do loop for below script so that i can easily repeat the script itself?

 


data part1v4;
set part1v3;
format percent percent10.;

 

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

 

if column7^="" then last_work=column7;
else if column6^="" and last_work="" then last_work=column6;
else if column5^="" and last_work="" then last_work=column5;
else if column4^="" and last_work="" then last_work=column4;
else if column3^="" and last_work="" then last_work=column3;
else if column2^="" and last_work="" then last_work=column2;
else if column1^="" and last_work="" then last_work=column1;


if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;

clean_by_word2=lag(CLEANED_BY_WORD);


if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;

drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;

 run;

 

thanks in advance,

Harry

5 REPLIES 5
andreas_lds
Jade | Level 19

Yes, you can - just define two arrays and use two loops, have a look at DO Statement: Iterative for details on how to implement the loops.

Jagadishkatam
Amethyst | Level 16

you can try arrays as below

 

data part1v4;
set part1v3;
format percent percent10.;

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

array vars1(7)$ column1-column7;
array vars2(7)$ checking1-checking7;

do i = 1 to 7 ;
if vars1(i)^="" then last_work=vars1(i);

if vars2(i)="" then wrong_first_appear=i;
end;
clean_by_word2=lag(CLEANED_BY_WORD);

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD then modifed_name=clean_by_word2;

drop clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;
run;
Thanks,
Jag
harrylui
Obsidian | Level 7

hi

thanks for your help.

here is my new modified code.

can i repeat this data part1v4 step for three to four times by do looping

as you may notice, i will drop the new added array in this program last phrase and return this data into original form.

i am working on some data cleansing step so my idea is to repeatedly run 

part1v4 X4.

 

data part1v4;
set part1v4;

array column(7) $ 50;

do i=1 to countw(CLEANED_BY_WORD);
column(i) = scan(CLEANED_BY_WORD,i," ");
end;
drop i;
array lag_column(7) $ 50;
do i= 1 to 7;
lag_column(i)=lag(column(i));
end;
if lag_column1=column1 and column1 ^="" and lag_column1^="" then checking1 =1;
if lag_column2=column2 and column2 ^="" and lag_column2^="" then checking2 =1;
if lag_column3=column3 and column3 ^="" and lag_column3^="" then checking3 =1;
if lag_column4=column4 and column4 ^="" and lag_column4^="" then checking4 =1;
if lag_column5=column5 and column5 ^="" and lag_column5^="" then checking5 =1;
if lag_column6=column6 and column6 ^="" and lag_column6^="" then checking6 =1;
if lag_column7=column7 and column7 ^="" and lag_column7^="" then checking7 =1;
;
format percent percent10.;

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

if checking1="" then wrong_first_appear=1;
else if checking2="" then wrong_first_appear=2;
else if checking3="" then wrong_first_appear=3;
else if checking4="" then wrong_first_appear=4;
else if checking5="" then wrong_first_appear=5;
else if checking6="" then wrong_first_appear=6;
else if checking7="" then wrong_first_appear=7;

clean_by_word2=lag(CLEANED_BY_WORD);

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 and wrong_first_appear=count_word and CLEANED_BY_WORD not in ("A AND A GEMS")then modifed_name=clean_by_word2;
if modifed_name^="" then CLEANED_BY_WORD=modifed_name;

drop
column1 column2 column3 column4 column5 column6 column7
checking1 checking2 checking3 checking4 checking5 checking6 checking7
lag_column1 lag_column2 lag_column3 lag_column4 lag_column5 lag_column6 lag_column7
sum_checking count_word percent clean_by_word2 modifed_name wrong_first_appear last_work
;
run;

Jagadishkatam
Amethyst | Level 16
It is always a good practice to use a different dataset name in set statement and create a new dataset with a new name.
Thanks,
Jag
ballardw
Super User

A sum of elements divided by the number of elements is a MEAN.

So this code:

sum_checking=sum(checking1,checking2,checking3,checking4,checking5,checking6,checking7);
count_word=countw(CLEANED_BY_WORD);
percent=sum_checking/count_word;

Assuming that "cleaned_by_word" matches the number of checking values then perhaps could be replaced with

percent_sum = mean(checking1, checking2, checking3, checking4,checking5, checking6, checking7);

I am a bit leery of this code:

if percent=0.8 then modifed_name=clean_by_word2;
if percent=0.75 

because with 7 elements I would not expect to get 0.8 of 0.75 for any division by 7.  7* 0.8 = 5.6, 7*0.75 =5.25.

There are similar issues with most of the possible counts. I suspect that you maybe should be checking a range of values for percent but without any actual data and desired outcome this is a bit difficult to verify.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1294 views
  • 0 likes
  • 4 in conversation