Solved: Iteration counting confusion in looing up value

Phil_NZ · Posted 01-24-2021 03:21 PM

Hi SAS Users,

After a while of discussion, I created and adjusted two versions of deleting some observations along with tracking the order of deleted observations. Both give me the right results, but the values of iteration "i" in these two codes are different which confused me quite a bit.

Code 1:

data screen123489116    (drop=row)
     deleted (keep=row)
;
   set screen12348911;
   row=_n_;
   array delwords {84} $ 15 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF','PF','WARRANT','WARRANTS','WTS','WTS2','WARRT',
'DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST','INV TST','UNIT TRUST','UNT TST',
'TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.','IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF','FUND','FD','IDX','INDEX','LP','MIPS','MITS','MITT','MPS','NIKKEI','NOTE',
'PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');

do i= 1 to dim(delwords);
 if findw(ENAME,delwords[i],'','eir') >0 then do; /* e:return the order of word in the string     i: ignore case letter     r: removes traling or leading delimiters*/
      output deleted;
      delete;
      Leave;
   end;
   else output screen123489116;
end;
run;

So, with this code, the iteration number "I" is incrementally repeated by observation. As the picture below "i" increase from 1 to 84 from observation 1st and 84th, then i has value 1 at the observation 85th, so on and so forth

Code 2:

data screen123489116    (drop=row)
     deleted (keep=row)
;
   set screen12348911;
   row=_n_;
   array delwords {84} $ 15 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF','PF','WARRANT','WARRANTS','WTS','WTS2','WARRT',
'DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST','INV TST','UNIT TRUST','UNT TST',
'TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.','IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF','FUND','FD','IDX','INDEX','LP','MIPS','MITS','MITT','MPS','NIKKEI','NOTE',
'PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');

do i= 1 to dim(delwords)until (f>0);
 f=findw(ENAME,delwords[i],'','eir');
end;
if f>0 then do;
      output deleted;
   end;
   else output screen123489116;
run;

with this code, it looks like that "i" run through all words in the array while the value of "i" is constant through all observation in variable ENAME of the dataset screen12348911.

I am wondering why there is such a difference and is there anything wrong with these two sets of codes?

Many thanks and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Kurt_Bremser · Posted 01-24-2021 05:12 PM

In the first code, since no matches are found, and the OUTPUT is in the loop, 84 observations are output for every input observation.

In the second code, the OUTPUT is outside of the loop, so one output observation per input observation.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Shmuel · Posted 01-24-2021 03:48 PM

In code 2 the condition if f>0 deals with the last value only and not with each iteration.

First part is a loop but do nothing but assigning value to f. The last value is used in next part:

if f>0 then do;
      output deleted;
   end;
   else output screen123489116;
run;

If you want to execute above as part of the loop then the code should be:

do i= 1 to dim(delwords)until (f>0);
	f=findw(ENAME,delwords[i],'','eir');
	if f>0 then output deleted;  
	else output screen123489116;
end;
run;

Phil_NZ · Posted 01-24-2021 04:51 PM

Hi @Shmuel !

I really appreciate it and thank you for your answer!

It is really interesting to me now because at first, I think the first code is the one that has a problem rather than the second one. Why did I say so?

Because I think the first code, there is no reason that I increase by 1 after each observation like that, because the first 22 observation belongs to a company with the same ENAME, so "I" in these observations should be the same.

Apart from that, can you please explain to me what do you mean regarding "the last value"? the last one in the whole array or the last one that makes f >0 ? And do you think my second set of code is wrong? What I understand from the second code is that: for each observation of dataset screen123489116, SAS will find the "first word" where the word in the array exists in "ENAME" of this observation

do i= 1 to dim(delwords)until (f>0);
 f=findw(ENAME,delwords[i],'','eir');
end;

then SAS will delete (or write to output deleted) if f>0 in this observation, if not, SAS will write this observation to output "screen123489116", whether I explain it properly?

if f>0 then do;
      output deleted;
   end;
   else output screen123489116;

So, this leads to a question, so what is the main difference between your code and my second code in my situation?

do i= 1 to dim(delwords)until (f>0);
	f=findw(ENAME,delwords[i],'','eir');
	if f>0 then output deleted;  
	else output screen123489116;
end;
run;

Many thanks and warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Kurt_Bremser · Posted 01-24-2021 05:12 PM

In the first code, since no matches are found, and the OUTPUT is in the loop, 84 observations are output for every input observation.

In the second code, the OUTPUT is outside of the loop, so one output observation per input observation.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Phil_NZ · Posted 01-24-2021 09:46 PM

Hi @Kurt_Bremser !

Thank you for a clear answer. So, do you mean the first code is wrong?

Many thanks!

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Phil_NZ · Posted 01-24-2021 09:55 PM

Hi @Kurt_Bremser ,

So you mean, regarding the first code, it will check if "DUPLICATE" in the ENAME of the first observation, it will check if "DUPL" in the second observation and the list goes on? Am I explaining what you mean and what the code said properly? If so, it seems that the first code is wrong ...

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Kurt_Bremser · Posted 01-26-2021 04:09 AM

No. It will check ALL the words for the first observation, writing an output observation for every iteration of the DO loop, then advance to the second observation, check ALL words and write observations, and so on. You'll have 84 observations output for every input observation.

Proper code formatting (consistent indentation) will help greatly in determining this:

do i = 1 to dim(delwords);
  if findw(ENAME,delwords[i],'','eir') > 0
  then do;
    output deleted;
    delete;
    leave;
  end;
  else output screen123489116;
end;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX