BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi SAS Users,

 

After a while of discussion, I created and adjusted two versions of deleting some observations along with tracking the order of deleted observations. Both give me the right results, but the values of iteration "i" in these two codes are different which confused me quite a bit.

Code 1:

data screen123489116    (drop=row)
     deleted (keep=row)
;
   set screen12348911;
   row=_n_;
   array delwords {84} $ 15 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF','PF','WARRANT','WARRANTS','WTS','WTS2','WARRT',
'DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST','INV TST','UNIT TRUST','UNT TST',
'TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.','IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF','FUND','FD','IDX','INDEX','LP','MIPS','MITS','MITT','MPS','NIKKEI','NOTE',
'PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');

do i= 1 to dim(delwords);
 if findw(ENAME,delwords[i],'','eir') >0 then do; /* e:return the order of word in the string     i: ignore case letter     r: removes traling or leading delimiters*/
      output deleted;
      delete;
      Leave;
   end;
   else output screen123489116;
end;
run;

So, with this code, the iteration number "I" is incrementally repeated by observation. As the picture below "i" increase from 1 to 84 from observation 1st and 84th, then i has value 1 at the observation 85th, so on and so forth

My97_0-1611519347465.png

 

Code 2:

data screen123489116    (drop=row)
     deleted (keep=row)
;
   set screen12348911;
   row=_n_;
   array delwords {84} $ 15 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF','PF','WARRANT','WARRANTS','WTS','WTS2','WARRT',
'DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST','INV TST','UNIT TRUST','UNT TST',
'TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.','IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF','FUND','FD','IDX','INDEX','LP','MIPS','MITS','MITT','MPS','NIKKEI','NOTE',
'PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');

do i= 1 to dim(delwords)until (f>0);
 f=findw(ENAME,delwords[i],'','eir');
end;
if f>0 then do;
      output deleted;
   end;
   else output screen123489116;
run;

My97_1-1611519699918.png

 

with this code, it looks like that "i" run through all words in the array while the value of "i" is constant through all observation in variable ENAME of the dataset screen12348911.

 

I am wondering why there is such a difference and is there anything wrong with these two sets of codes?

 

Many thanks and warm regards.

 

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

In the first code, since no matches are found, and the OUTPUT is in the loop, 84 observations are output for every input observation.

In the second code, the OUTPUT is outside of the loop, so one output observation per input observation.

View solution in original post

6 REPLIES 6
Shmuel
Garnet | Level 18

In code 2 the condition if f>0 deals with the last value only and not with each iteration.

First part is a loop but do nothing but assigning value to f. The last value is used in next part:

if f>0 then do;
      output deleted;
   end;
   else output screen123489116;
run;

If you want to execute above as part of the loop then the code should be:

do i= 1 to dim(delwords)until (f>0);
	f=findw(ENAME,delwords[i],'','eir');
	if f>0 then output deleted;  
	else output screen123489116;
end;
run;
Phil_NZ
Barite | Level 11

Hi @Shmuel !

 

I really appreciate it and thank you for your answer!

 

It is really interesting to me now because at first, I think the first code is the one that has a problem rather than the second one. Why did I say so?

 

Because I think the first code, there is no reason that I increase by 1 after each observation like that, because the first 22 observation belongs to a company with the same ENAME, so "I" in these observations should be the same.

 

Apart from that,  can you please explain to me what do you mean regarding "the last value"? the last one in the whole array or the last one that makes f >0 ? And do you think my second set of code is wrong? What I understand from the second code is that: for each observation of dataset screen123489116, SAS will find the "first word" where the word in the array exists in "ENAME" of this observation

do i= 1 to dim(delwords)until (f>0);
 f=findw(ENAME,delwords[i],'','eir');
end;

then SAS will delete (or write to output deleted) if f>0 in this observation, if not, SAS will write this observation to output "screen123489116", whether I explain it properly?

if f>0 then do;
      output deleted;
   end;
   else output screen123489116;

So, this leads to a question, so what is the main difference between your code and my second code in my situation?

do i= 1 to dim(delwords)until (f>0);
	f=findw(ENAME,delwords[i],'','eir');
	if f>0 then output deleted;  
	else output screen123489116;
end;
run;

Many thanks and warmest regards.

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Kurt_Bremser
Super User

In the first code, since no matches are found, and the OUTPUT is in the loop, 84 observations are output for every input observation.

In the second code, the OUTPUT is outside of the loop, so one output observation per input observation.

Phil_NZ
Barite | Level 11

Hi @Kurt_Bremser !

Thank you for a clear answer. So, do you mean the first code is wrong?

 

Many thanks!

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Phil_NZ
Barite | Level 11

Hi @Kurt_Bremser ,

 

So you mean, regarding the first code, it will check if "DUPLICATE" in the ENAME of the first observation, it will check if "DUPL" in the second observation and the list goes on? Am I explaining what you mean and what the code said properly? If so, it seems that the first code is wrong ...

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Kurt_Bremser
Super User

No. It will check ALL the words for the first observation, writing an output observation for every iteration of the DO loop, then advance to the second observation, check ALL words and write observations, and so on. You'll have 84 observations output for every input observation.

 

Proper code formatting (consistent indentation) will help greatly in determining this:

do i = 1 to dim(delwords);
  if findw(ENAME,delwords[i],'','eir') > 0
  then do;
    output deleted;
    delete;
    leave;
  end;
  else output screen123489116;
end;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 650 views
  • 2 likes
  • 3 in conversation