BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi SAS Users,

 

Recently I built a code to filter my data, and I also use an array to delete some observations containing some specific words. The code is as below

OPTIONS MPRINT;
data arg_merge2;
  set 'C:\Users\pnguyen\Desktop\arg_merge2';

run;

data screen12348911;
   set arg_merge2;
   if TYPE_1 = "EQ" 
   /*SCREEN 1*/
   and INDC3 not in ('UTILS', 'BANKS', 'FINSV', 'RLEST', 'INSUR')
   /*SCREEN 2*/
   and MAJOR="Y" 
   /*SCREEN 3*/
   and substr(GEOGN,1,3)= "ARG"
   /*SCREEN 4*/
   /*PUT 
   substr(GEOGN,1,3)=short_fn
   IN MACRO*/
   and 
   s3 ge 0
   /*SCREEN 8*/
   and
   s21 ge 0 and s22 ge 0
   /*SCREEN 9*/
   and
   s2 ge 0
   /*SCREEN 11*/

;
run;

data screen123489116;
/*SCREEN 6*/
set screen12348911;
 array delwords {84} $ 15 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF',"'PF'",'WARRANT','WARRANTS',
'WTS','WTS2','WARRT','DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST',
'INV TST','UNIT TRUST','UNT TST','TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.',
'IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF','FUND','FD','IDX','INDEX','LP','MIPS','MITS',
'MITT','MPS','NIKKEI','NOTE','PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');
do i= 1 to dim(delwords);
 if findw(ENAME,delwords[i],'','eir') >0 then delete;
end;
run;

I want to highlight here is screen 6, so there are 84 elements in array "del words" that I want to delete. However, the result "screen123489116" is as below:

 
 

There are two things here that surprised me:

1. Why "I" automatically pop up in my result?

2. And if it pops up like that, it should be 84 rather than 85?

3. And whether "ge" in screen 11 equals to "greater than or equal to" ? Can I use ">=" instead of "ge" as the definition of the operator as the link following?

(https://v8doc.sas.com/sashtml/lgref/z0208245.htm#:~:text=The%20result%20of%20a%20comparison,versions.... )

 

Many thanks and warmest regards.

 

 

 

 

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@Phil_NZ wrote:

Hi SAS Users,

There are two things here that surprised me:

1. Why "I" automatically pop up in my result?

2. And if it pops up like that, it should be 84 rather than 85?

3. And whether "ge" in screen 11 equals to "greater than or equal to" ? Can I use ">=" instead of "ge" as the definition of the operator as the link following?


  1. "i" is a new variable that you created as the iterator value in the do loop.  To SAS, it's just another variable you created, to be automatically kept in the output.  Use a "(drop=i)" parameter in your DATA statement to get rid of it.

  2. YOU have 
    do i= 1 to dim(delwords);
     if findw(ENAME,delwords[i],'','eir') >0 then delete;
    end;​

    Your DO loop increments i by 1 at the end of each iteration.  The last complete iteration in your case is 84.  The DO loop increments it to 85.  The loop returns to the top as normal, but then finds that i is out of range, and the loop stops.

    Run this code:

    data _null_;
      do i=1 to 84;              end;  put i=;
      do j=1 to 84 until (j=84); end;  put j=;
    run;

    Both of the loop have 84 complete iterations.  But j never gets to 85, while i does.  That's because the "until" condition in the second loop is evaluated at the end of the loop, before incrementing the loop iterator.  When the until condition is met, no further incrementing occurs.

    Edited additional comment: I guess you could

    do i=1 by 1 until (i=dim(delwords));

    but there's really no point.

  3. This is a question best answered by giving it a try.  

 

And speaking of the UNTIL clause, I should have added that you can make your loop much more efficient.  There is no need to go through the 2nd through 84th iterations, if the 1st iteration finds a delete-word.  You want to escape the do loop the first time you find a delete-word.

  do i= 1 to dim(delwords) until (f>0);
   f=findw(ENAME,delwords[i],'','eir');
  end;
  if f>0 then delete;

If all 84 iterations fail to find a delete-word, then f is a zero (84 times, but the only time you test it is the last). 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

@Phil_NZ wrote:

Hi SAS Users,

There are two things here that surprised me:

1. Why "I" automatically pop up in my result?

2. And if it pops up like that, it should be 84 rather than 85?

3. And whether "ge" in screen 11 equals to "greater than or equal to" ? Can I use ">=" instead of "ge" as the definition of the operator as the link following?


  1. "i" is a new variable that you created as the iterator value in the do loop.  To SAS, it's just another variable you created, to be automatically kept in the output.  Use a "(drop=i)" parameter in your DATA statement to get rid of it.

  2. YOU have 
    do i= 1 to dim(delwords);
     if findw(ENAME,delwords[i],'','eir') >0 then delete;
    end;​

    Your DO loop increments i by 1 at the end of each iteration.  The last complete iteration in your case is 84.  The DO loop increments it to 85.  The loop returns to the top as normal, but then finds that i is out of range, and the loop stops.

    Run this code:

    data _null_;
      do i=1 to 84;              end;  put i=;
      do j=1 to 84 until (j=84); end;  put j=;
    run;

    Both of the loop have 84 complete iterations.  But j never gets to 85, while i does.  That's because the "until" condition in the second loop is evaluated at the end of the loop, before incrementing the loop iterator.  When the until condition is met, no further incrementing occurs.

    Edited additional comment: I guess you could

    do i=1 by 1 until (i=dim(delwords));

    but there's really no point.

  3. This is a question best answered by giving it a try.  

 

And speaking of the UNTIL clause, I should have added that you can make your loop much more efficient.  There is no need to go through the 2nd through 84th iterations, if the 1st iteration finds a delete-word.  You want to escape the do loop the first time you find a delete-word.

  do i= 1 to dim(delwords) until (f>0);
   f=findw(ENAME,delwords[i],'','eir');
  end;
  if f>0 then delete;

If all 84 iterations fail to find a delete-word, then f is a zero (84 times, but the only time you test it is the last). 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Phil_NZ
Barite | Level 11

Hi @mkeintz !

Thank you very much for your straightforward answer. I benefit a lot from reading your suggestion. However, I have two points that I am confused about:

 

1. Regarding the code

do i=1 by 1 until (i=dim(delwords));

whether we can also write

do i=1 to dim(delwords) by 1 until (i=dim(delwords));

because I read examples  from other material is that

do i=... to... by...

I know it is a very novice question but I am quite prudent regarding code.

 

2. Regarding the extra part of UNTIL clause, do you mean that while we delete one observation because the ENAME of this observation contains a specific word in the array, we do not need to go through all specific words after this word for this observation because this observation was deleted already? If it is the case, I totally agree with you, and thank you very much for the code enhancement.

 
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Phil_NZ
Barite | Level 11
Regarding the second point, I mean, if what you intend to do is similar to what I understand, so there is no difference between your code and my code previously. Because previously, if a specific word in the array was found in an observation of ENAME, then we delete this observation immediately, then I think the program will not continue with the rest of the list of words for this deleted observation?
Please correct me if I understand you improperly!
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
mkeintz
PROC Star

@Phil_NZ wrote:
Regarding the second point, I mean, if what you intend to do is similar to what I understand, so there is no difference between your code and my code previously. Because previously, if a specific word in the array was found in an observation of ENAME, then we delete this observation immediately, then I think the program will not continue with the rest of the list of words for this deleted observation?
Please correct me if I understand you improperly!

You are correct.  The loop will stop at the DELETE statement.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

1. ge is short for: greater equal

    gt is short for greater than

2. You can write your test without a loop, like this:

if prxmatch("/\b('DUPLICATE|DUPL|DUP|DUPE|DULP|DUPLI|1000DUPL|XSQ|XET|ADR|GDR|PREFERRED|PFD?|PREF|PFD?|'PF'|WARRANTS?|WTS2|WARRT|DEB|DB|DCB|DEBT|DEBENTURES?|RLST IT|INVESTMENT TRUST|INV TST|UNI?T|UTS|UNIT TRUST|UNT TST|T(RU)?ST UNITS?|UT|IT\.|IT|500|BOND|DEFER|DEP|DEPY|ELKS|ETF|FUND|FD|IDX|INDEX|LP|MIPS|MITS|MITT|MPS|NIKKEI|NOTE|PERQS|PINES|PRTF|PTNS|PTSHP|QUIBS|QUIDS|RATE|RCPTS|RECEIPTS|REIT|RETUR|SCORE|SPDR|STRYPES|TOPRS|WTS|XXXXX|YIELD|YLD|EXPIRED|EXPD|EXPIRY|EXPY)\b/i",ENAME);

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 558 views
  • 3 likes
  • 3 in conversation