Hi SAS Users,
Recently I built a code to filter my data, and I also use an array to delete some observations containing some specific words. The code is as below
OPTIONS MPRINT;
data arg_merge2;
set 'C:\Users\pnguyen\Desktop\arg_merge2';
run;
data screen12348911;
set arg_merge2;
if TYPE_1 = "EQ"
/*SCREEN 1*/
and INDC3 not in ('UTILS', 'BANKS', 'FINSV', 'RLEST', 'INSUR')
/*SCREEN 2*/
and MAJOR="Y"
/*SCREEN 3*/
and substr(GEOGN,1,3)= "ARG"
/*SCREEN 4*/
/*PUT
substr(GEOGN,1,3)=short_fn
IN MACRO*/
and
s3 ge 0
/*SCREEN 8*/
and
s21 ge 0 and s22 ge 0
/*SCREEN 9*/
and
s2 ge 0
/*SCREEN 11*/
;
run;
data screen123489116;
/*SCREEN 6*/
set screen12348911;
array delwords {84} $ 15 _temporary_ ('DUPLICATE', 'DUPL', 'DUP', 'DUPE','DULP', 'DUPLI',
'1000DUPL','XSQ','XET','ADR','GDR','PREFERRED','PF','PFD','PREF',"'PF'",'WARRANT','WARRANTS',
'WTS','WTS2','WARRT','DEB','DB','DCB','DEBT','DEBENTURES','DEBENTURE','RLST IT','INVESTMENT TRUST',
'INV TST','UNIT TRUST','UNT TST','TRUST UNITS','TST UNITS','TRUST UNIT','TST UNIT','UT','IT.',
'IT','500','BOND','DEFER','DEP','DEPY','ELKS','ETF','FUND','FD','IDX','INDEX','LP','MIPS','MITS',
'MITT','MPS','NIKKEI','NOTE','PERQS','PINES','PRTF','PTNS','PTSHP','QUIBS','QUIDS','RATE','RCPTS',
'RECEIPTS','REIT','RETUR','SCORE','SPDR','STRYPES','TOPRS','UNIT','UNT',
'UTS','WTS','XXXXX','YIELD','YLD','EXPIRED','EXPD','EXPIRY','EXPY');
do i= 1 to dim(delwords);
if findw(ENAME,delwords[i],'','eir') >0 then delete;
end;
run;
I want to highlight here is screen 6, so there are 84 elements in array "del words" that I want to delete. However, the result "screen123489116" is as below:
There are two things here that surprised me:
1. Why "I" automatically pop up in my result?
2. And if it pops up like that, it should be 84 rather than 85?
3. And whether "ge" in screen 11 equals to "greater than or equal to" ? Can I use ">=" instead of "ge" as the definition of the operator as the link following?
Many thanks and warmest regards.
@Phil_NZ wrote:
Hi SAS Users,
There are two things here that surprised me:
1. Why "I" automatically pop up in my result?
2. And if it pops up like that, it should be 84 rather than 85?
3. And whether "ge" in screen 11 equals to "greater than or equal to" ? Can I use ">=" instead of "ge" as the definition of the operator as the link following?
do i= 1 to dim(delwords);
if findw(ENAME,delwords[i],'','eir') >0 then delete;
end;
Your DO loop increments i by 1 at the end of each iteration. The last complete iteration in your case is 84. The DO loop increments it to 85. The loop returns to the top as normal, but then finds that i is out of range, and the loop stops.
Run this code:
data _null_;
do i=1 to 84; end; put i=;
do j=1 to 84 until (j=84); end; put j=;
run;
Both of the loop have 84 complete iterations. But j never gets to 85, while i does. That's because the "until" condition in the second loop is evaluated at the end of the loop, before incrementing the loop iterator. When the until condition is met, no further incrementing occurs.
Edited additional comment: I guess you could
do i=1 by 1 until (i=dim(delwords));
but there's really no point.
And speaking of the UNTIL clause, I should have added that you can make your loop much more efficient. There is no need to go through the 2nd through 84th iterations, if the 1st iteration finds a delete-word. You want to escape the do loop the first time you find a delete-word.
do i= 1 to dim(delwords) until (f>0);
f=findw(ENAME,delwords[i],'','eir');
end;
if f>0 then delete;
If all 84 iterations fail to find a delete-word, then f is a zero (84 times, but the only time you test it is the last).
@Phil_NZ wrote:
Hi SAS Users,
There are two things here that surprised me:
1. Why "I" automatically pop up in my result?
2. And if it pops up like that, it should be 84 rather than 85?
3. And whether "ge" in screen 11 equals to "greater than or equal to" ? Can I use ">=" instead of "ge" as the definition of the operator as the link following?
do i= 1 to dim(delwords);
if findw(ENAME,delwords[i],'','eir') >0 then delete;
end;
Your DO loop increments i by 1 at the end of each iteration. The last complete iteration in your case is 84. The DO loop increments it to 85. The loop returns to the top as normal, but then finds that i is out of range, and the loop stops.
Run this code:
data _null_;
do i=1 to 84; end; put i=;
do j=1 to 84 until (j=84); end; put j=;
run;
Both of the loop have 84 complete iterations. But j never gets to 85, while i does. That's because the "until" condition in the second loop is evaluated at the end of the loop, before incrementing the loop iterator. When the until condition is met, no further incrementing occurs.
Edited additional comment: I guess you could
do i=1 by 1 until (i=dim(delwords));
but there's really no point.
And speaking of the UNTIL clause, I should have added that you can make your loop much more efficient. There is no need to go through the 2nd through 84th iterations, if the 1st iteration finds a delete-word. You want to escape the do loop the first time you find a delete-word.
do i= 1 to dim(delwords) until (f>0);
f=findw(ENAME,delwords[i],'','eir');
end;
if f>0 then delete;
If all 84 iterations fail to find a delete-word, then f is a zero (84 times, but the only time you test it is the last).
Hi @mkeintz !
Thank you very much for your straightforward answer. I benefit a lot from reading your suggestion. However, I have two points that I am confused about:
1. Regarding the code
do i=1 by 1 until (i=dim(delwords));
whether we can also write
do i=1 to dim(delwords) by 1 until (i=dim(delwords));
because I read examples from other material is that
do i=... to... by...
I know it is a very novice question but I am quite prudent regarding code.
2. Regarding the extra part of UNTIL clause, do you mean that while we delete one observation because the ENAME of this observation contains a specific word in the array, we do not need to go through all specific words after this word for this observation because this observation was deleted already? If it is the case, I totally agree with you, and thank you very much for the code enhancement.
@Phil_NZ wrote:
Regarding the second point, I mean, if what you intend to do is similar to what I understand, so there is no difference between your code and my code previously. Because previously, if a specific word in the array was found in an observation of ENAME, then we delete this observation immediately, then I think the program will not continue with the rest of the list of words for this deleted observation?
Please correct me if I understand you improperly!
You are correct. The loop will stop at the DELETE statement.
1. ge is short for: greater equal
gt is short for greater than
2. You can write your test without a loop, like this:
if prxmatch("/\b('DUPLICATE|DUPL|DUP|DUPE|DULP|DUPLI|1000DUPL|XSQ|XET|ADR|GDR|PREFERRED|PFD?|PREF|PFD?|'PF'|WARRANTS?|WTS2|WARRT|DEB|DB|DCB|DEBT|DEBENTURES?|RLST IT|INVESTMENT TRUST|INV TST|UNI?T|UTS|UNIT TRUST|UNT TST|T(RU)?ST UNITS?|UT|IT\.|IT|500|BOND|DEFER|DEP|DEPY|ELKS|ETF|FUND|FD|IDX|INDEX|LP|MIPS|MITS|MITT|MPS|NIKKEI|NOTE|PERQS|PINES|PRTF|PTNS|PTSHP|QUIBS|QUIDS|RATE|RCPTS|RECEIPTS|REIT|RETUR|SCORE|SPDR|STRYPES|TOPRS|WTS|XXXXX|YIELD|YLD|EXPIRED|EXPD|EXPIRY|EXPY)\b/i",ENAME);
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.