BookmarkSubscribeRSS Feed
aknight1
Obsidian | Level 7

Hello,

 

While testing a macro to return email addresses from a metadata dataset, I get (seemingly) inconsistent results from the FINDW and INDEXW functions. When running the code given below, I am really curious why the datasets A1 and A2 are returning different results, and why the datasets B1 and B2 are returning different results. It seems like the order in which I check things influences the result.

 

My feeling is that A1 and A2 should be same, and that B1 and B2 should be same. Can anybody give me a hint why they are not? 

 

I get:

NOTE: Table WORK.TEST_A1 created, with 1 rows and 2 columns.

NOTE: Table WORK.TEST_A2 created, with 0 rows and 2 columns.

NOTE: Table WORK.TEST_B1 created, with 0 rows and 2 columns.

NOTE: Table WORK.TEST_B2 created, with 1 rows and 2 columns.

 

I am running in SAS EG 7.1 on UNIX, and in SAS 9.4 on WINDOWS.

 

data T_USER_EMAIL;
   length userid $6 email $100;
   userid='ABC090'; email='john.smith@xyz.com';    output;
   userid='DEF090'; email='joanna.smythe@xyz.com'; output;
   userid='HIJ090'; email=' ';                     output;
run;

* test with Name;
proc sql noprint;
   create table test_A1 as
   select *
     from T_USER_EMAIL
    where index(upcase(email),'@XYZ.COM')
      and ( (FINDW(" JOHN.SMITH xxx",strip(upcase(scan(email,1,'@'))),' ')) or
            (FINDW(" JOHN.SMITH xxx",strip(upcase(userid)))) ) ;
quit;

proc sql noprint;
   create table test_A2 as
   select *
     from T_USER_EMAIL
    where index(upcase(email),'@XYZ.COM')
      and ( (FINDW(" JOHN.SMITH xxx",strip(upcase(userid)))) or 
            (FINDW(" JOHN.SMITH xxx",strip(upcase(scan(email,1,'@'))),' ')) ) ;
quit;


* test with UserId;
proc sql noprint;
   create table test_B1 as
   select *
     from T_USER_EMAIL
    where index(upcase(email),'@XYZ.COM')
      and ( (FINDW(" DEF090 xxx",strip(upcase(scan(email,1,'@'))),' ')) or
            (FINDW(" DEF090 xxx",strip(upcase(userid)))) ) ;
quit;

proc sql noprint;
   create table test_B2 as
   select *
     from T_USER_EMAIL
    where index(upcase(email),'@XYZ.COM')
      and ( (FINDW(" DEF090 xxx",strip(upcase(userid)))) or 
            (FINDW(" DEF090 xxx",strip(upcase(scan(email,1,'@'))),' ')) ) ;
quit;

 

 

 

(Just fyi, I want Users to be able to provide UserIds and/or Names to the macro ... that is why the code generated by the macro is checking in both the userid and email columns) 

22 REPLIES 22
Kurt_Bremser
Super User

I think you're on to something here. Since the condition works when the first part of the "or" returns a non-zero value, but fails when that is zero, I have a suspicion that the condition optimizer does not honor the brackets as it should and breaks off too early.

Clearly a case for SAS TS, IMO.

PeterClemmensen
Tourmaline | Level 20

I agree, this seems strange. And seems to limit itself to the WHERE Clause. The two data sets created with the following code are identical

 

data T_USER_EMAIL;
   length userid $6 email $100;
   userid='ABC090'; email='john.smith@xyz.com';    output;
   userid='DEF090'; email='joanna.smythe@xyz.com'; output;
   userid='HIJ090'; email=' ';                     output;
run;

* test with Name;
proc sql noprint;
   create table test_A1 as
   select *
         ,(index(upcase(email),'@XYZ.COM')
      and ( (FINDW(" JOHN.SMITH xxx",strip(upcase(scan(email,1,'@'))),' ')) or
            (FINDW(" JOHN.SMITH xxx",strip(upcase(userid)))) )) as testvar
     from T_USER_EMAIL;
quit;

proc sql noprint;
   create table test_A2 as
   select *
         ,(index(upcase(email),'@XYZ.COM')
      and ( (FINDW(" JOHN.SMITH xxx",strip(upcase(userid)))) or 
            (FINDW(" JOHN.SMITH xxx",strip(upcase(scan(email,1,'@'))),' ')) )) as testvar
     from T_USER_EMAIL;
quit;

proc compare base=test_A1 compare=test_A2;
run;
FreelanceReinh
Jade | Level 19

Hello @aknight1,

 

I fully agree with @Kurt_Bremser and @PeterClemmensen and strongly suspect a SAS bug, a pretty substantial one indeed, which should be reported to Technical Support. Here is a boiled down example:

data have;
c=' ';
d='x';
run;

data want;
set have;
where find('x',c) or find('x',d);
run;

data want;
set have;
where find('y',c) or find('x',d);
run;

data want;
set have;
where find('x',c) | find(lowcase(left(' X')),d);
run;

data want;
set have;
where find('x',c) | find('x',d) | length(d)=2;
run;

Log (from SAS 9.4 TS1M2 on the X64_7PRO platform):

1    data have;
2    c=' ';
3    d='x';
4    run;

NOTE: The data set WORK.HAVE has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


5
6    data want;
7    set have;
8    where find('x',c) or find('x',d);
9    run;

NOTE: There were 0 observations read from the data set WORK.HAVE.
      WHERE not (not FIND('x', c));
NOTE: The data set WORK.WANT has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


10
11   data want;
12   set have;
13   where find('y',c) or find('x',d);
14   run;

NOTE: There were 1 observations read from the data set WORK.HAVE.
      WHERE FIND('y', c) or FIND('x', d);
NOTE: The data set WORK.WANT has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


15
16   data want;
17   set have;
18   where find('x',c) | find(lowcase(left(' X')),d);
19   run;

NOTE: There were 0 observations read from the data set WORK.HAVE.
      WHERE not (not FIND('x', c));
NOTE: The data set WORK.WANT has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


20
21   data want;
22   set have;
23   where find('x',c) | find('x',d) | length(d)=2;
24   run;

NOTE: There were 0 observations read from the data set WORK.HAVE.
      WHERE FIND('x', c) or (LENGTH(d)=2);
NOTE: The data set WORK.WANT has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Obviously, dataset WANT should contain 1 observation in all four cases.

 

Preliminary conclusions:

 

  • The bug is not limited to FINDW and INDEXW.
  • It seems unrelated to PROC SQL.
  • The WHERE condition, as shown in the NOTEs, is inappropriately modified ("optimized"?).
  • The same modification occurs if "or" is replaced by "and" (not shown above).
  • Different types of incorrect modifications can take place (see first vs. last example).
  • The bug seems to occur as soon as the first arguments of the function calls (with the same function) result in the same string.

 

aknight1
Obsidian | Level 7

Hi ...

 

Thanks for the replies so far ... for your information, I have now reported this to SAS Technical Support. In the ticket I have also notified them of this thread, so that they can gain any insight from the Users here.

 

Thanks all Smiley Happy

Antony 

Quentin
Super User

Wow, this could ruin my Friday. : )

 

Here's an example of the bug.  The compiler seems to have munged the where clause, completely ignoring the find('x',x);

 

1    data have;
2    x='x' ;
3    y='y' ;
4    run;

NOTE: The data set WORK.HAVE has 1 observations and 2 variables.

5
6    *bug: returns 0 and where clause is reported in log as  ;
7    *WHERE not (not FIND('x', y));
8    *so its ignorning second part of clause ;
9    data want;
10   set have;
11   where find('x',y) or find('x',x);
12   run;

NOTE: There were 0 observations read from the data set WORK.HAVE.
      WHERE not (not FIND('x', y));
NOTE: The data set WORK.WANT has 0 observations and 2 variables.

Changing the expression slightly makes it work:

14   *works as expected just by adding to expression;
15   data want;
16   set have;
17   where find('x',y) or (find('x',x)=1);
18   run;

NOTE: There were 1 observations read from the data set WORK.HAVE.
      WHERE FIND('x', y) or (FIND('x', x)=1);

Adding a blank to the literal string makes it work

20   *works as expected ;
21   data want;
22   set have;
23   where find('x',y) or find(' x',x);
24   run;

NOTE: There were 1 observations read from the data set WORK.HAVE.
      WHERE FIND('x', y) or FIND(' x', x);
NOTE: The data set WORK.WANT has 1 observations and 2 variables.

If the second half of the where clause is find('y',y) you do not get the same bug as find('x',x). 

26   *doesnt happen with variable named y?  ;
27   data want;
28   set have;
29   where find('x',y) or find('y',y);
30   run;

NOTE: There were 1 observations read from the data set WORK.HAVE.
      WHERE FIND('x', y) or FIND('y', y);

 

So it's somehow related to a pesky  'x' being mis-compiled?  My  head is hurting... 

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
FreelanceReinh
Jade | Level 19

@Quentin wrote:

 

If the second half of the where clause is find('y',y) you do not get the same bug as find('x',x). 

26   *doesnt happen with variable named y?  ;
27   data want;
28   set have;
29   where find('x',y) or find('y',y);
30   run;

NOTE: There were 1 observations read from the data set WORK.HAVE.
      WHERE FIND('x', y) or FIND('y', y);

 

So it's somehow related to a pesky  'x' being mis-compiled?  My  head is hurting... 


I don't think it has to do with variable names, y is affected in the same way (using your dataset HAVE):

 

88   data want;
89   set have;
90   where find('y',' ') or find('y',y);
91   run;

NOTE: There were 0 observations read from the data set WORK.HAVE.
      WHERE 0 /* an obviously FALSE WHERE clause */ ;
NOTE: The data set WORK.WANT has 0 observations and 2 variables.

It still seems to me that the equality of the first arguments (after evaluation, if any) is necessary for the bug to occur (but not sufficient, as your example with "...=1" shows).

 

Tom
Super User Tom
Super User

So if the conditions in the WHERE statement has multiple conditions that treat a function call (so far FIND,INDEX,FINDW,INDEX) as a boolean result AND where the first argument in all of them is the same constant value (even if the later arguments are different) then it is being optimized incorrectly.

 

data bad;
  set sashelp.class ;
  where index('Alice M',strip(name)) or index('Alice M',strip(sex));
run;

data good;
  set sashelp.class ;
  if index('Alice M',strip(name)) or index('Alice M',strip(sex));
run;

One fix is to explicitly convert the function call into a boolean expression.

data good2;
  set sashelp.class ;
  where 0^=index('Alice M',strip(name)) or 0^=index('Alice M',strip(sex));
run;

UPDATE

 The where statement works properly in SAS 9.3, but not in SAS 9.4M3 or SAS 9.4M5.

FreelanceReinh
Jade | Level 19

@Tom wrote:

 

UPDATE

 The where statement works properly in SAS 9.3, but not in SAS 9.4M3 or SAS 9.4M5.


This is very interesting (and good news for older projects).

 

Here's another example, still following the pattern @Tom has described, but now using a function that produces a character result (i.e. with non-blank values being evaluated as TRUE) and without any involvement of variables:

105  data have;
106  run;

NOTE: The data set WORK.HAVE has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


107
108  data want;
109  set have;
110  where cat('x','y') | cat('x');
111  run;

NOTE: There were 1 observations read from the data set WORK.HAVE.
      WHERE not (not 'xy');
Quentin
Super User

Thanks @FreelanceReinh .  I see your point.  Glad it wasn't variable names.

 

I agree with you (and @Tom) looks like it happens when the first argument is a literal value, and the same same literal value is used in  multiple calls to the function, e.g.:

 

78   data want;
79   set have;
80   where find('FOO',x) or find('FOO',y);
81   run;

NOTE: There were 0 observations read from the data set WORK.HAVE.
      WHERE not (not FIND('FOO', x));

That's a scary bug.

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
aknight1
Obsidian | Level 7

Hi all ...

 

Just for your information ...

 

Today I was notified by SAS Tech Support that the Hot-Fix for this bug is now available.

 

Product: Base SAS 9.4_M3
Hot Fix: V01099


This hot fix is now available to customers on our external download site:
http://ftp.sas.com/techsup/download/hotfix/HF2/V01.html#V01099

V01099 is currently available on the following platform(s):

Mainframe:
z/OS, z/OS 64-bit
PC:
Windows, Windows for x64
UNIX:
64-bit Enabled Solaris, 64-bit Enabled AIX, HP-UX IPF, Linux for x64, Solaris for x64

 

Thanks to all who responded to my question, and encouraged me to contact SAS Tech Support.

 

Special thanks to @FreelanceReinh ... your concise examples were very helpful in convincing SAS Tech Support that this is a genuine problem that needs to be fixed.

 

Regards,

Antony 

 

aknight1
Obsidian | Level 7

Good Morning Mr. Knight,

 

good news - we got the hot fix for SAS 9.4 M4. 

Please find the links to the download area of the fix below.

 

Product: Base SAS 9.4_M4 Hot Fix: A3Z053 This hot fix is now available to customers on our external download site: http://ftp.sas.com/techsup/download/hotfix/HF2/A3Z.html#A3Z053

http://support.sas.com/kb/62/724.html

 

A3Z053 is currently available on the following platform(s):

 

Mainframe: z/OS

PC: Windows, Windows for x64

UNIX: 64-bit Enabled Solaris, 64-bit Enabled AIX, HP-UX IPF, Linux for x64, Solaris for x64

 

SAS Notes have been updated. A "Hot Fix tab" will appear on SAS Note(s) after the overnight update job has run to push the tab beyond the firewall. The availability of this hot fix will be announced within 48 hours on the SAS Hot Fix Communities Site.

 

The fix for SAS 9.4 M5 should follow soon.

Quentin
Super User

I find the title and description of the problem note really confusing.  

Problem Note 62724: A WHERE clause that contains an OR condition returns incorrect results when the ...

 

The example in the note is:

data test;
   var1="abc001"; var2="LOOK FOR ME(HERE)"; output;
   var1="def001"; var2="I AM NOT HERE    "; output;
run;
data ex1;
   set test;
   where (findw("LOOK FOR ME",strip(upcase(var1)))) or
         (findw("LOOK FOR ME",strip(upcase(scan(var2,1,'('))),' '));
run;

Minor problems:  OR is an operator, not a condition.  Operators do not have arguments.  Clearly in the example, the two expressions on the WHERE statement are not identical.

 

What would be a better description? Something like "When a character function is used multiple times in WHERE statement  expressions joined by an OR operator, with identical literal values as the first argument, incorrect results may be returned"?  I don't love that either.  Can someone write a better version of the support note?  

 

Or did I misunderstand the scope of this bug?

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Kurt_Bremser
Super User

Ouch. This goes deeper than I initially suspected, and can turn out to be the first really SERIOUS bug I've seen with respect to SAS. It's much more serious insofar as it can go undetected for quite some time and produce false results. Can hurt much more than a simple Segmentation Violation that causes a very noticeable crash (I've found one of those quite recently in the 9.4M5 code for SAS/SHARE, see http://support.sas.com/kb/62/346.html).

ballardw
Super User

Not going to go into more testing but you could try simplifying the FINDW code by using the I and R (or T) options and drop the strip(upcase()) code.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 2747 views
  • 14 likes
  • 9 in conversation