BookmarkSubscribeRSS Feed

tl;dr; For a case-sensitive database in SQL Server, the ODBC engine properly handles column names, but not table names.

 

Note this is related to https://communities.sas.com/t5/SAS-Data-Management/Case-sensitivity-issue-with-ODBC-engine-and-SQL-S...

 

Let's say a SAS/Access engine is meant to access an external RBDMS table, Excel file, etc. as though it were a "virtual SAS dataset".  Perhaps a sweeping, overly generalized statement, but let's go with it for now.

 

Furthermore, let's say that consistency in how that engine works is a good thing.  In this case, I’m referring to consistency in treating both table names and column names in a case-insensitive manner, in the same way that SAS treats SAS datasets.

 

I'm working with SQL Server.  I need to work with both case-insensitive tables (Latin1_General_CI_AI collation), and case-sensitive tables (Latin1_General_BIN collation).

 

Some preliminaries/background:

Latin1_General_CI_AI:

Table names and column names are case-insensitive (in SQL Server itself)

Filtering (WHERE clause) is case-insensitive, eg. WHERE current_record='Y' will return Y, y, and various accented y chars (if present in the data) (differs from SAS).

Data is sorted (ORDER BY) in a case-insensitive manner (not in ASCII code order, and not matching SAS's default collation/sortseq)

 

Latin1_General_BIN:

Table names and column names are case-sensitive (in SQL Server itself).  For the SAS programmer, this can be a "pain".

Filtering (WHERE clause) is case-sensitive, eg. WHERE current_record='Y' will return Y only (like SAS).

Data is sorted (ORDER BY) in a case-sensitive manner (in ASCII code order, and matching SAS's default collation/sortseq).

 

Some test code:

Create two databases in SQL Server, say CASE_INS and CASE_SEN. 

Configure the collation for CASE_INS as Latin1_General_CI_AI.

Configure the collation for CASE_SEN as Latin1_General_BIN.

 

Run this code in SQL Server Management Studio for both databases:

 

 

USE CASE_INS  -- then change to CASE_SEN and re-run
GO

DROP TABLE TeSt
CREATE TABLE TeSt ( FOO VARCHAR(3) ,Bar VARCHAR(5) ,BlAh VARCHAR(7) )
INSERT INTO TeSt VALUES ('foo','BAR','Blah') SELECT foo, bar, blah FROM test WHERE bar='bar' SELECT FOO, Bar, BlAh FROM TeSt WHERE Bar='BAR'

 

 

Note that both SELECT queries work for CASE_INS, but only the second query works in CASE_SEN.

 

Now let's work with these tables in SAS:

 

 

* set desired SAS options ;
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
options msglevel=I;
options fullstimer;
options mprint mrecall;
options compress=binary;
options ls=max;
options nocenter;
options lognumberformat=1;

libname CASE_INS ODBC NOPROMPT="Driver={SQL Server Native Client 10.0};Server=MYSERVER;Database=CASE_INS;Trusted_Connection=yes;" schema=dbo;
libname CASE_SEN ODBC NOPROMPT="Driver={SQL Server Native Client 10.0};Server=MYSERVER;Database=CASE_SEN;Trusted_Connection=yes;" schema=dbo;

* case insensitive SQL Server database ;
* Note that all these scenarios work as expected ;
* (with the possible exception of the case-insensitive where clause results) ; data test1; set case_ins.test (keep=foo bar blah); where bar='bar'; by bar; run; data test2; keep foo bar blah; set case_ins.test; where bar='bar'; by bar; run; data test3; format foo bar blah; set case_ins.test; where bar='bar'; by bar; run; * case sensitive SQL Server database ; * the below two scenarios fail ; data test4; set case_sen.test (keep=foo bar blah); where bar='bar'; by bar; run; data test5; set case_sen.TEST (keep=foo bar blah); where bar='bar'; by bar; run; * this scenario correctly knows that the table exists, ; * creates the column list in the correct case, ; * but does not return the row. ; * this is fine, since this is a case-sensitive table, ; * and in fact is more in line with a where clause in SAS itself ; data test6; set case_sen.TeSt (keep=foo bar blah); where bar='bar'; by bar; run; * this scenario works as expected ; data test7; set case_sen.TeSt (keep=foo bar blah); where Bar='BAR'; by Bar; run; * this scenario fails due to the invalid column name ; data test8; set case_sen.TeSt (keep=foo bar blah baz); where Bar='BAR'; by Bar; run;

 

In all of the above scenarios, we get these sastrace messages in the SAS log:

 

 

ODBC: AUTOCOMMIT is NO for connection 3
ODBC: AUTOCOMMIT turned ON for connection id 3
 
ODBC_1: Prepared: on connection 3
SELECT * FROM "dbo"."test"
 
39         data test1;
40            set case_ins.test (keep=foo bar blah);
41            where bar='bar';
42            by bar;
43         run;

 
ODBC_2: Prepared: on connection 3
SELECT  "FOO", "Bar", "BlAh"  FROM "dbo"."test"  WHERE  ( "Bar" = 'bar' ) ORDER BY "Bar"
 
 
ODBC_3: Executed: on connection 3
Prepared statement ODBC_2

 

 

However, for the case-sensitive scenarios, we get these results:

 

 

ODBC: AUTOCOMMIT is NO for connection 3
ODBC: AUTOCOMMIT turned ON for connection id 3
 
ODBC_10: Prepared: on connection 3
SELECT * FROM "dbo"."test"
 
ODBC: ROLLBACK performed on connection 3.
61         data test4;
62            set case_sen.test (keep=foo bar blah);
ERROR: File CASE_SEN.test.DATA does not exist.
63            where bar='bar';
64            by bar;
65         run;

 

 

While I don't know the internals of the ODBC engine, this is my conjecture:

 

1) The ODBC engine is using SELECT * from "tablename" to determine if the table exists. 

2) The query executes quite quickly, so it's only returning limited (zero?) rows, functionally equivalent to WHERE 0=1.

3) This query is within a TRANSACTION block (or the equivalent) and perhaps a TRY/CATCH block (or the equivalent).  This conjecture is due to the ROLLBACK statement above.

4) If the query is unsuccessful, the decision is made that the table does not exist.

5) If the query is successful, the results are saved in a resultset.

6) The header of that resultset is compared to the SAS statements, in a case-insensitive manner, to generate the correct case-sensitive column list.

 

Regardless of whether that conjecture is technically correct, IMO SELECT * from "tablename" is a poor way to determine if the table exists, since it relies on the table name being specified in the correct case.

 

A better approach would be:

 

SELECT TABLE_NAME

FROM   INFORMATION_SCHEMA.TABLES

WHERE  UPPER(TABLE_SCHEMA)='DBO'

AND    UPPER(TABLE_NAME)='TEST'

 

or

 

SELECT tbl.name AS TABLE_NAME

FROM   sys.tables tbl

INNER JOIN sys.schemas sch

ON     UPPER(sch.name)='DBO'

AND    UPPER(tbl.name)='TEST'

 

My recommendation:

 

1) Use either of the above approaches to determine if the table exists (and you have the required permissions).  Check for the existence of the table in a case-insensitive manner.

2) If not, abort with table does not exist error.

3) If so, use the correct table name (including case) to load the resultset.

4) Use that resultset to build the correct column list (including case).

 

IMO this behaviour would be more consistent with how SAS treats datasets and "virtual datasets" using case-insensitive names.

 

Note:  we are currently undergoing a SQL Server upgrade, and migrating from the default of collation of Latin1_General_CI_AI to Latin1_General_BIN, in order to match the default collation in SAS.  The above behaviour of the ODBC engine will result in our need to refactor (or at least test) hundreds if not thousands of programs to ensure the table names are specified in the correct case.

2 Comments
Tom
Super User
Super User

Can you clarify what is failing?  Are you totally unable to find the table in the case sensitive database?  Or can you find it if you use the correct case for the name?  

In your example you show SAS trying this query.

ODBC_10: Prepared: on connection 3
SELECT * FROM "dbo"."test"

Is the problem that the actual table name is Test of tesT and not test? 

Or perhaps that the schema/database name is DBO instead of dbo?

ScottBass
Rhodochrosite | Level 12

Or can you find it if you use the correct case for the name?  

 

Is the problem that the actual table name is Test of tesT and not test? 

 

This.

 

For example:

 

data foo;

set case_sen.test (keep=foo bar blah);

run;

 

fails, but:

 

data foo;

set case_sen.TeSt (keep=foo bar blah);

run;

 

works, even though the case of the column names is incorrect. 

 

But, the 2nd case only works because the SAS ODBC engine is smart enough to change the case of the column names, but isn't smart enough to change the case of the table name.

 

I assert that this is inconsistent, furthermore should be easy to fix.  It's also inconsistent in the approach to table names vs. column names.

 

IMO if I'm referencing an RDBMS table via the ODBC engine, the usual case-insensitive SAS syntax should apply for both tables and columns if I'm referencing the table in SAS (not via explicit pass-through).

 

In the "related to" link above, I couldn't get it to work at all, again due to how SAS detects the existence of an RBDMS table.