Re: SQL sorted but data is NOTsorted

Peter_C · Posted 11-23-2009 12:11 PM

hit a problem with PROC SQL sorting
after pulling some data together with proc sql, I ran a short data step to check for duplicate names and found the table was NOT sorted by the column in the "ORDER BY schemeName" clause.[pre]513 proc sql ;
514 create table sch_names_low as
515 select a.psrNumber, a.schemeName, b.statusdate, b.status
516 from mi.tbl_bau_stats_details a
517 join mi.tbl_bau_stats_valuations b
518 on a.psrNumber = b.psrNumber
519 where schemeName lt 'a'
520 order by schemeName
521 ;
NOTE: Table WORK.SCH_NAMES_LOW created, with 258 rows and 4 columns.

522 quit ;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.25 seconds
cpu time 0.03 seconds

523
524 data schn_dup ;
525 set ;
526 by schemeName ;
527 if first.schemeName & last.schemeName then delete ;
528 run;

ERROR: BY variables are not properly sorted on data set WORK.SCH_NAMES_LOW.
PSRNumber=10###### SchemeName=[c& S (Neatxxxxxxxxxxxxxxxxxxxxxx Scheme 155EP373 StatusDate=04JAN2005:00:00:00 Status=Wound Up FIRST.SchemeName=1 LAST.SchemeName=1 _ERROR_=1 _N_=3
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4 observations read from the data set WORK.SCH_NAMES_LOW[/pre]

No amount of SORTSEQ= handling could resolve the problem.

Usage note 4999 http://support.sas.com/kb/4/999.html hints at the problem. These data come from sql server where [ appears to have a different sort value than in SAS![pre]571 data _null_ ;
572 set SCH_NAMES_LOW( obs=4) ;
573 put schemeName= $char5. schemeName $hex10. ;
574 run;

SchemeName="J" F 224A222046
SchemeName=(C M) 2843204D29
SchemeName=[c& S 5B63262053
SchemeName=03666 3033363636
NOTE: There were 4 observations read[/pre]
Helpfully, PROC SQL did not mark the table created (with ORDER BY) as sorted, when the sort was performed on the SQL Server.
However, Proc SQL might inform and properly warn with some MSGLEVEL=i message or WARNING .

Is there some option I have missed which creates this warning?

DanielSantos · Posted 11-23-2009 12:53 PM

I've stumbled on this before.

I was extracting data from a DB2 system (zOS) to SAS on a distributed system (AIX).
The sort sequence did not match.

Since there were no predefined SORTSEQ that suited my needs, I've solved this trouble by creating a new SORT sequence (matching tho one at DB2) and setting the system option SORTSEQ to this table in the autoexec file.

No more problems since then.

Hope this helps.

See the TRANTAB procedure to create you own sequence table:
http://support.sas.com/documentation/cdl/en/nlsref/61893/HTML/default/a000146227.htm

And check the SORTSEQ system option:
http://support.sas.com/documentation/cdl/en/nlsref/61893/HTML/default/a000279219.htm

Cheers from Portugal.

Daniel Santos @ www.cgd.pt

Peter_C · Posted 11-23-2009 06:26 PM

thank you Daniel
I'll try that again

peter.c

SAS_user · Posted 11-24-2009 06:53 AM

What are these tables:
Mi.tbl_bau_stats_details
Mi.tbl_bau_status_valuations?

Are they SAS tables?

I had the same problems when using oracle data.
Workaround was done then downloaded data, then used proc sql;
After this data was sorted.

Peter_C · Posted 11-24-2009 02:56 PM

SAS user

sorry the message was so long, that you missed
> These data come from sql server where [

At least this scenario is a little better than my previous experience in this area.
Then, a cimport of a sas table downloaded to windows from zOS, continued to provide sortedby information - including sortseq=ebcdic. That was ignored in proc sql sas913 - so sql joined with no errors when the first row in the table downloaded from zOS was higher than all rows in the other table and both tables were declared to be in the corresponding order (apart from the sortseq=ebcdic on the table from zOS).
At least in this scenario, the implicit pass-thru of the query including the "ORDER BY" clause, by the sas/access to odbc engine connected to sql server, results in no sortedby information being placed in the header of the output data set.

So I've removed that order-by clause, and plan not to use it where ever I'm pulling data through pass-thru sql.
I still think proc sql could generate a warning when the effect of order-by is outside the interpretation of SAS.

PeterC

broz · Posted 09-12-2018 01:40 AM

While I agree that sortseq is related, I think SAS need to make a change.

So lets take an SQL Server database table. The collation is Latin1_General_CI_AS as expected

The issue of course is that if you read it into SAS via a PROC SORT, (irrespective of FORCE as an option) SAS hands it over to the database due to the SORTPGM option. If you run "proc options option=sortpgm value; run;" you will likely see SORTPGM=BEST

As Lex noted in http://support.sas.com/resources/papers/proceedings09/141-2009.pdf

you can change that to "options sortpgm=SAS;" which will make SAS do the sort with its defaults.

From time to time I see customers reading tables into SAS first then sorting which is not efficient, and this info is not jumping out around the place - so what about this?

1.) It would be nicer if PROC SORT allowed the SORTPGM option to be specified on the PROC SORT statement, not only as a system option. This would place it 'obvious' in the list of documented PROC SORT options

2.) In the meantime it would be good if SAS added this kind of information to the PROC SORT statement doco page anyway, with links to relevant SAS Access RDBMS doco, then anyone hitting this issue (collating and sortpgm) would quickly find the better answer (sortpgm). Its all good and well for those of us who know it, but I would like it to be accessible knowledge for all customers.

ScottBass · Posted 09-12-2018 03:18 AM

Hi Peter,

I work with SQL Server extensively in my current role, and have had a round with SAS TS on this issue.

What is the collation sequence of your SQL Server table? Actually, it's the collation sequence of your sort column schemeName; a collation sequence can be set at the column level in SQL Server. Which took me a while to wrap my head around!

In SQL Server Management Studio:

SELECT *
FROM   [INFORMATION_SCHEMA].[COLUMNS] [c]
WHERE  [c].[TABLE_NAME]='table_name'

If the COLLATION_NAME is NULL, then the column inherits the collation sequence of the database:

SELECT *
FROM   sys.[databases]
WHERE  name='your_database'

SAS uses a case-sensitive collation sequence. If your column has a case-insensitive collation sequence, that is likely your problem.

Issue the command:

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

in your program to see if SAS implicitly passed through the ORDER BY to SQL Server for execution. I suspect it did.

If you have the right permissions, you *could* assign a different collation sequence (eg. Latin1_General_BIN) to that column. But you better make sure it doesn't mess up other work by your colleagues!

Otherwise, you could change your code to have SAS do the sorting:

proc sql;
   connect using mi;
   create table sch_names_low as
   select a.psrNumber, a.schemeName, b.statusdate, b.status
   from connection to mi (
select a.psrNumber, a.schemeName, b.statusdate, b.status
from   tbl_bau_stats_details a
join   tbl_bau_stats_valuations b
on     a.psrNumber = b.psrNumber
where  schemeName lt 'a'
   )
   order by schemeName
   ;
quit;

I think you'll be in a world of pain if you try to create a custom trantab that matches your SQL Server collation sequence. But let me know if you go down that track and get it to work 🙂

The conclusion we came up with after consultation with SAS TS was that we changed the collation sequence of our databases from the default case-insensitive one (IIRC Latin1_General_100_CI_AI) to a case-sensitive one that matches SAS's sort order (Latin1_General_BIN). (This was in conjunction with a rebuild/upgrade of our SQL Server).

That change has its own issues (eg. case-sensitive table and column names), and bugs in SAS where SAS decides to uppercase implicit pass-through code to the database, and thinks that table FooBar is FOOBAR, and SQL Server returns that the table doesn't exist.

Hope this helps...

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

ScottBass · Posted 09-12-2018 08:12 PM

Hi Peter,

One more thing I thought of. You can actually specify the desired collation sequence on the ORDER BY statement in T-SQL.

Here is some example code:

IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1

CREATE TABLE #T1 (
    MyVar VARCHAR(3) COLLATE Latin1_General_100_CI_AI NOT NULL
)

INSERT INTO [#T1] 
([MyVar])
VALUES
('FOO'),('Foo'),('foo'),('fOo'),('fOO'),('FoO')

SELECT * FROM #T1 ORDER BY MyVar
SELECT * FROM #T1 ORDER BY MyVar COLLATE Latin1_General_BIN

Run the above code in SQL Server Management Studio and note the different output from the two select statements.

Then, run this SAS code:

%libname_sqlsvr(libref=TMP,server=YourSQLServer,port=,database=tempdb,schema=dbo)

proc sql;
   connect using tmp;
   execute by    tmp (
IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1

CREATE TABLE #T1 (
    MyVar VARCHAR(3) COLLATE Latin1_General_100_CI_AI NOT NULL
)

INSERT INTO [#T1] 
([MyVar])
VALUES
('FOO'),('Foo'),('foo'),('fOo'),('fOO'),('FoO')
   );

   create table test1 as
   select * from connection to tmp (
SELECT * FROM #T1 ORDER BY MyVar
   );

   create table test2 as
   select * from connection to tmp (
SELECT * FROM #T1 ORDER BY MyVar COLLATE Latin1_General_BIN
   );
quit;

data _null_;
   set test1; * fails, the default column collation sequence is case-insensitive ;
   by MyVar;
run;

data _null_;
   set test2; * works, the explicit collation sequence is case-sensitive and matches SAS ;
   by MyVar;
run;

I doubt that it's possible to coax SAS to implicitly pass-through a collation sequence to the SQL Server ORDER BY clause.

But, if you can change your code to use explicit pass-through, the 2nd approach may work for you. You'd likely get better performance having SQL Server do the sorting first, rather than having SAS do the sorting as in my previous post.

HTH...

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

broz · Posted 01-07-2019 06:29 PM

Spot on Scott. Yes that's pretty much where we ended up before commenting on this thread. Using the TSQL option COLLATE override was okay but the prospect of changing lots of code plus the (probable) extra overhead on the database to override each field's collate definition did not excite us.

In the end the database team changed the database collating default as well as some field collate values - which added the problem of working out if it was intentionally different - or whether it was an artifact of the legacy non SAS tools that loaded the database. Its a bit of a tar ball.

Coming back to the change suggested, irrespective of the COLLATE option and values chosen in the database, I emphatically think SAS should put some breadcrumbs down for SAS users to follow, starting with a specific new option on PROC SORT that does what the sortpgm option. From a human interface design perspective, the user is deal with a SORT so will look at the SORT doco, and if the sortpgm option was there then the discourse can also explain that not only does it ensure SAS sorts the data, it can override a database collate sequence. This is important because it informs the users in a natural 'discovery' pathway.

As for the efficiency, a novice user is likely to read the table out of the database, then sort it with SAS once it is in a SAS dataset. This means an inefficient two pass of the data. At least if a sortpgm option was available on proc sort syntax, the users would at least gain the efficiency of one pass of the data as its read from the database.

ScottBass · Posted 01-07-2019 07:05 PM

Sure, there are many examples where you can override a system option with a runtime override (for example OBS).

Perhaps create a post in SASware Ballot Ideas and lobby to get folks to upvote it?

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

Patrick · Posted 01-07-2019 08:20 PM

If going for implicit SQL then you could of course also create a SAS view instead of a table and then use this view in a SAS Proc Sort.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away