While I agree that sortseq is related, I think SAS need to make a change.
So lets take an SQL Server database table. The collation is Latin1_General_CI_AS as expected
The issue of course is that if you read it into SAS via a PROC SORT, (irrespective of FORCE as an option) SAS hands it over to the database due to the SORTPGM option. If you run "proc options option=sortpgm value; run;" you will likely see SORTPGM=BEST
As Lex noted in http://support.sas.com/resources/papers/proceedings09/141-2009.pdf
you can change that to "options sortpgm=SAS;" which will make SAS do the sort with its defaults.
From time to time I see customers reading tables into SAS first then sorting which is not efficient, and this info is not jumping out around the place - so what about this?
1.) It would be nicer if PROC SORT allowed the SORTPGM option to be specified on the PROC SORT statement, not only as a system option. This would place it 'obvious' in the list of documented PROC SORT options
2.) In the meantime it would be good if SAS added this kind of information to the PROC SORT statement doco page anyway, with links to relevant SAS Access RDBMS doco, then anyone hitting this issue (collating and sortpgm) would quickly find the better answer (sortpgm). Its all good and well for those of us who know it, but I would like it to be accessible knowledge for all customers.
Hi Peter,
I work with SQL Server extensively in my current role, and have had a round with SAS TS on this issue.
What is the collation sequence of your SQL Server table? Actually, it's the collation sequence of your sort column schemeName; a collation sequence can be set at the column level in SQL Server. Which took me a while to wrap my head around!
In SQL Server Management Studio:
SELECT *
FROM [INFORMATION_SCHEMA].[COLUMNS] [c]
WHERE [c].[TABLE_NAME]='table_name'
If the COLLATION_NAME is NULL, then the column inherits the collation sequence of the database:
SELECT *
FROM sys.[databases]
WHERE name='your_database'
SAS uses a case-sensitive collation sequence. If your column has a case-insensitive collation sequence, that is likely your problem.
Issue the command:
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
in your program to see if SAS implicitly passed through the ORDER BY to SQL Server for execution. I suspect it did.
If you have the right permissions, you *could* assign a different collation sequence (eg. Latin1_General_BIN) to that column. But you better make sure it doesn't mess up other work by your colleagues!
Otherwise, you could change your code to have SAS do the sorting:
proc sql;
connect using mi;
create table sch_names_low as
select a.psrNumber, a.schemeName, b.statusdate, b.status
from connection to mi (
select a.psrNumber, a.schemeName, b.statusdate, b.status
from tbl_bau_stats_details a
join tbl_bau_stats_valuations b
on a.psrNumber = b.psrNumber
where schemeName lt 'a'
)
order by schemeName
;
quit;
I think you'll be in a world of pain if you try to create a custom trantab that matches your SQL Server collation sequence. But let me know if you go down that track and get it to work 🙂
The conclusion we came up with after consultation with SAS TS was that we changed the collation sequence of our databases from the default case-insensitive one (IIRC Latin1_General_100_CI_AI) to a case-sensitive one that matches SAS's sort order (Latin1_General_BIN). (This was in conjunction with a rebuild/upgrade of our SQL Server).
That change has its own issues (eg. case-sensitive table and column names), and bugs in SAS where SAS decides to uppercase implicit pass-through code to the database, and thinks that table FooBar is FOOBAR, and SQL Server returns that the table doesn't exist.
Hope this helps...
Hi Peter,
One more thing I thought of. You can actually specify the desired collation sequence on the ORDER BY statement in T-SQL.
Here is some example code:
IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1
CREATE TABLE #T1 (
MyVar VARCHAR(3) COLLATE Latin1_General_100_CI_AI NOT NULL
)
INSERT INTO [#T1]
([MyVar])
VALUES
('FOO'),('Foo'),('foo'),('fOo'),('fOO'),('FoO')
SELECT * FROM #T1 ORDER BY MyVar
SELECT * FROM #T1 ORDER BY MyVar COLLATE Latin1_General_BIN
Run the above code in SQL Server Management Studio and note the different output from the two select statements.
Then, run this SAS code:
%libname_sqlsvr(libref=TMP,server=YourSQLServer,port=,database=tempdb,schema=dbo)
proc sql;
connect using tmp;
execute by tmp (
IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1
CREATE TABLE #T1 (
MyVar VARCHAR(3) COLLATE Latin1_General_100_CI_AI NOT NULL
)
INSERT INTO [#T1]
([MyVar])
VALUES
('FOO'),('Foo'),('foo'),('fOo'),('fOO'),('FoO')
);
create table test1 as
select * from connection to tmp (
SELECT * FROM #T1 ORDER BY MyVar
);
create table test2 as
select * from connection to tmp (
SELECT * FROM #T1 ORDER BY MyVar COLLATE Latin1_General_BIN
);
quit;
data _null_;
set test1; * fails, the default column collation sequence is case-insensitive ;
by MyVar;
run;
data _null_;
set test2; * works, the explicit collation sequence is case-sensitive and matches SAS ;
by MyVar;
run;
I doubt that it's possible to coax SAS to implicitly pass-through a collation sequence to the SQL Server ORDER BY clause.
But, if you can change your code to use explicit pass-through, the 2nd approach may work for you. You'd likely get better performance having SQL Server do the sorting first, rather than having SAS do the sorting as in my previous post.
HTH...
Spot on Scott. Yes that's pretty much where we ended up before commenting on this thread. Using the TSQL option COLLATE override was okay but the prospect of changing lots of code plus the (probable) extra overhead on the database to override each field's collate definition did not excite us.
In the end the database team changed the database collating default as well as some field collate values - which added the problem of working out if it was intentionally different - or whether it was an artifact of the legacy non SAS tools that loaded the database. Its a bit of a tar ball.
Coming back to the change suggested, irrespective of the COLLATE option and values chosen in the database, I emphatically think SAS should put some breadcrumbs down for SAS users to follow, starting with a specific new option on PROC SORT that does what the sortpgm option. From a human interface design perspective, the user is deal with a SORT so will look at the SORT doco, and if the sortpgm option was there then the discourse can also explain that not only does it ensure SAS sorts the data, it can override a database collate sequence. This is important because it informs the users in a natural 'discovery' pathway.
As for the efficiency, a novice user is likely to read the table out of the database, then sort it with SAS once it is in a SAS dataset. This means an inefficient two pass of the data. At least if a sortpgm option was available on proc sort syntax, the users would at least gain the efficiency of one pass of the data as its read from the database.
Sure, there are many examples where you can override a system option with a runtime override (for example OBS).
Perhaps create a post in SASware Ballot Ideas and lobby to get folks to upvote it?
If going for implicit SQL then you could of course also create a SAS view instead of a table and then use this view in a SAS Proc Sort.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.