Different sort orders

LAP · Posted 08-04-2009 07:47 PM

Why would a sql query with an orderby statement produce a different sort order than using a proc sort and a by statement identical to the order by sql statement?

I am using the following procedure to read 3 different teradata tables (same structure just different tables).

create View HISTORY_PART_&i._Vw as
select PROF_SEGM_ID
,MKT_SEGM_IDFR
,CMDY_LINE_ID
,BUS_GRP_ID
,BUS_CATG_CODE
,ORIG_PROF_REG
,DEST_PROF_REG
,MGRL_CAR_KIND
,F990_RPTG_DT_YRMO as yrmo
,sum(TOTL_NET_REV) as rev
,sum(TOTL_CWT/20)as tons
,sum(CAR_CNT) as units
from prodvp20.&TABLE_NAME
where f990_rptg_dt_yrmo Between &&Start_YRMO&i and &&END_YRMO&i
GROUP by BUS_GRP_ID
,CMDY_LINE_ID
,PROF_SEGM_ID
,MKT_SEGM_IDFR
,BUS_CATG_CODE
,ORIG_PROF_REG
,DEST_PROF_REG
,MGRL_CAR_KIND
,F990_RPTG_DT_YRMO
ORDER by PROF_SEGM_ID
,BUS_GRP_ID
,CMDY_LINE_ID
,MKT_SEGM_IDFR
,ORIG_PROF_REG
,DEST_PROF_REG
,MGRL_CAR_KIND
,BUS_CATG_CODE
,F990_RPTG_DT_YRMO;

I then interleave the data sets with in a data step as follows

Data all;
set HISTORY_PART_1_Vw
HISTORY_PART_2_Vw
HISTORY_PART_3_Vw ;
by PROF_SEGM_ID BUS_GRP_ID CMDY_LINE_ID MKT_SEGM_IDFR
ORIG_PROF_REG DEST_PROF_REG MGRL_CAR_KIND
BUS_CATG_CODE ;

ERROR: BY variables are not properly sorted on data set WORK.HISTORY_PART_2_VW.

However, if I sort the results from the sql queries before I do the data step. The order changes (from the sql query with order by statement) and the by statement in the data step does not produce an error.

Adding a sort after the sql with an order by seems is inefficient especially since each of these data sets are quite large.

I should add that the sql statement (with order by) has been in production for the past 18months with no problems, but the data today had some bad observations where MKT_SEGM_IDFR = C148l and C148L and the order in the order by was different than the order when I sorted the query so the interleave step failed.

Any thoughts? Is there something that I am missing here.

Thanks

Doc_Duke · Posted 08-04-2009 11:17 PM

Different hardware platforms have different sort orders (called collating sequences). Your SQL query is being passed to the teradata server (perhaps an IBM mainframe) for sorting and a PROC SORT is being executed on your SAS Server (more likely a Unix or PC platform).

Doc Muhlbaier
Duke

DanielSantos · Posted 08-05-2009 06:59 AM

True.
I've faced this issue sometime with data coming from a z/OS platform to Unix.

Luckily, PROC SORT procedure provides the SORTSEQ option to enable you to specify the collating sequence for your sort. You can even build up your own translation table that suits your needs through the PROC TRANTAB procedure.

See the SORTSEQ option:
http://support.sas.com/documentation/cdl/en/hostwin/61924/HTML/default/chsortfstart.htm

And the TRANTAB procedure;
http://support.sas.com/documentation/cdl/en/nlsref/61893/HTML/default/a000146227.htm

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.

sbb · Posted 08-05-2009 11:09 PM

Definitely consideration for OS platform (ASCII or EBCDIC) affecting collation sequence, as well as Teradata specific behavior, which I found in the document below:

http://www.teradataforum.com/teradata_pdf/b035-1101-061a_1.pdf - section on topic "About Collation Sequences".

References on SAS support http://support.sas.com/ website also mention that you consider specifying the ORDER BY outside the parenthese so the event occurs post-RDBMS extract processing.

Scott Barry
SBBWorks, Inc.

LAP · Posted 08-05-2009 12:16 PM

Thank you for your suggestions.

I'm a bit confused. This query is an implicit query. It was my understanding the query was passed back to SAS for processing which even if teradata resided on a different platform, the SAS platform should control the order.

In my case however, teradata and SAS are both ascii platforms, and the returned order from the teradata query does not match either sequence. My records with mkt_seg_idfr = C148l are intermingled among the all upper case records. They should be grouped either at begining or end if they were following collating sequence rules.

I'm looking at some other things right now concerning dbms sort stability which may explain this behavior. I find myself asking the question why waste processing time with an order by statement, if you have to resort the data.

Just thought someone else may have encountered this strange behavior. Thanks again for your help

Paul_Kent_SAS · Posted 08-05-2009 10:12 PM

Hi There; could you log this one into SAS technical support. i'm suspecting the default sort ordering on the teradata side is "case indifferent", which causes the SAS BY logic that checks for by groups variables being in "ever increasing" order... there is probably some SAS or teradata option to request lowercase before uppercase, and the pros in tech support can help us find it/

paul

Different sort orders

Re: Different sort orders

Re: Different sort orders

Re: Different sort orders

Re: Different sort orders

Re: Different sort orders

Different sort orders

Re: Different sort orders

Re: Different sort orders

Re: Different sort orders

Re: Different sort orders

Re: Different sort orders

Registration is open

SAS Training: Just a Click Away