Why would a sql query with an orderby statement produce a different sort order than using a proc sort and a by statement identical to the order by sql statement?
I am using the following procedure to read 3 different teradata tables (same structure just different tables).
create View HISTORY_PART_&i._Vw as
,F990_RPTG_DT_YRMO as yrmo
,sum(TOTL_NET_REV) as rev
,sum(CAR_CNT) as units
where f990_rptg_dt_yrmo Between &&Start_YRMO&i and &&END_YRMO&i
GROUP by BUS_GRP_ID
ORDER by PROF_SEGM_ID
I then interleave the data sets with in a data step as follows
by PROF_SEGM_ID BUS_GRP_ID CMDY_LINE_ID MKT_SEGM_IDFR
ORIG_PROF_REG DEST_PROF_REG MGRL_CAR_KIND
ERROR: BY variables are not properly sorted on data set WORK.HISTORY_PART_2_VW.
However, if I sort the results from the sql queries before I do the data step. The order changes (from the sql query with order by statement) and the by statement in the data step does not produce an error.
Adding a sort after the sql with an order by seems is inefficient especially since each of these data sets are quite large.
I should add that the sql statement (with order by) has been in production for the past 18months with no problems, but the data today had some bad observations where MKT_SEGM_IDFR = C148l and C148L and the order in the order by was different than the order when I sorted the query so the interleave step failed.
Any thoughts? Is there something that I am missing here.
Different hardware platforms have different sort orders (called collating sequences). Your SQL query is being passed to the teradata server (perhaps an IBM mainframe) for sorting and a PROC SORT is being executed on your SAS Server (more likely a Unix or PC platform).
I've faced this issue sometime with data coming from a z/OS platform to Unix.
Luckily, PROC SORT procedure provides the SORTSEQ option to enable you to specify the collating sequence for your sort. You can even build up your own translation table that suits your needs through the PROC TRANTAB procedure.
I'm a bit confused. This query is an implicit query. It was my understanding the query was passed back to SAS for processing which even if teradata resided on a different platform, the SAS platform should control the order.
In my case however, teradata and SAS are both ascii platforms, and the returned order from the teradata query does not match either sequence. My records with mkt_seg_idfr = C148l are intermingled among the all upper case records. They should be grouped either at begining or end if they were following collating sequence rules.
I'm looking at some other things right now concerning dbms sort stability which may explain this behavior. I find myself asking the question why waste processing time with an order by statement, if you have to resort the data.
Just thought someone else may have encountered this strange behavior. Thanks again for your help
Hi There; could you log this one into SAS technical support. i'm suspecting the default sort ordering on the teradata side is "case indifferent", which causes the SAS BY logic that checks for by groups variables being in "ever increasing" order... there is probably some SAS or teradata option to request lowercase before uppercase, and the pros in tech support can help us find it/