BookmarkSubscribeRSS Feed
LAP
Quartz | Level 8 LAP
Quartz | Level 8
Why would a sql query with an orderby statement produce a different sort order than using a proc sort and a by statement identical to the order by sql statement?

I am using the following procedure to read 3 different teradata tables (same structure just different tables).

create View HISTORY_PART_&i._Vw as
select PROF_SEGM_ID
,MKT_SEGM_IDFR
,CMDY_LINE_ID
,BUS_GRP_ID
,BUS_CATG_CODE
,ORIG_PROF_REG
,DEST_PROF_REG
,MGRL_CAR_KIND
,F990_RPTG_DT_YRMO as yrmo
,sum(TOTL_NET_REV) as rev
,sum(TOTL_CWT/20)as tons
,sum(CAR_CNT) as units
from prodvp20.&TABLE_NAME
where f990_rptg_dt_yrmo Between &&Start_YRMO&i and &&END_YRMO&i
GROUP by BUS_GRP_ID
,CMDY_LINE_ID
,PROF_SEGM_ID
,MKT_SEGM_IDFR
,BUS_CATG_CODE
,ORIG_PROF_REG
,DEST_PROF_REG
,MGRL_CAR_KIND
,F990_RPTG_DT_YRMO
ORDER by PROF_SEGM_ID
,BUS_GRP_ID
,CMDY_LINE_ID
,MKT_SEGM_IDFR
,ORIG_PROF_REG
,DEST_PROF_REG
,MGRL_CAR_KIND
,BUS_CATG_CODE
,F990_RPTG_DT_YRMO;

I then interleave the data sets with in a data step as follows

Data all;
set HISTORY_PART_1_Vw
HISTORY_PART_2_Vw
HISTORY_PART_3_Vw ;
by PROF_SEGM_ID BUS_GRP_ID CMDY_LINE_ID MKT_SEGM_IDFR
ORIG_PROF_REG DEST_PROF_REG MGRL_CAR_KIND
BUS_CATG_CODE ;

ERROR: BY variables are not properly sorted on data set WORK.HISTORY_PART_2_VW.

However, if I sort the results from the sql queries before I do the data step. The order changes (from the sql query with order by statement) and the by statement in the data step does not produce an error.

Adding a sort after the sql with an order by seems is inefficient especially since each of these data sets are quite large.

I should add that the sql statement (with order by) has been in production for the past 18months with no problems, but the data today had some bad observations where MKT_SEGM_IDFR = C148l and C148L and the order in the order by was different than the order when I sorted the query so the interleave step failed.

Any thoughts? Is there something that I am missing here.

Thanks
5 REPLIES 5
Doc_Duke
Rhodochrosite | Level 12
Different hardware platforms have different sort orders (called collating sequences). Your SQL query is being passed to the teradata server (perhaps an IBM mainframe) for sorting and a PROC SORT is being executed on your SAS Server (more likely a Unix or PC platform).

Doc Muhlbaier
Duke
DanielSantos
Barite | Level 11
True.
I've faced this issue sometime with data coming from a z/OS platform to Unix.

Luckily, PROC SORT procedure provides the SORTSEQ option to enable you to specify the collating sequence for your sort. You can even build up your own translation table that suits your needs through the PROC TRANTAB procedure.

See the SORTSEQ option:
http://support.sas.com/documentation/cdl/en/hostwin/61924/HTML/default/chsortfstart.htm

And the TRANTAB procedure;
http://support.sas.com/documentation/cdl/en/nlsref/61893/HTML/default/a000146227.htm

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Definitely consideration for OS platform (ASCII or EBCDIC) affecting collation sequence, as well as Teradata specific behavior, which I found in the document below:

http://www.teradataforum.com/teradata_pdf/b035-1101-061a_1.pdf - section on topic "About Collation Sequences".

References on SAS support http://support.sas.com/ website also mention that you consider specifying the ORDER BY outside the parenthese so the event occurs post-RDBMS extract processing.

Scott Barry
SBBWorks, Inc.
LAP
Quartz | Level 8 LAP
Quartz | Level 8
Thank you for your suggestions.

I'm a bit confused. This query is an implicit query. It was my understanding the query was passed back to SAS for processing which even if teradata resided on a different platform, the SAS platform should control the order.

In my case however, teradata and SAS are both ascii platforms, and the returned order from the teradata query does not match either sequence. My records with mkt_seg_idfr = C148l are intermingled among the all upper case records. They should be grouped either at begining or end if they were following collating sequence rules.

I'm looking at some other things right now concerning dbms sort stability which may explain this behavior. I find myself asking the question why waste processing time with an order by statement, if you have to resort the data.

Just thought someone else may have encountered this strange behavior. Thanks again for your help
Paul_Kent_SAS
SAS Employee
Hi There; could you log this one into SAS technical support. i'm suspecting the default sort ordering on the teradata side is "case indifferent", which causes the SAS BY logic that checks for by groups variables being in "ever increasing" order... there is probably some SAS or teradata option to request lowercase before uppercase, and the pros in tech support can help us find it/

paul

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 752 views
  • 0 likes
  • 5 in conversation