Help using Base SAS procedures

SAS proc SQL and Inner join - what are alternative methods

Reply
Frequent Contributor
Posts: 87

SAS proc SQL and Inner join - what are alternative methods

 

What I want to do is to find an alternative to the following code:

   

PROC SQL;
       CREATE TABLE WORK.XX AS
       SELECT DISTINCT t2.WC, t2.CWC
          FROM WORK.YY t1
               INNER JOIN WORK.ZZ t2 ON (t1.MC = t2.WC)
    ;
QUIT;



Could someone please help in doing the same thing using hash or any other method?

Super User
Posts: 6,928

Re: SAS proc SQL and Inner join - what are alternative methods

[ Edited ]

If there is a 1(YY) to n(ZZ) relation, then

data work.xx;
merge
  work.yy (in=a keep=mc rename=(mc=wc))
  work.zz (in=b keep=wc cwc)
;
by wc;
if a and b;
run;

proc sort data=work.xx nodupkey;
by wc cwc;
run;

should deliver the same result.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 87

Re: SAS proc SQL and Inner join - what are alternative methods

Many thanks for your help.

 

unfortunately, it returned me an error:

 

ERROR: BY variables are not properly sorted on data set WORK.YY
a=1 b=1 WC=7507 CWC=  FIRST.WC=0 LAST.WC=1 _ERROR_=1 _N_=585904753

Super User
Posts: 6,928

Re: SAS proc SQL and Inner join - what are alternative methods

Then you need to sort your datasets by the variables that are used in the merge.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 87

Re: SAS proc SQL and Inner join - what are alternative methods

These are relatively large tables. sorting might take a long time. I am providing a sample data in a reply in the trailing communication (please see below). Thanks.
Super User
Posts: 6,928

Re: SAS proc SQL and Inner join - what are alternative methods

The solution depends very much on the size(s). If one of the tables fits into MEMSIZE, creating a format or using a hash object can speed your process up.

If that is not the case, you will have to sort the tables for the merge. In my experience, using explicit sorts and a merge will out-perform SQL when the contributing tables grow large

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
PROC Star
Posts: 547

Re: SAS proc SQL and Inner join - what are alternative methods

Do the variables WC and CWC also exist in WORK.YY? Smiley Happy

Super User
Posts: 6,928

Re: SAS proc SQL and Inner join - what are alternative methods


draycut wrote:

Do the variables WC and CWC also exist in WORK.YY? Smiley Happy


Good catch. I edited my post to safeguard against that.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 87

Re: SAS proc SQL and Inner join - what are alternative methods

XX Table has following vars:
MC LC MCC MCN TLC DD

YY:
WC CWC TS MS Y X

 

Where MC and WC are same datatype (char) and common values but only var names are different i.e. in XX it is MC and in YY it is WC

 

CWC is only in YY and not in XX

 

Super User
Posts: 5,254

Re: SAS proc SQL and Inner join - what are alternative methods

That's a quite neat SQL, why do you need an alternative?
Data never sleeps
Frequent Contributor
Posts: 87

Re: SAS proc SQL and Inner join - what are alternative methods

[ Edited ]

I have the following tables:

 

data have01;
infile cards truncover expandtabs;
input MC $ LC $ MCC $ MCN $ TLC $ DD $ ODS_TimeStamp ODS_LUpd zTPl $ PuD $;
cards;
1853 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1856 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1869 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
2024 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
2025 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
run;

 

You might have to format date column in the above table.

 

data have02;
infile cards truncover expandtabs;
input WPMVId ToSTimeStamp TId ASN WC $ CWC $ TSide $ MNo Y X;
cards;
1 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 1 -82140 2468
2 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 2 -81940 2466
3 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 3 -81739 2463
4 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 4 -81539 2459
5 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 5 -81339 2456
6 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 6 -81139 2453
run;


 

You might have to format date column in the above table.

 

Please help me using some alternative to SQL code above, specifically when I have issue that my Table 2 above is almost 0.8 billion rows data and it takes hell a lot time to run SQL query as above.

Super User
Posts: 5,254

Re: SAS proc SQL and Inner join - what are alternative methods

But how large is t1?
Are the tables by any chance indexed on MC/WC respectively?
And how large/small does t1 subsets t2 aproxmately?
Unless small sub with indexes hash table or user def format are your best shots.
Data never sleeps
Ask a Question
Discussion stats
  • 11 replies
  • 360 views
  • 3 likes
  • 4 in conversation