What I want to do is to find an alternative to the following code:
PROC SQL;
CREATE TABLE WORK.XX AS
SELECT DISTINCT t2.WC, t2.CWC
FROM WORK.YY t1
INNER JOIN WORK.ZZ t2 ON (t1.MC = t2.WC)
;
QUIT;
Could someone please help in doing the same thing using hash or any other method?
If there is a 1(YY) to n(ZZ) relation, then
data work.xx;
merge
work.yy (in=a keep=mc rename=(mc=wc))
work.zz (in=b keep=wc cwc)
;
by wc;
if a and b;
run;
proc sort data=work.xx nodupkey;
by wc cwc;
run;
should deliver the same result.
Many thanks for your help.
unfortunately, it returned me an error:
ERROR: BY variables are not properly sorted on data set WORK.YY
a=1 b=1 WC=7507 CWC= FIRST.WC=0 LAST.WC=1 _ERROR_=1 _N_=585904753
Then you need to sort your datasets by the variables that are used in the merge.
The solution depends very much on the size(s). If one of the tables fits into MEMSIZE, creating a format or using a hash object can speed your process up.
If that is not the case, you will have to sort the tables for the merge. In my experience, using explicit sorts and a merge will out-perform SQL when the contributing tables grow large
Do the variables WC and CWC also exist in WORK.YY? 🙂
@PeterClemmensen wrote:
Do the variables WC and CWC also exist in WORK.YY? 🙂
Good catch. I edited my post to safeguard against that.
XX Table has following vars:
MC LC MCC MCN TLC DD
YY:
WC CWC TS MS Y X
Where MC and WC are same datatype (char) and common values but only var names are different i.e. in XX it is MC and in YY it is WC
CWC is only in YY and not in XX
I have the following tables:
data have01;
infile cards truncover expandtabs;
input MC $ LC $ MCC $ MCN $ TLC $ DD $ ODS_TimeStamp ODS_LUpd zTPl $ PuD $;
cards;
1853 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1856 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
1869 DR14 1 Vetu SM3 . 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 0 .
2024 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
2025 DV16 1 Vetu SM3 2008-01-31 24SEP2013:10:06:53.580 20JUL2016:12:55:39.240 47 .
run;
You might have to format date column in the above table.
data have02;
infile cards truncover expandtabs;
input WPMVId ToSTimeStamp TId ASN WC $ CWC $ TSide $ MNo Y X;
cards;
1 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 1 -82140 2468
2 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 2 -81940 2466
3 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 3 -81739 2463
4 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 4 -81539 2459
5 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 5 -81339 2456
6 21AUG2012:17:57:39.000 20949 1 7604 HPUS230 R 6 -81139 2453
run;
You might have to format date column in the above table.
Please help me using some alternative to SQL code above, specifically when I have issue that my Table 2 above is almost 0.8 billion rows data and it takes hell a lot time to run SQL query as above.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.