BookmarkSubscribeRSS Feed
yudhishtirb
Calcite | Level 5

HI,

 

I am new to SAS programming.

 

I am looking to convert below code in SQL

 

data strat.merge_calls_1;
merge strat.trans_1 (in=jkl) strat.pay_merge2 (in=iop);
by wclientcode szaccountnumber;
if jkl and iop;
run;

 

Can someone please tell me how merge will work here ?

 

Thanks In Advance

 

 

13 REPLIES 13
Kurt_Bremser
Super User

Strictly speaking, you can't, as the data step merge behaves in a certain way when dealing with multiple instances in both datasets that can't (easily) be reproduced in SQL. At least I never tried.

If you have a 1:1 or a 1:n relationship guaranteed, then

proc sql;
create table strat.merge_calls_1 as
select * from
strat_trans_1 a, strat_pay_merge2 b
where
a.wclientcode = b.wclientcode and
a.szaccountnumber = b.szaccountnumber
;
quit;

will do the trick.

But be advised that SQL joins of large datasets might carry a huge performance penalty vs. the data step merge.

yudhishtirb
Calcite | Level 5
Hi, Thank you for feedback.

Data has n:n relations, I tried with Inner Join as will with where condition but SAS Merge output not matches with SQL output.

Please help

Thanks
Kurt_Bremser
Super User

That is because of the behaviour of the data step merge with N:N relations.

Say, dataset a has three obs for by value x, and dataset b has 2.

During the merge, a-x-1 will be merged with b-x-1, a-x-2 will be merged with b-x-2, and a-x-3 will be merged with b-x-2, as there are no more obs in dataset b for value x. This results in 3 (maximum number of both datasets) observations.

SQL, OTOH, will merge all instances with each other, resulting in 6 observations in my example.

This is why I said that, strictly speaking, you can't replace a data step merge with SQL in a quick way. OTOH, the SQL behaviour may be desired in many cases, because you can't (easily) create the cartesian join with the data step.

LinusH
Tourmaline | Level 20

As @Kurt_Bremser states, they behave differently. And I would say in a N:N relationship it's nearly impossible to get the equal result.

But why do you need to change to SQL? 

One reason could be that in N.N relationship, the SQL way handling it is more predictable. But you on other hand wish the same result - so please explain the underlying requirement.

Data never sleeps
yudhishtirb
Calcite | Level 5

Dear All,

 

Thank you very much for your feedback. Herewith I am attaching excel file which have 4 tabs

 

1. Data in Table1

2. Data in Table2

3. SAS output of below Merge statement

 

data strat.merge_calls_1;
merge strat.Table1 (in=jkl) strat.Table2 (in=iop);
by clientcode accountnumber;
if jkl and iop;
run;

 

4. SQL output of equivalent SQL statement for above merge statement

 

SELECT * FROM Table1 INNER JOIN Table2
ON Table1.ClientCode = Table2.ClientCode
AND Table1.AccountNumber = Table2.AccountNumber

 

As per documentation, Merge statement is equivalent to SQL inner join. But SAS output and SQL output are different.

 

Can someone please guide us what we are missing here ?

 

Thanks In Advance

Kurt_Bremser
Super User

Please post the log of the data step.

I will NOT open a MS Office document from the web. If you want to supply test data, do so in a data step with

cards;
yudhishtirb
Calcite | Level 5

Hi,

 

Greetings

 

I have created sample program as below to illustrate the same

 

PROGRAM:

 

 

data Test.Tab1;

input ClientCode 1-3 AccountNumber 4-5 dtTransactionDate$ 6-28;

datalines;

-1         3          07JAN2016:19:43:30.000

-1         3          12JAN2016:10:32:28.000

-1         3          25JAN2016:14:34:07.000

-1         3          02FEB2016:14:36:42.000

-1         3          17FEB2016:11:59:42.000

-1         3          22FEB2016:13:24:37.000

-1         3          03MAR2016:15:48:23.000

-1         3          08MAR2016:14:52:43.000

-1         3          21MAR2016:13:20:05.000

-1         3          31MAR2016:11:45:24.000

-1         3          05APR2016:11:43:41.000

-1         3          13APR2016:08:01:06.000

-1         3          13APR2016:08:03:27.360

-1         3          19APR2016:12:16:21.000

-1         3          25APR2016:16:35:46.000

-1         3          03MAY2016:13:25:13.000

-1         3          27MAY2016:10:41:50.000

-1         3          01JUN2016:09:30:26.000

-1         3          01JUN2016:09:37:02.000

;

run;

 

 

proc sort data=Test.Tab1;

by ClientCode AccountNumber;

run;

 

 

data Test.Tab2;

input ClientCode 1-3 AccountNumber 4-5 dtPaymentDate$ 6-28 curpayment 29-32 aa 33-34 bb 35-36 dtpaymentdate2$ 37-59 curpayment2 60-63 unqkey 64-65;

datalines;

-1         3          06NOV2015:00:00:00.000     350      1          0          26JAN2016:00:00:00.000       500      1

-1         3          26JAN2016:00:00:00.000       500      2          1          01APR2016:00:00:00.000      450      2

-1         3          01APR2016:00:00:00.000      450      3          2          26APR2016:00:00:00.000      500      3

-1         3          26APR2016:00:00:00.000      500      4          3          31MAY2016:00:00:00.000     500      4

;

run;

 

proc sort data=Test.Tab2;

by ClientCode AccountNumber;

run;

 

data Test.Tab_Merge;

            merge Test.Tab1 (in=jkl) Test.Tab2 (in=iop);

            by ClientCode AccountNumber;

 

            if jkl=1 and iop=1;

run;

proc print data=Test.Tab_Merge;

run;

 

Output Of Merge Statement is as below

clientcodeAccountnumberdtTransactionDateclientcodeAccountnumberdtPaymentDatecurpaymentaabbdtpaymentdate2curpayment2unqkey
-1307JAN2016:19:43:30.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1312JAN2016:10:32:28.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1325JAN2016:14:34:07.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1302FEB2016:14:36:42.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1317FEB2016:11:59:42.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1322FEB2016:13:24:37.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1303MAR2016:15:48:23.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1308MAR2016:14:52:43.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1321MAR2016:13:20:05.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1331MAR2016:11:45:24.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1305APR2016:11:43:41.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1313APR2016:08:01:06.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1313APR2016:08:03:27.360-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1319APR2016:12:16:21.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1325APR2016:16:35:46.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1303MAY2016:13:25:13.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1327MAY2016:10:41:50.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1301JUN2016:09:30:26.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001
-1301JUN2016:09:37:02.000-1306NOV2015:00:00:00.0003501026JAN2016:00:00:00.0005001

 

It contains 19 rows, however its equivalent SQL Inner join gives 76 rows.

 

Can someone please assist me, how SAS merge gives 19 rows as output ?

 

Thanks In Advance

Kurt_Bremser
Super User

Read my previous post again where I described how the data step merge works. What you see is perfectly normal behaviour for a data step merge when you have a M:N relationship.

If you want to see all possible combinations, you must use SQL. Period.

yudhishtirb
Calcite | Level 5

Hi,

 

Thanks for feedback.

 

So I need to search other ways than INNER JOIN to produce identical output in SQL.

Will try to creat some routines for same in SQL, if possible can you suggest us a way to achieve it in SQL ?

 

Thanks In Advance

 

 

Kurt_Bremser
Super User

Why do you need to do it in SQL, anyway? If your result with the data step is fine, then stay with it. No need to jump through hoops just to achieve the same result in SQL. Not even as an exercise, as SQL just isn't meant for this kind of join.

AliAsgar
Calcite | Level 5

We need to convert merge statement of SAS data step to inner join for SQL Server stored procedure,not for proc sql.Please guide us on that.

Thanks

Kurt_Bremser
Super User

The only idea that came to my mind for doing that in SQL is

- first, for every BY group in the two tables, create a counter variable, starting with 1 in every group

- then, do an outer join on the original BY variable and the counter variable.

LinusH
Tourmaline | Level 20
No.
Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 5848 views
  • 0 likes
  • 4 in conversation