HI,
I am new to SAS programming.
I am looking to convert below code in SQL
data strat.merge_calls_1;
merge strat.trans_1 (in=jkl) strat.pay_merge2 (in=iop);
by wclientcode szaccountnumber;
if jkl and iop;
run;
Can someone please tell me how merge will work here ?
Thanks In Advance
Strictly speaking, you can't, as the data step merge behaves in a certain way when dealing with multiple instances in both datasets that can't (easily) be reproduced in SQL. At least I never tried.
If you have a 1:1 or a 1:n relationship guaranteed, then
proc sql;
create table strat.merge_calls_1 as
select * from
strat_trans_1 a, strat_pay_merge2 b
where
a.wclientcode = b.wclientcode and
a.szaccountnumber = b.szaccountnumber
;
quit;
will do the trick.
But be advised that SQL joins of large datasets might carry a huge performance penalty vs. the data step merge.
That is because of the behaviour of the data step merge with N:N relations.
Say, dataset a has three obs for by value x, and dataset b has 2.
During the merge, a-x-1 will be merged with b-x-1, a-x-2 will be merged with b-x-2, and a-x-3 will be merged with b-x-2, as there are no more obs in dataset b for value x. This results in 3 (maximum number of both datasets) observations.
SQL, OTOH, will merge all instances with each other, resulting in 6 observations in my example.
This is why I said that, strictly speaking, you can't replace a data step merge with SQL in a quick way. OTOH, the SQL behaviour may be desired in many cases, because you can't (easily) create the cartesian join with the data step.
As @Kurt_Bremser states, they behave differently. And I would say in a N:N relationship it's nearly impossible to get the equal result.
But why do you need to change to SQL?
One reason could be that in N.N relationship, the SQL way handling it is more predictable. But you on other hand wish the same result - so please explain the underlying requirement.
Dear All,
Thank you very much for your feedback. Herewith I am attaching excel file which have 4 tabs
1. Data in Table1
2. Data in Table2
3. SAS output of below Merge statement
data strat.merge_calls_1;
merge strat.Table1 (in=jkl) strat.Table2 (in=iop);
by clientcode accountnumber;
if jkl and iop;
run;
4. SQL output of equivalent SQL statement for above merge statement
SELECT * FROM Table1 INNER JOIN Table2
ON Table1.ClientCode = Table2.ClientCode
AND Table1.AccountNumber = Table2.AccountNumber
As per documentation, Merge statement is equivalent to SQL inner join. But SAS output and SQL output are different.
Can someone please guide us what we are missing here ?
Thanks In Advance
Please post the log of the data step.
I will NOT open a MS Office document from the web. If you want to supply test data, do so in a data step with
cards;
Hi,
Greetings
I have created sample program as below to illustrate the same
PROGRAM:
data Test.Tab1;
input ClientCode 1-3 AccountNumber 4-5 dtTransactionDate$ 6-28;
datalines;
-1 3 07JAN2016:19:43:30.000
-1 3 12JAN2016:10:32:28.000
-1 3 25JAN2016:14:34:07.000
-1 3 02FEB2016:14:36:42.000
-1 3 17FEB2016:11:59:42.000
-1 3 22FEB2016:13:24:37.000
-1 3 03MAR2016:15:48:23.000
-1 3 08MAR2016:14:52:43.000
-1 3 21MAR2016:13:20:05.000
-1 3 31MAR2016:11:45:24.000
-1 3 05APR2016:11:43:41.000
-1 3 13APR2016:08:01:06.000
-1 3 13APR2016:08:03:27.360
-1 3 19APR2016:12:16:21.000
-1 3 25APR2016:16:35:46.000
-1 3 03MAY2016:13:25:13.000
-1 3 27MAY2016:10:41:50.000
-1 3 01JUN2016:09:30:26.000
-1 3 01JUN2016:09:37:02.000
;
run;
proc sort data=Test.Tab1;
by ClientCode AccountNumber;
run;
data Test.Tab2;
input ClientCode 1-3 AccountNumber 4-5 dtPaymentDate$ 6-28 curpayment 29-32 aa 33-34 bb 35-36 dtpaymentdate2$ 37-59 curpayment2 60-63 unqkey 64-65;
datalines;
-1 3 06NOV2015:00:00:00.000 350 1 0 26JAN2016:00:00:00.000 500 1
-1 3 26JAN2016:00:00:00.000 500 2 1 01APR2016:00:00:00.000 450 2
-1 3 01APR2016:00:00:00.000 450 3 2 26APR2016:00:00:00.000 500 3
-1 3 26APR2016:00:00:00.000 500 4 3 31MAY2016:00:00:00.000 500 4
;
run;
proc sort data=Test.Tab2;
by ClientCode AccountNumber;
run;
data Test.Tab_Merge;
merge Test.Tab1 (in=jkl) Test.Tab2 (in=iop);
by ClientCode AccountNumber;
if jkl=1 and iop=1;
run;
proc print data=Test.Tab_Merge;
run;
Output Of Merge Statement is as below
clientcode | Accountnumber | dtTransactionDate | clientcode | Accountnumber | dtPaymentDate | curpayment | aa | bb | dtpaymentdate2 | curpayment2 | unqkey |
-1 | 3 | 07JAN2016:19:43:30.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 12JAN2016:10:32:28.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 25JAN2016:14:34:07.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 02FEB2016:14:36:42.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 17FEB2016:11:59:42.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 22FEB2016:13:24:37.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 03MAR2016:15:48:23.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 08MAR2016:14:52:43.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 21MAR2016:13:20:05.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 31MAR2016:11:45:24.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 05APR2016:11:43:41.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 13APR2016:08:01:06.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 13APR2016:08:03:27.360 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 19APR2016:12:16:21.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 25APR2016:16:35:46.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 03MAY2016:13:25:13.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 27MAY2016:10:41:50.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 01JUN2016:09:30:26.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
-1 | 3 | 01JUN2016:09:37:02.000 | -1 | 3 | 06NOV2015:00:00:00.000 | 350 | 1 | 0 | 26JAN2016:00:00:00.000 | 500 | 1 |
It contains 19 rows, however its equivalent SQL Inner join gives 76 rows.
Can someone please assist me, how SAS merge gives 19 rows as output ?
Thanks In Advance
Read my previous post again where I described how the data step merge works. What you see is perfectly normal behaviour for a data step merge when you have a M:N relationship.
If you want to see all possible combinations, you must use SQL. Period.
Hi,
Thanks for feedback.
So I need to search other ways than INNER JOIN to produce identical output in SQL.
Will try to creat some routines for same in SQL, if possible can you suggest us a way to achieve it in SQL ?
Thanks In Advance
Why do you need to do it in SQL, anyway? If your result with the data step is fine, then stay with it. No need to jump through hoops just to achieve the same result in SQL. Not even as an exercise, as SQL just isn't meant for this kind of join.
We need to convert merge statement of SAS data step to inner join for SQL Server stored procedure,not for proc sql.Please guide us on that.
Thanks
The only idea that came to my mind for doing that in SQL is
- first, for every BY group in the two tables, create a counter variable, starting with 1 in every group
- then, do an outer join on the original BY variable and the counter variable.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.