BookmarkSubscribeRSS Feed
sasexperienceto
Calcite | Level 5

 

data mydata;
  infile datalines;
  input list $ NDC discount;
datalines;
Nov 11 50
Dec 12 20
Nov 11 50
May 12 .
Mar 11 30
;
run;

data mydata2;
  infile datalines;
  input list $ NDC discount;
datalines;
Jan 11 40
Dec 12 20
Mar 10 50
May 12 .
Mar 11 30
;
run;

proc sql;
  create table x1 as
  select *
  from mydata t1 left join mydata2 t2 on(t1.list=t2.list);
quit;

proc sort data=mydata;
  by list;
run;

proc sort data=mydata2;
  by list;
run;

data x2;
  merge mydata(in=t1) mydata2;
  by list;
  if t1;
run;

Results X1:
x1 (1).PNG

Results X2:
x2.PNG

There is a topic on the forum like this one, but I would like to know how to correct the proc sql to get the same result from the SAS. can anybody help me?

 

 

12 REPLIES 12
ballardw
Super User

Data step MERGE and SQL joins have to three significant differences. The first affects how duplicate values of BY (join on) variables behave. Merge will truncate data after a limited number of matches. Depending on the specific join the behavior is quite different.

Second is the way Merge treats variables of the same name. You would have to write a specific Coalesce or Coalescec function call in Sql to replicate the behavior, for each and every like named variable.

Third, and the one that can take a truly large amount of SQL code, is the behavior when the "right" set has a value of the By variable that is not matched in the "left". SQL Left Join will completely discard the "right". So you need to do a separate step to get those. Data step Merge with a BY defaults to bringing in the observations from the second set.

 

SQL might be possible for two specific data sets to behave like merge for those sets. Change the number of observations in one of those sets and the code that worked for 12 observations may not work for 13.

My general suggestion is if a data step merge is doing what you want, leave it there. You might be surprised just how much, possibly obnoxious code, it may take to accomplish the same thing in SQL.

 

Nightmare causing to consider converting a MERGE that involves 3 or more sets (not all with the same variables) as some combination of SQL joins.

Tom
Super User Tom
Super User

You want to recreate the data step MERGE in pure SQL code?
Why?

Do you only need code that will work for this simple case where the "LEFT" dataset has unique values of the BY variable and the "RIGHT" dataset has some repeated values of the BY variable?

data dsmerge;
  merge mydata(in=in1) mydata2;
  by list;
  if in1;
run;

proc sql;
  create table sqljoin as
    select l.list
         , coalesce(r.ndc,l.ndc) as NDC
         , coalesce(r.discount,l.discount) as discount
    from mydata l
    left join mydata2 r
    on l.list = r.list
  ;
quit;
The COMPARE Procedure
Comparison of WORK.DSMERGE with WORK.SQLJOIN
(Method=EXACT)

Data Set Summary

Dataset                Created          Modified  NVar    NObs

WORK.DSMERGE  09MAY23:18:07:32  09MAY23:18:07:32     3       6
WORK.SQLJOIN  09MAY23:18:07:32  09MAY23:18:07:32     3       6


Variables Summary

Number of Variables in Common: 3.


Observation Summary

Observation      Base  Compare

First Obs           1        1
Last  Obs           6        6

Number of Observations in Common: 6.
Total Number of Observations Read from WORK.DSMERGE: 6.
Total Number of Observations Read from WORK.SQLJOIN: 6.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 6.

NOTE: No unequal values were found. All values compared are exactly equal.

It is not really possible to replicate the general data step merge.

You can get closer if you have some type of variable that numbers the rows within the BY groups in each dataset.  But you would need a DATA STEP to make such a variable.

 

 

 

sasexperienceto
Calcite | Level 5

I'm doing an implementation in a database that doesn't have the sas language just sql

Tom
Super User Tom
Super User

@sasexperienceto wrote:

I'm doing an implementation in a database that doesn't have the sas language just sql


 

Better to figure out what the goal of the complete program is and re build it from the ground up using SQL.

 

The original DATA STEP is probably not correct anyway.  You should not be merging dataset where the same non-BY variables appear in both dataset and you are not doing a one to one merge.

 

sasexperienceto
Calcite | Level 5
In the case I am implementing the system in the cloud, BigQuery and as I can only use SQL I was trying to replicate the results of my system in SAS. From what I understand the result presented by the merge would not be the most correct?
SASKiwi
PROC Star

"In the case I am implementing the system in the cloud, BigQuery and as I can only use SQL I was trying to replicate the results of my system in SAS. From what I understand the result presented by the merge would not be the most correct?"

 

I still don't understand what your actual use case is. If you are implementing a system in the cloud using SQL and BigQuery only, where is SAS's involvement? If SAS is not involved in your implementation I wouldn't be using it for testing BigQuery behaviour. 

sasexperienceto
Calcite | Level 5

I have my SAS system where I want to migrate a process to the cloud but I need the results to be the same.

sasexperienceto
Calcite | Level 5
I changed all my sas code to sql in bigquery but when I get to the merge I have these different results, but in my case they are with thousands of data, but if I manage to solve this smaller example I can probably implement the same solution in my set of data.
SASKiwi
PROC Star

Neither of the two datasets you have posted MyDATA and MYDATA2 contain a unique key for identifying rows to be updated. As a result,  duplicate key rows from MYDATA2 in a DATA step MERGE are ignored - only the first row read in for each LIST variable value is kept. Is this what is supposed to happen? 

sasexperienceto
Calcite | Level 5

I could create an identifier line with the sql row_number function in both data sets but I don't know how this code would look to solve this problem.

Tom
Super User Tom
Super User

@sasexperienceto wrote:
I changed all my sas code to sql in bigquery but when I get to the merge I have these different results, but in my case they are with thousands of data, but if I manage to solve this smaller example I can probably implement the same solution in my set of data.

For your simple example the data step merge is NOT the right way to combine that data.

 

Sounds like you need to understand what the code was SUPPOSED to do (not what it is currently doing) and get that to work in your new implementation.  If it produces different results then you have found a case where the original programming in SAS was wrong.

ballardw
Super User

@sasexperienceto wrote:

I'm doing an implementation in a database that doesn't have the sas language just sql


Suggestion:

Go a support page for the specific data base. Show them your two (or more) example starting data sets and the desired result.

 

Pretty much each version of SQL has its own additions and special tools. So someone familiar with that database is much more likely to have a workable solution than a SAS programmer that is used to working with a somewhat to significantly different dialect of SQL.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 5591 views
  • 0 likes
  • 4 in conversation