- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data mydata;
infile datalines;
input list $ NDC discount;
datalines;
Nov 11 50
Dec 12 20
Nov 11 50
May 12 .
Mar 11 30
;
run;
data mydata2;
infile datalines;
input list $ NDC discount;
datalines;
Jan 11 40
Dec 12 20
Mar 10 50
May 12 .
Mar 11 30
;
run;
proc sql;
create table x1 as
select *
from mydata t1 left join mydata2 t2 on(t1.list=t2.list);
quit;
proc sort data=mydata;
by list;
run;
proc sort data=mydata2;
by list;
run;
data x2;
merge mydata(in=t1) mydata2;
by list;
if t1;
run;
Results X1:

Results X2:

There is a topic on the forum like this one, but I would like to know how to correct the proc sql to get the same result from the SAS. can anybody help me?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Data step MERGE and SQL joins have to three significant differences. The first affects how duplicate values of BY (join on) variables behave. Merge will truncate data after a limited number of matches. Depending on the specific join the behavior is quite different.
Second is the way Merge treats variables of the same name. You would have to write a specific Coalesce or Coalescec function call in Sql to replicate the behavior, for each and every like named variable.
Third, and the one that can take a truly large amount of SQL code, is the behavior when the "right" set has a value of the By variable that is not matched in the "left". SQL Left Join will completely discard the "right". So you need to do a separate step to get those. Data step Merge with a BY defaults to bringing in the observations from the second set.
SQL might be possible for two specific data sets to behave like merge for those sets. Change the number of observations in one of those sets and the code that worked for 12 observations may not work for 13.
My general suggestion is if a data step merge is doing what you want, leave it there. You might be surprised just how much, possibly obnoxious code, it may take to accomplish the same thing in SQL.
Nightmare causing to consider converting a MERGE that involves 3 or more sets (not all with the same variables) as some combination of SQL joins.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You want to recreate the data step MERGE in pure SQL code?
Why?
Do you only need code that will work for this simple case where the "LEFT" dataset has unique values of the BY variable and the "RIGHT" dataset has some repeated values of the BY variable?
data dsmerge;
merge mydata(in=in1) mydata2;
by list;
if in1;
run;
proc sql;
create table sqljoin as
select l.list
, coalesce(r.ndc,l.ndc) as NDC
, coalesce(r.discount,l.discount) as discount
from mydata l
left join mydata2 r
on l.list = r.list
;
quit;
The COMPARE Procedure Comparison of WORK.DSMERGE with WORK.SQLJOIN (Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.DSMERGE 09MAY23:18:07:32 09MAY23:18:07:32 3 6 WORK.SQLJOIN 09MAY23:18:07:32 09MAY23:18:07:32 3 6 Variables Summary Number of Variables in Common: 3. Observation Summary Observation Base Compare First Obs 1 1 Last Obs 6 6 Number of Observations in Common: 6. Total Number of Observations Read from WORK.DSMERGE: 6. Total Number of Observations Read from WORK.SQLJOIN: 6. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 6. NOTE: No unequal values were found. All values compared are exactly equal.
It is not really possible to replicate the general data step merge.
You can get closer if you have some type of variable that numbers the rows within the BY groups in each dataset. But you would need a DATA STEP to make such a variable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm doing an implementation in a database that doesn't have the sas language just sql
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sasexperienceto wrote:
I'm doing an implementation in a database that doesn't have the sas language just sql
Better to figure out what the goal of the complete program is and re build it from the ground up using SQL.
The original DATA STEP is probably not correct anyway. You should not be merging dataset where the same non-BY variables appear in both dataset and you are not doing a one to one merge.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"In the case I am implementing the system in the cloud, BigQuery and as I can only use SQL I was trying to replicate the results of my system in SAS. From what I understand the result presented by the merge would not be the most correct?"
I still don't understand what your actual use case is. If you are implementing a system in the cloud using SQL and BigQuery only, where is SAS's involvement? If SAS is not involved in your implementation I wouldn't be using it for testing BigQuery behaviour.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have my SAS system where I want to migrate a process to the cloud but I need the results to be the same.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Neither of the two datasets you have posted MyDATA and MYDATA2 contain a unique key for identifying rows to be updated. As a result, duplicate key rows from MYDATA2 in a DATA step MERGE are ignored - only the first row read in for each LIST variable value is kept. Is this what is supposed to happen?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I could create an identifier line with the sql row_number function in both data sets but I don't know how this code would look to solve this problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sasexperienceto wrote:
I changed all my sas code to sql in bigquery but when I get to the merge I have these different results, but in my case they are with thousands of data, but if I manage to solve this smaller example I can probably implement the same solution in my set of data.
For your simple example the data step merge is NOT the right way to combine that data.
Sounds like you need to understand what the code was SUPPOSED to do (not what it is currently doing) and get that to work in your new implementation. If it produces different results then you have found a case where the original programming in SAS was wrong.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sasexperienceto wrote:
I'm doing an implementation in a database that doesn't have the sas language just sql
Suggestion:
Go a support page for the specific data base. Show them your two (or more) example starting data sets and the desired result.
Pretty much each version of SQL has its own additions and special tools. So someone familiar with that database is much more likely to have a workable solution than a SAS programmer that is used to working with a somewhat to significantly different dialect of SQL.