SAS Programming

sasexperienceto · Posted 05-09-2023 04:57 PM

data mydata;
  infile datalines;
  input list $ NDC discount;
datalines;
Nov 11 50
Dec 12 20
Nov 11 50
May 12 .
Mar 11 30
;
run;

data mydata2;
  infile datalines;
  input list $ NDC discount;
datalines;
Jan 11 40
Dec 12 20
Mar 10 50
May 12 .
Mar 11 30
;
run;

proc sql;
  create table x1 as
  select *
  from mydata t1 left join mydata2 t2 on(t1.list=t2.list);
quit;

proc sort data=mydata;
  by list;
run;

proc sort data=mydata2;
  by list;
run;

data x2;
  merge mydata(in=t1) mydata2;
  by list;
  if t1;
run;

Results X1:


Results X2:


There is a topic on the forum like this one, but I would like to know how to correct the proc sql to get the same result from the SAS. can anybody help me?

ballardw · Posted 05-09-2023 05:31 PM

Data step MERGE and SQL joins have to three significant differences. The first affects how duplicate values of BY (join on) variables behave. Merge will truncate data after a limited number of matches. Depending on the specific join the behavior is quite different.

Second is the way Merge treats variables of the same name. You would have to write a specific Coalesce or Coalescec function call in Sql to replicate the behavior, for each and every like named variable.

Third, and the one that can take a truly large amount of SQL code, is the behavior when the "right" set has a value of the By variable that is not matched in the "left". SQL Left Join will completely discard the "right". So you need to do a separate step to get those. Data step Merge with a BY defaults to bringing in the observations from the second set.

SQL might be possible for two specific data sets to behave like merge for those sets. Change the number of observations in one of those sets and the code that worked for 12 observations may not work for 13.

My general suggestion is if a data step merge is doing what you want, leave it there. You might be surprised just how much, possibly obnoxious code, it may take to accomplish the same thing in SQL.

Nightmare causing to consider converting a MERGE that involves 3 or more sets (not all with the same variables) as some combination of SQL joins.

Tom · Posted 05-09-2023 05:56 PM

You want to recreate the data step MERGE in pure SQL code?
Why?

Do you only need code that will work for this simple case where the "LEFT" dataset has unique values of the BY variable and the "RIGHT" dataset has some repeated values of the BY variable?

data dsmerge;
  merge mydata(in=in1) mydata2;
  by list;
  if in1;
run;

proc sql;
  create table sqljoin as
    select l.list
         , coalesce(r.ndc,l.ndc) as NDC
         , coalesce(r.discount,l.discount) as discount
    from mydata l
    left join mydata2 r
    on l.list = r.list
  ;
quit;

The COMPARE Procedure
Comparison of WORK.DSMERGE with WORK.SQLJOIN
(Method=EXACT)

Data Set Summary

Dataset                Created          Modified  NVar    NObs

WORK.DSMERGE  09MAY23:18:07:32  09MAY23:18:07:32     3       6
WORK.SQLJOIN  09MAY23:18:07:32  09MAY23:18:07:32     3       6


Variables Summary

Number of Variables in Common: 3.


Observation Summary

Observation      Base  Compare

First Obs           1        1
Last  Obs           6        6

Number of Observations in Common: 6.
Total Number of Observations Read from WORK.DSMERGE: 6.
Total Number of Observations Read from WORK.SQLJOIN: 6.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 6.

NOTE: No unequal values were found. All values compared are exactly equal.

It is not really possible to replicate the general data step merge.

You can get closer if you have some type of variable that numbers the rows within the BY groups in each dataset. But you would need a DATA STEP to make such a variable.

sasexperienceto · Posted 05-09-2023 05:59 PM

I'm doing an implementation in a database that doesn't have the sas language just sql

Tom · Posted 05-09-2023 06:12 PM

@sasexperienceto wrote:

I'm doing an implementation in a database that doesn't have the sas language just sql

Better to figure out what the goal of the complete program is and re build it from the ground up using SQL.

The original DATA STEP is probably not correct anyway. You should not be merging dataset where the same non-BY variables appear in both dataset and you are not doing a one to one merge.

sasexperienceto · Posted 05-09-2023 09:52 PM

In the case I am implementing the system in the cloud, BigQuery and as I can only use SQL I was trying to replicate the results of my system in SAS. From what I understand the result presented by the merge would not be the most correct?

SASKiwi · Posted 05-09-2023 10:16 PM

"In the case I am implementing the system in the cloud, BigQuery and as I can only use SQL I was trying to replicate the results of my system in SAS. From what I understand the result presented by the merge would not be the most correct?"

I still don't understand what your actual use case is. If you are implementing a system in the cloud using SQL and BigQuery only, where is SAS's involvement? If SAS is not involved in your implementation I wouldn't be using it for testing BigQuery behaviour.

sasexperienceto · Posted 05-09-2023 10:19 PM

I have my SAS system where I want to migrate a process to the cloud but I need the results to be the same.

sasexperienceto · Posted 05-09-2023 10:21 PM

I changed all my sas code to sql in bigquery but when I get to the merge I have these different results, but in my case they are with thousands of data, but if I manage to solve this smaller example I can probably implement the same solution in my set of data.

SASKiwi · Posted 05-09-2023 10:58 PM

Neither of the two datasets you have posted MyDATA and MYDATA2 contain a unique key for identifying rows to be updated. As a result, duplicate key rows from MYDATA2 in a DATA step MERGE are ignored - only the first row read in for each LIST variable value is kept. Is this what is supposed to happen?

sasexperienceto · Posted 05-10-2023 10:48 AM

I could create an identifier line with the sql row_number function in both data sets but I don't know how this code would look to solve this problem.

Tom · Posted 05-10-2023 11:00 AM

@sasexperienceto wrote:
I changed all my sas code to sql in bigquery but when I get to the merge I have these different results, but in my case they are with thousands of data, but if I manage to solve this smaller example I can probably implement the same solution in my set of data.

For your simple example the data step merge is NOT the right way to combine that data.

Sounds like you need to understand what the code was SUPPOSED to do (not what it is currently doing) and get that to work in your new implementation. If it produces different results then you have found a case where the original programming in SAS was wrong.

ballardw · Posted 05-09-2023 06:47 PM

@sasexperienceto wrote:

I'm doing an implementation in a database that doesn't have the sas language just sql

Suggestion:

Go a support page for the specific data base. Show them your two (or more) example starting data sets and the desired result.

Pretty much each version of SQL has its own additions and special tools. So someone familiar with that database is much more likely to have a workable solution than a SAS programmer that is used to working with a somewhat to significantly different dialect of SQL.

SAS Programming

proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Re: proc sql join and merge different results

Different results from Proc arima and proc autoreg

PROC SQL - Insert

Proc SQL Left Join Based on a Condition

PROC SQL- Order by

Why proc sql and proc means produce different results?

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...