DATA Step, Macro, Functions and more

PROC SQL vs. Data Step Merge not getting same results

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 123
Accepted Solution

PROC SQL vs. Data Step Merge not getting same results

I'm having trouble understanding why my proc sql and data step merge are not getting the same results. It seems to be off by one observation (I'm trying to convert my proc sql into a data step). Here is the full code:

 

data mydata;
  infile datalines;
  input list $ NDC discount;
datalines;
Nov 11 50
Dec 12 20
Nov 11 50
May 12 .
Mar 11 30
;
run;

data mydata2;
  infile datalines;
  input list $ NDC discount;
datalines;
Jan 11 40
Dec 12 20
Mar 10 50
May 12 .
Mar 11 30
;
run;

proc sql;
  create table x1 as
  select *
  from mydata t1 left join mydata2 t2 on(t1.list=t2.list);
quit;

proc sort data=mydata;
  by list;
run;

proc sort data=mydata2;
  by list;
run;

data x2;
  merge mydata(in=t1) mydata2;
  by list;
  if t1;
run;

Results of x1:

 

x1.PNG

 

Results of x2:

 

x2.PNG

 

The second observation from x1 is different from that of x2. Why is this? It must be something extremely simple - forgive me if I'm wasting your time.


Accepted Solutions
Solution
‎11-07-2016 11:08 AM
Super User
Super User
Posts: 7,042

Re: PROC SQL vs. Data Step Merge not getting same results

[ Edited ]
Posted in reply to JediApprentice

Because your variable names are the same in both data set.  With a MERGE the last value read is the one that you will see in the output. WIth SELECT * in SQL the first instance of that variable is the one that is output if there is another variable with the same name then it is ignored.  

 

Try this step and see if it doesn't match the SQL result.

data x3;
  merge mydata2 mydata(in=t1);
  by list;
  if t1;
run;

Since MYDATA is listed last the values of NDC and DISCOUNT from the MYDATA records will overwrite the values from the MYDATA2 records.

View solution in original post


All Replies
Frequent Contributor
Posts: 123

Re: PROC SQL vs. Data Step Merge not getting same results

Posted in reply to JediApprentice

In other words, why am I getting that observation from mydata2 when I specified "if t1".

Super User
Posts: 5,504

Re: PROC SQL vs. Data Step Merge not getting same results

Posted in reply to JediApprentice

Since you are merging by LIST, IF T1 means "did this LIST value appear in T1?"  If it did, the results of the merge appear in the final data set.

 

One of the issues involved in getting different results is the meaning of SELECT * from MYDATA.  This brings in data values from MYDATA only, and brings in nothing from MYDATA2.  The only way MYDATA2 gets used is to compare its LIST values to those in MYDATA.  Which observation looks to you like it comes from MYDATA2?

Frequent Contributor
Posts: 123

Re: PROC SQL vs. Data Step Merge not getting same results

Posted in reply to Astounding

@Astounding "One of the issues involved in getting different results is the meaning of SELECT * from MYDATA.  This brings in data values from MYDATA only, and brings in nothing from MYDATA2."

 

Then how come when I do this and select * in the proc sql at the end, all values from both datasets get brought in?

 

data mydata;
  infile datalines;
  input ID $ Country $ Code;
  datalines;
A USA 228
B BGM 210
C CAN 200
;
run;

data mydata2;
  infile datalines;
  input ID $ State $ age;
  datalines;
A WI 15
B CA 12
C MN 20
;
run;

proc sql;
  create table want as
  select *
  from have t1 left join have2 t2 on(t1.ID=t2.ID);
quit;

Is it because I'm now using different variables?

Solution
‎11-07-2016 11:08 AM
Super User
Super User
Posts: 7,042

Re: PROC SQL vs. Data Step Merge not getting same results

[ Edited ]
Posted in reply to JediApprentice

Because your variable names are the same in both data set.  With a MERGE the last value read is the one that you will see in the output. WIth SELECT * in SQL the first instance of that variable is the one that is output if there is another variable with the same name then it is ignored.  

 

Try this step and see if it doesn't match the SQL result.

data x3;
  merge mydata2 mydata(in=t1);
  by list;
  if t1;
run;

Since MYDATA is listed last the values of NDC and DISCOUNT from the MYDATA records will overwrite the values from the MYDATA2 records.

Super User
Posts: 7,782

Re: PROC SQL vs. Data Step Merge not getting same results

Posted in reply to JediApprentice

The log will give you answers. Merging or joining datasets that contain the same variables always gives peculiar results (accompanied by log messages in the case of SQL when using select *) and is generally a BAD IDEA.

While SQL only takes variables from one table, the data step overwrites values in the sequence in which observations are read from the contributing tables.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 123

Re: PROC SQL vs. Data Step Merge not getting same results

[ Edited ]
Posted in reply to KurtBremser

@KurtBremser

But SQL does take variables from both tables... I'm not sure I understand. If you have two tables - lets say have multiple variables but they share one variable i ncommon and you do a left join on that, you will get all the other variables from the datasets when you use a select *

Super User
Super User
Posts: 7,042

Re: PROC SQL vs. Data Step Merge not getting same results

[ Edited ]
Posted in reply to JediApprentice

Your problem with SQL is that you cannot create a SAS dataset with two variables that have the same name.

Try this code to see what is happening.

proc sql ;
 select a.name,b.name
 from sashelp.class a
 inner join sashelp.class b
 on a.name = b.name
 ;
 create table test as
 select a.name,b.name
 from sashelp.class a
 inner join sashelp.class b
 on a.name = b.name
 ;
 select * from test ;
quit;

 

Super User
Posts: 7,782

Re: PROC SQL vs. Data Step Merge not getting same results

Posted in reply to JediApprentice

JediApprentice wrote:

@KurtBremser

But SQL does take variables from both tables... 


ONLY when they have different names.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 342 views
  • 2 likes
  • 4 in conversation