BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi everyone,

I guess these codes should be exactly the same; however, the SAS log tells otherwise! It doesn't really make sense to me how the error arises.

/* CODES */

data merged;

    merge y1999_ (in=a) cusip (in=b);

    by cusip;

    if a & b;

run;

proc sql feedback;

    create table merged as

    select a.*

    from y1999_ as a INNER JOIN cusip as b

    on a.cusip=b.cusip

    ;

quit;

/* SAS LOGS Tell Otherwise */

/* For 1st code */

565  data merged;

566      merge y1999_ (in=a) cusip (in=b);

567      by cusip;

568      if a & b;

569  run;

NOTE: MERGE statement has more than one data set with repeats of BY values.

NOTE: There were 4857105 observations read from the data set WORK.Y1999_.

NOTE: There were 88319 observations read from the data set WORK.CUSIP.

NOTE: The data set WORK.MERGED has 3789859 observations and 14 variables.

/* For 2nd code */

570  proc sql feedback;

571      create table merged as

572      select a.*

573      from y1999_ as a INNER JOIN cusip as b

574      on a.cusip=b.cusip

575      ;

NOTE: Statement transforms to:

        select ....

          from WORK.Y1999_ A inner join WORK.CUSIP B on A.cusip = B.CUSIP;

NOTE: Table WORK.MERGED created, with 17955067 rows and 14 columns.

I guess they should be the same, right? It's an inner join that keep only observations that share the same BY variable from both data sets.

I find this a mystery, and just can't figure out why. Can you please help? Thanks a lot

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20

Editor's Note: In addition to the many resources listed below, I'm adding this video tutorial. At just under 7 minutes, it walks you through the steps of merging data sets in SAS using PROC SQL:

 

DATA step merge and SQL join handles duplicate key values differently, which is described and discussed thoroughly, both in documentation, papers and forums.  I think that SQL is more predictable, creating a product of rows, as opposed to the data step line-by-line matching.

 

Resources via @AhmedAl_Attar:

Many-to-Many data step merges have been discussed before and several approaches and solutions have been presented in previous years. Here are few samples

http://www2.sas.com/proceedings/forum2008/081-2008.pdf

http://support.sas.com/resources/papers/proceedings09/071-2009.pdf

http://www.lexjansen.com/pharmasug/2011/tu/pharmasug-2011-tu05.pdf

http://support.sas.com/resources/papers/proceedings09/071-2009.pdf

 

From @Astounding:

MERGE statement has more than one data set with repeats of BY values.

 

This is telling you that the same CUSIP appears multiple times in both incoming tables.  Are you expecting that to happen?  When it does happen, what is the proper outcome?  It's true and documented (as LinusH mentioned) that JOIN vs. MERGE handle this situation differently, but it is up to you to design the proper outcome before selecting the tool that will generate that outcome.  There are a few possible answers you might come up with, so if you post your decision you'll find there are plenty of people who could suggest a solution.

Data never sleeps

View solution in original post

7 REPLIES 7
LinusH
Tourmaline | Level 20

Editor's Note: In addition to the many resources listed below, I'm adding this video tutorial. At just under 7 minutes, it walks you through the steps of merging data sets in SAS using PROC SQL:

 

DATA step merge and SQL join handles duplicate key values differently, which is described and discussed thoroughly, both in documentation, papers and forums.  I think that SQL is more predictable, creating a product of rows, as opposed to the data step line-by-line matching.

 

Resources via @AhmedAl_Attar:

Many-to-Many data step merges have been discussed before and several approaches and solutions have been presented in previous years. Here are few samples

http://www2.sas.com/proceedings/forum2008/081-2008.pdf

http://support.sas.com/resources/papers/proceedings09/071-2009.pdf

http://www.lexjansen.com/pharmasug/2011/tu/pharmasug-2011-tu05.pdf

http://support.sas.com/resources/papers/proceedings09/071-2009.pdf

 

From @Astounding:

MERGE statement has more than one data set with repeats of BY values.

 

This is telling you that the same CUSIP appears multiple times in both incoming tables.  Are you expecting that to happen?  When it does happen, what is the proper outcome?  It's true and documented (as LinusH mentioned) that JOIN vs. MERGE handle this situation differently, but it is up to you to design the proper outcome before selecting the tool that will generate that outcome.  There are a few possible answers you might come up with, so if you post your decision you'll find there are plenty of people who could suggest a solution.

Data never sleeps
Astounding
PROC Star

The key message here is this:

MERGE statement has more than one data set with repeats of BY values.

This is telling you that the same CUSIP appears multiple times in both incoming tables.  Are you expecting that to happen?  When it does happen, what is the proper outcome?  It's true and documented (as LinusH mentioned) that JOIN vs. MERGE handle this situation differently, but it is up to you to design the proper outcome before selecting the tool that will generate that outcome.  There are a few possible answers you might come up with, so if you post your decision you'll find there are plenty of people who could suggest a solution.

Good luck.

smilingmelbourne
Fluorite | Level 6

Hi, I would like to have only the rows that match (by Cusip) across the two tables.

I read this one http://www2.sas.com/proceedings/forum2008/178-2008.pdf

Sorry I'm still a bit lost.

The paper above says that the following codes are exactly 100% the same. And I know that Proc SQL with INNER JOIN statement is the same as the WHERE statement.

proc sql;

     create table inner_sql as

     select      b.transaction

                    , b.item_id

                    , b.qty

                    , b.date

                    , a.sales_id

                    , a.name

     from sales b, salesperson a

     where a.sales_id = b.sales_id

     order by sales_id;

quit;

data inner_join;

     merge sales ( in = s )

     salesperson ( in = p );

     by sales_id;

     if s and p;

run;

My codes in the original post return different matches, no matter whether I use INNER JOIN or WHERE statement in the Proc SQL.

data merged;

588      merge y1999_ (in=a) cusip (in=b);

589      by cusip;

590      if a & b;

591  run;

NOTE: MERGE statement has more than one data set with repeats of BY values.

NOTE: There were 4857105 observations read from the data set WORK.Y1999_.

NOTE: There were 88319 observations read from the data set WORK.CUSIP.

NOTE: The data set WORK.MERGED has 3789859 observations and 14 variables.

proc sql feedback;

593      create table merged as

594      select a.*

595      from y1999_ as a, cusip as b

596      where a.cusip=b.cusip

597      ;

NOTE: Table WORK.MERGED created, with 17955067 rows and 14 columns.

Sorry for this inconvenience but I'm a bit lost actually.

Astounding
PROC Star

Here's an example of the situation you are facing.

Table A:

CUSIP   Amount

123         52

123         95

Table B:

CUSIP    Rating

123          AAA

123          AA+

123          AA-

Now when these two tables are joined by matching on CUSIP, what should the resulting table look like?  No programming statements at this point, just sketch out the result you are hoping to achieve.

Haikuo
Onyx | Level 15

OK, the short answer is: (in your setting)

if it is a ONE to ONE or ONE to MANY, then datastep merge is equivalent to Proc SQL inner join in term of results. Otherwise, it is NOT.

Haikuo

Tom
Super User Tom
Super User

From the names of you datasets I would expect that the CUSPID dataset would have only unique values of CUSPID. Your log says this is not true.

Is there something wrong with your CUSPID dataset that is causing it to have duplicate ids?

When you have two customers with the same ID how do you want to match them to the sales?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 37883 views
  • 1 like
  • 6 in conversation