SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Inner Join Equivalent for three variables

Reply
Contributor
Posts: 73

Inner Join Equivalent for three variables

Hello,

I am trying to figure out if there is way to get same results as operating inner join on three tables in SQL by means of match-merge in DATA step?

As long as data set is sorted on BY variable, and subsetting IF condition in used on correct variable, it is straightforward to operate inner join for two data sets in DATA step. However is it possible to add third data set as well? What effect will it have to introduce two variables in BY statement of MERGE?

Thanks.

Respected Advisor
Posts: 3,124

Re: Inner Join Equivalent for three variables

"As long as data set is sorted on BY variable, and subsetting IF condition in used on correct variable, it is straightforward to operate inner join for two data sets in DATA step"

Data Step merge is running on completely different mechanism From Proc SQL.  Resultwise is only comparable if it is "one to one" or "one to many" join or merge, datastep merge will give you something unexpected if it is 'many to many' merge.

Back to your question:  "is it possible to add third data set as well?", the answer is 'YES', it is possible to add more than 3 tables as long as they share the 'by' variable, which again proc sql join does not have this restriction.

Finally, is there anything comparable to Proc SQL join in Data Step regime? Yes, check docs on Hash() method.

HTH,

Haikuo

Contributor
Posts: 72

Re: Inner Join Equivalent for three variables

Dhanashree,

If you are going to be doing a lot of PROC SQL programming, then I would recommend taking a look at Kirk Lafler's book:

PROC SQL: Beyond the Basics Using SAS, Second Edition

He just came out with the second edition of this classic text.

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE

(aka Michael A. Raithel)

Best of luck in all your SAS endeavors!

----MMMMIIIIKKKKEEEE
(aka Michael A. Raithel)
Amazon Author's Page: http://www.amazon.com/Michael-A.-Raithel/e/B001K8GG90/ref=ntt_dp_epwbk_0
Contributor
Posts: 43

Re: Inner Join Equivalent for three variables

To expand on Hai.kuo's response:

Using the DATA step MERGE:

1.  You can use two or more variables in the BY statement, and all of the variables must match to merge the data together.

2.  You can merge three datasets together, but you should make sure that at least two of them have only one row per BY variables, ie. one-to-one match.  Make sure to check your saslog.  SAS will give a warning if "more than one data set have multiples of the BY variable".  When this happens, the results are not at all what you would expect or want!

3.  Definitely learn to use PROC SQL to do more complex merges (JOINs) of data.  See MMMMIIIIKKKKEEEE's refernce to Kirk Lafler's book.  Make some simple test files with multiple rows of your BY variables and learn what the results look like, and compare to what the DATA MERGE does.

Ask a Question
Discussion stats
  • 3 replies
  • 516 views
  • 8 likes
  • 4 in conversation