Hi All,


In which scenarios should we use PROC SQL rather than a DATA step when merging two datasets? If one dataset has 2 million records and the other has only 200, which method should I use to join the two tables?






The choice usually depends on the result you want:


  • Is there a one-to-one match, a many-to-one match, or a many-to-many match?
  • Do you want matches only, or do you want some (or all) of the mismatches?
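
As a rough sketch of the 2 million vs. 200 case, here is the same many-to-one lookup done both ways. The dataset names (work.big, work.small) and variables (id, txn_id, label) are made up for illustration; they are not from the question. The DATA step version uses the IN= flags to separate matches from mismatches, and the SQL version keeps matches only.

```sas
/* Hypothetical sample data: 200 lookup rows and 2 million detail rows */
data work.small;
  length label $ 12;
  do id = 1 to 200;
    label = cats('group_', id);
    output;
  end;
run;

data work.big;
  do txn_id = 1 to 2000000;
    id = mod(txn_id, 500) + 1;   /* ids 1-500, so some ids have no lookup row */
    output;
  end;
run;

/* DATA step: both inputs must be sorted (or indexed) by the BY variable */
proc sort data=work.big;   by id; run;
proc sort data=work.small; by id; run;

data work.matched work.big_only;
  merge work.big(in=in_big) work.small(in=in_small);
  by id;
  if in_big and in_small then output work.matched;  /* matches */
  else if in_big then output work.big_only;         /* big rows with no lookup row */
run;

/* PROC SQL: no explicit sort needed; an inner join keeps matches only */
proc sql;
  create table work.matched_sql as
  select b.*, s.label
  from work.big as b
       inner join work.small as s
       on b.id = s.id;
quit;
```

If you also want the mismatches from SQL, a LEFT JOIN in place of the INNER JOIN would keep every row of work.big.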

It might also depend on your skill with the DATA step vs. SQL.


The bottom line is you have to describe the inputs and outputs at least a little bit.

There's no hard and fast rule. And the choice isn't only between SQL and the DATA step: hash objects and formats can also be options for lookups.
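
For the lookup case, a hash object might look something like the sketch below. It assumes the small table fits in memory (200 rows easily does) and uses made-up names (work.big, work.small, id, label), so treat it as an illustration rather than a drop-in solution.

```sas
/* Hypothetical sample data, same shape as the question describes */
data work.small;
  length label $ 12;
  do id = 1 to 200;
    label = cats('group_', id);
    output;
  end;
run;

data work.big;
  do txn_id = 1 to 2000000;
    id = mod(txn_id, 500) + 1;
    output;
  end;
run;

data work.hash_matched;
  if 0 then set work.small;          /* never executes; only defines id and label */
  if _n_ = 1 then do;
    declare hash lk(dataset: 'work.small');  /* load the small table once */
    lk.defineKey('id');
    lk.defineData('label');
    lk.defineDone();
  end;
  set work.big;                      /* read the big table in its existing order */
  if lk.find() = 0;                  /* find() returns 0 on a match; keep matches only */
run;
```

The main attraction here is that work.big does not have to be sorted, which can matter a lot at 2 million rows.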


In general, a many-to-many merge is often best done in SQL. Everything else has multiple options.
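
A tiny example of why many-to-many is the odd one out (the table and variable names are invented for the demo). With two rows for id 1 in one table and three in the other, SQL produces every pairing, while a DATA step MERGE pairs the rows up one by one within the BY group.

```sas
data work.a;
  input id a_val $;
  datalines;
1 a1
1 a2
;
run;

data work.b;
  input id b_val $;
  datalines;
1 b1
1 b2
1 b3
;
run;

/* SQL join: every a row is paired with every b row for the same id (2 x 3 = 6 rows) */
proc sql;
  create table work.m2m_sql as
  select a.id, a.a_val, b.b_val
  from work.a as a, work.b as b
  where a.id = b.id;
quit;

/* DATA step MERGE: rows are matched positionally within the BY group (3 rows),
   and the log notes that more than one data set has repeats of BY values */
data work.m2m_merge;
  merge work.a work.b;
  by id;
run;
```

So if you actually need all the combinations, SQL gets you there directly, whereas the DATA step needs extra work to do the same.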

