DATA Step, Macro, Functions and more

How to check if merge happened correctly in sas and things to remember when performing a merge?

Reply
Occasional Contributor
Posts: 6

How to check if merge happened correctly in sas and things to remember when performing a merge?

 
Super User
Posts: 11,343

Re: How to check if merge happened correctly in sas and things to remember when performing a merge?

Posted in reply to beginner1

The basic check would be to use Proc Print to examine the result. You can select specific variables to see if you get expected results.

Assuming you are using a dataset Merge statement things to consider are:

   Is it appropriate to use By to match values

   Order of the data sets determines which values of like named variables are in result

   Are all variables of the same name of the same type? All numeric or all character. If Character are they all the same length (might get truncation).

  And do want to Merge (record to record) or Set entire data set follow other data set or Update (assign specific values to specific variables).

  If using By statement what do you want for results if there are multiple values of the by variables in each set? You might be looking at moving to Proc SQL and a join if you want each record of one set matched to each record of another set By specific variables.

Occasional Contributor
Posts: 6

Re: How to check if merge happened correctly in sas and things to remember when performing a merge?

Thanks Ballardw!!!

 

But in case of very large datasets with more than hundred variables and thousands of observations is it possible to confirm the right execution by using Proc Print??

Anyother Procedure??

Super User
Posts: 19,870

Re: How to check if merge happened correctly in sas and things to remember when performing a merge?

[ Edited ]
Posted in reply to beginner1

For merges, my major check is record counts, usually if that's correct its ok.

 

Because each merge is customized to the situation there isn't a default way or proc to check if it's correct. Sometimes you want multiples, in other cases you don't. 

 

Things to look out BEFORE you merge are multiple records of your BY variables.

Identify if you're doing a 1:1 or 1:many or many:many merge. Make sure SAS can handle the merge you're trying to do.

 

If you're doing a 1:1 you can verify the record count and then I verify the fields I merge in.

 

Another thing to check is variables - if you have variables with the same names in the datasets other than the BY variables you can get unexpected results. 

 

Honestly, I prefer SQL joins as I find it easier to explicitly define what I'm doing and less likely to get unexpected results, but that is a personal preference.

 

SQL Join Type

 

 

Super User
Posts: 11,343

Re: How to check if merge happened correctly in sas and things to remember when performing a merge?

Posted in reply to beginner1

beginner1 wrote:

Thanks Ballardw!!!

 

But in case of very large datasets with more than hundred variables and thousands of observations is it possible to confirm the right execution by using Proc Print??

Anyother Procedure??


If I know what I want the result to look like it is usually fairly easy to check a few specific cases so Proc print with sufficient variables and possibly a where statement to check on one or two combinations of the BY variables brings up enough information to confirm that general behavior is as I intended (or not as the case may be).

Generally starting with minimum data sets to demonstrate the basic purpose works. I actually seldom turn a data step loose on sets with 100K plus datasets without testing on 10 or 100 or 1000 records, or again specific subsets picked using Where.

Super User
Posts: 19,870

Re: How to check if merge happened correctly in sas and things to remember when performing a merge?

Posted in reply to beginner1

Remember to define and test your edge cases!!!!

PROC Star
Posts: 1,325

Re: How to check if merge happened correctly in sas and things to remember when performing a merge?

[ Edited ]
Posted in reply to beginner1

First, I would always read the log.  You ignore the log at your own peril. 

 

Second, I would consider your expectations about the merge, and state those expectations so that if they are violated, they will throw an error message. 

 

For example, if I am merging two datasets, and I expect that both of them are unique by ID, and I expect all the records in both to match, I will code:

data c;
  merge a (in=a) b (in=b);
  by id;
  if NOT (a=b) then put "ERROR: mismatch records " (id a b)(=);
  if NOT (first.id and last.id) then put "ERROR: duplicate found " id=;
run;

I'm also a fan of system options that throw as many errors as possible, e.g. :

 

options dsoptions=note2err msglevel=i mergenoby=error;

dsoptions=note2err is undocumented but it converts a lot of bad log notes into errors.  msglevel=i throws an INFO: line to the log when two varibles collide.  Mergenoby=error throws an error if you forget the BY statement.

 

Ask a Question
Discussion stats
  • 6 replies
  • 475 views
  • 7 likes
  • 4 in conversation