The basic check would be to use Proc Print to examine the result. You can select specific variables to see if you get expected results.
Assuming you are using a dataset Merge statement things to consider are:
Is it appropriate to use By to match values
Order of the data sets determines which values of like named variables are in result
Are all variables of the same name of the same type? All numeric or all character. If Character are they all the same length (might get truncation).
And do want to Merge (record to record) or Set entire data set follow other data set or Update (assign specific values to specific variables).
If using By statement what do you want for results if there are multiple values of the by variables in each set? You might be looking at moving to Proc SQL and a join if you want each record of one set matched to each record of another set By specific variables.
Thanks Ballardw!!!
But in case of very large datasets with more than hundred variables and thousands of observations is it possible to confirm the right execution by using Proc Print??
Anyother Procedure??
For merges, my major check is record counts, usually if that's correct its ok.
Because each merge is customized to the situation there isn't a default way or proc to check if it's correct. Sometimes you want multiples, in other cases you don't.
Things to look out BEFORE you merge are multiple records of your BY variables.
Identify if you're doing a 1:1 or 1:many or many:many merge. Make sure SAS can handle the merge you're trying to do.
If you're doing a 1:1 you can verify the record count and then I verify the fields I merge in.
Another thing to check is variables - if you have variables with the same names in the datasets other than the BY variables you can get unexpected results.
Honestly, I prefer SQL joins as I find it easier to explicitly define what I'm doing and less likely to get unexpected results, but that is a personal preference.
@beginner1 wrote:
Thanks Ballardw!!!
But in case of very large datasets with more than hundred variables and thousands of observations is it possible to confirm the right execution by using Proc Print??
Anyother Procedure??
If I know what I want the result to look like it is usually fairly easy to check a few specific cases so Proc print with sufficient variables and possibly a where statement to check on one or two combinations of the BY variables brings up enough information to confirm that general behavior is as I intended (or not as the case may be).
Generally starting with minimum data sets to demonstrate the basic purpose works. I actually seldom turn a data step loose on sets with 100K plus datasets without testing on 10 or 100 or 1000 records, or again specific subsets picked using Where.
Remember to define and test your edge cases!!!!
First, I would always read the log. You ignore the log at your own peril.
Second, I would consider your expectations about the merge, and state those expectations so that if they are violated, they will throw an error message.
For example, if I am merging two datasets, and I expect that both of them are unique by ID, and I expect all the records in both to match, I will code:
data c;
merge a (in=a) b (in=b);
by id;
if NOT (a=b) then put "ERROR: mismatch records " (id a b)(=);
if NOT (first.id and last.id) then put "ERROR: duplicate found " id=;
run;
I'm also a fan of system options that throw as many errors as possible, e.g. :
options dsoptions=note2err msglevel=i mergenoby=error;
dsoptions=note2err is undocumented but it converts a lot of bad log notes into errors. msglevel=i throws an INFO: line to the log when two varibles collide. Mergenoby=error throws an error if you forget the BY statement.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.