BookmarkSubscribeRSS Feed
6 REPLIES 6
ballardw
Super User

The basic check would be to use Proc Print to examine the result. You can select specific variables to see if you get expected results.

Assuming you are using a dataset Merge statement things to consider are:

   Is it appropriate to use By to match values

   Order of the data sets determines which values of like named variables are in result

   Are all variables of the same name of the same type? All numeric or all character. If Character are they all the same length (might get truncation).

  And do want to Merge (record to record) or Set entire data set follow other data set or Update (assign specific values to specific variables).

  If using By statement what do you want for results if there are multiple values of the by variables in each set? You might be looking at moving to Proc SQL and a join if you want each record of one set matched to each record of another set By specific variables.

beginner1
Fluorite | Level 6

Thanks Ballardw!!!

 

But in case of very large datasets with more than hundred variables and thousands of observations is it possible to confirm the right execution by using Proc Print??

Anyother Procedure??

Reeza
Super User

For merges, my major check is record counts, usually if that's correct its ok.

 

Because each merge is customized to the situation there isn't a default way or proc to check if it's correct. Sometimes you want multiples, in other cases you don't. 

 

Things to look out BEFORE you merge are multiple records of your BY variables.

Identify if you're doing a 1:1 or 1:many or many:many merge. Make sure SAS can handle the merge you're trying to do.

 

If you're doing a 1:1 you can verify the record count and then I verify the fields I merge in.

 

Another thing to check is variables - if you have variables with the same names in the datasets other than the BY variables you can get unexpected results. 

 

Honestly, I prefer SQL joins as I find it easier to explicitly define what I'm doing and less likely to get unexpected results, but that is a personal preference.

 

SQL Join Type

 

 

ballardw
Super User

@beginner1 wrote:

Thanks Ballardw!!!

 

But in case of very large datasets with more than hundred variables and thousands of observations is it possible to confirm the right execution by using Proc Print??

Anyother Procedure??


If I know what I want the result to look like it is usually fairly easy to check a few specific cases so Proc print with sufficient variables and possibly a where statement to check on one or two combinations of the BY variables brings up enough information to confirm that general behavior is as I intended (or not as the case may be).

Generally starting with minimum data sets to demonstrate the basic purpose works. I actually seldom turn a data step loose on sets with 100K plus datasets without testing on 10 or 100 or 1000 records, or again specific subsets picked using Where.

Quentin
Super User

First, I would always read the log.  You ignore the log at your own peril. 

 

Second, I would consider your expectations about the merge, and state those expectations so that if they are violated, they will throw an error message. 

 

For example, if I am merging two datasets, and I expect that both of them are unique by ID, and I expect all the records in both to match, I will code:

data c;
  merge a (in=a) b (in=b);
  by id;
  if NOT (a=b) then put "ERROR: mismatch records " (id a b)(=);
  if NOT (first.id and last.id) then put "ERROR: duplicate found " id=;
run;

I'm also a fan of system options that throw as many errors as possible, e.g. :

 

options dsoptions=note2err msglevel=i mergenoby=error;

dsoptions=note2err is undocumented but it converts a lot of bad log notes into errors.  msglevel=i throws an INFO: line to the log when two varibles collide.  Mergenoby=error throws an error if you forget the BY statement.

 

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3688 views
  • 7 likes
  • 4 in conversation