How to compare more than two SAS datasets

Reply
Senior User
Posts: 1

How to compare more than two SAS datasets

Hello,

 

I have been asked to compare more than two SAS data-sets. There is one base data-set and more than one compare data-sets. Is there any special program that I have to write to do this job?

 

Thank you,

AB

Super User
Posts: 23,724

Re: How to compare more than two SAS datasets

You can use PROC COMPARE multiple times or you can write in your own. If you're comparing more than a few and need specific output, its usually easiest to write your own IMO.

 


ab645 wrote:

Hello,

 

I have been asked to compare more than two SAS data-sets. There is one base data-set and more than one compare data-sets. Is there any special program that I have to write to do this job?

 

Thank you,

AB


 

PROC Star
Posts: 1,805

Re: How to compare more than two SAS datasets

what comparison do you want do perform?

 

merge or hash might fill your needs

Super User
Posts: 13,542

Re: How to compare more than two SAS datasets

I might be tempted to start by just looking at dataset/variables characteristics and membership, especially if you aren't actually interested at first in a record by record comparison.

SAS maintains a number of informational sources you can query such as dictionary tables.

 

Depending on the actual questions you need to get from your comparisons this may be all you need.

The following example builds some similar data sets then provides summaries of numbers of observations and variables per data set and another set of variable summaries such as type, label and formats.

 

The 1 in the body of the table tells which what is where.

Data work.junk1;
   informat x $25. y best4. dday mmddyy10.;
   format dday mmddyy10.;
   input x y dday;
   label x='Account ID';
datalines;
abc  123  01/01/2015
pdq  4.56 12/12/2012
;
run;

Data work.junk2;
   informat x $20. y $4. dday mmddyy10.;
   format dday mmddyy10.;
   input x y dday;
   label x='Client ID';
datalines;
abc  444  01/01/2015
pdq  4.56 12/12/2012
pdq  202  12/12/2012
;
run;

Data work.junk3;
   informat x $15. y $4. dday $10.;
   input x y dday z ;
   label x = 'Something Else';
datalines;
abc  123  01/01/2015  88
pdq  4.56 12/12/2012  99
;
run;
proc sql;
   create table datasettab
   as select * 
   from dictionary.tables
   where libname='WORK' and memname in ('JUNK1' 'JUNK2' 'JUNK3')
   ;
quit;

proc tabulate data=datasettab;
   class memname;
   class nobs nvar;
   tables nobs nvar,
         memname*n=''
         /misstext=' '
   ;
run;
proc sql;
   create table vartab as
   select *
   from dictionary.columns
   where libname='WORK' and memname in ('JUNK1' 'JUNK2' 'JUNK3')
   ;
quit;

proc tabulate data=vartab;
   class memname name type;
   class label format informat/missing;
   tables name*type,
         memname*n=''
         /misstext=' '
   ;
   tables name*label,
         memname*n=''
         /misstext=' '
   ;
   tables name*type*format*informat,
         memname*n=''
         /misstext=' '
   ;

run;


So it is easy to see that Junk1 and Junk3 have the same number of observations and Junk1 and Junk2 have the same number of variables.

 

The second table shows which table each variable is numeric, character. The third shows labels per variable and data set. This might be very helpful. (You do use variable labels don't you). The last shows the variable type with format and informat information which may clarify issues with code working on set where a date variable is actually a SAS date valued variable and not in another where the data is actually character but happens to look like a date.

 

This could reduce what you look for if you go to use Proc compare as you could use the VAR and WITH options to only compare variables that are likely to be the same.

Ask a Question
Discussion stats
  • 3 replies
  • 113 views
  • 1 like
  • 4 in conversation