DATA Step, Macro, Functions and more

Compare every variable between two datasets

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Compare every variable between two datasets

With PROC COMPARE, I noticed that there is the ability to compare variables across datasets with different names using the VAR and WITH statements.

I need to compare a table with about 1,400 variables.  My base table has just 20 variables.

I want to see if any of the 20 variables in my base table have the same observations as one of the 1400 variables in the comparison table.

I do not know which names I should place in the WITH statement in advance.  I was wondering if there were options for PROC COMPARE that enabled you to compare the observations of the 20 variables with all 1400 variables?  And if there are, is there a better approach to finding variables from your base dataset that correspond to another variable (with a different name) in another dataset?


Accepted Solutions
Solution
‎01-08-2015 01:12 PM
Trusted Advisor
Posts: 1,301

Re: Compare every variable between two datasets

Yes, you can compare a variable with more than one other variable.  I may have misread your question at first (still not 100% clear).  Assuming the following is a scaled down version of your problem:

options mprint;

data a;

input id a b c;

cards;

1 1.1 1.2 1.3

2 2.1 2.2 2.3

3 3.1 3.2 3.3

;

data b;

input idx abc bcd cde def efg fgh;

cards;

1 1.1 1.2 1.3 1.4 1.5 1.6

2 2.1 2.2 2.3 2.4 2.5 2.6

3 3.1 3.2 3.3 2.4 2.5 2.6

;

run;

%macro crazy_compare();

%let a=%sysfunc(open(work.a));

%let b=%sysfunc(open(work.b));

proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));

var %do i=1 %to %sysfunc(attrn(&a., nvars));

       %do j=1 %to %sysfunc(attrn(&b., nvars));

          %sysfunc(varname(&a., &i.))

       %end;

    %end;

;

with %do i=1 %to %sysfunc(attrn(&a., nvars));

        %do j=1 %to %sysfunc(attrn(&b., nvars));

           %sysfunc(varname(&b., &j.))

        %end;

     %end;

;

run;

%let rc=%sysfunc(close(&a.));

%let rc=%sysfunc(close(&b.));

%mend;

%crazy_compare

data _null_;

set foobar;

put _var_ ' = ' _with_;

run;

id  = idx

a  = abc

b  = bcd

c  = cde

View solution in original post


All Replies
Trusted Advisor
Posts: 1,301

Re: Compare every variable between two datasets

The most simple answer to your question is no...  It can be done, but not with PROC COMPARE, and with such a comparison you should expect it to take a substantial amount of time to compute.

New Contributor
Posts: 3

Re: Compare every variable between two datasets

FriedEgg:

I suspect you're right.  It very well may take a long time.

I could create a Macro loop and cycle through all of the 20 x 1400 variable combinations, testing each of the 20 variables against every one of the 1400 variables.  I could duplicate the entry of the variable I want to compare 1400 times in the VAR statement, and using PROC Contents, pass a macro variable containing the variable names from the 1400 variable dataset, to the WITH statement:

From the SUGI paper:

http://analytics.ncsu.edu/sesug/2003/TU01-Reiss.pdf

The author suggests the following code for testing one variable against several:

proc compare base = base-data-set
      compare = compare-data-set
      nosummary
;

var base-variable base-variable;

with compare-variable-#1 compare-variable-#2;

title1 'Compare One Variable (base-variable)’;
title2 ‘with Two Variables (compare-variable-#1 and compare-variable-#2)';
run;

Solution
‎01-08-2015 01:12 PM
Trusted Advisor
Posts: 1,301

Re: Compare every variable between two datasets

Yes, you can compare a variable with more than one other variable.  I may have misread your question at first (still not 100% clear).  Assuming the following is a scaled down version of your problem:

options mprint;

data a;

input id a b c;

cards;

1 1.1 1.2 1.3

2 2.1 2.2 2.3

3 3.1 3.2 3.3

;

data b;

input idx abc bcd cde def efg fgh;

cards;

1 1.1 1.2 1.3 1.4 1.5 1.6

2 2.1 2.2 2.3 2.4 2.5 2.6

3 3.1 3.2 3.3 2.4 2.5 2.6

;

run;

%macro crazy_compare();

%let a=%sysfunc(open(work.a));

%let b=%sysfunc(open(work.b));

proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));

var %do i=1 %to %sysfunc(attrn(&a., nvars));

       %do j=1 %to %sysfunc(attrn(&b., nvars));

          %sysfunc(varname(&a., &i.))

       %end;

    %end;

;

with %do i=1 %to %sysfunc(attrn(&a., nvars));

        %do j=1 %to %sysfunc(attrn(&b., nvars));

           %sysfunc(varname(&b., &j.))

        %end;

     %end;

;

run;

%let rc=%sysfunc(close(&a.));

%let rc=%sysfunc(close(&b.));

%mend;

%crazy_compare

data _null_;

set foobar;

put _var_ ' = ' _with_;

run;

id  = idx

a  = abc

b  = bcd

c  = cde

New Contributor
Posts: 3

Re: Compare every variable between two datasets

FriedEgg, this appears to be exactly what I was looking for.  Thank you very much.

Trusted Advisor
Posts: 1,301

Re: Compare every variable between two datasets

Glad to help, going to take quite a while longer to run with 28,000 variables...

Super User
Posts: 10,035

Re: Compare every variable between two datasets

How do you define " the same observations " ?

Make an example to explain your question better .

Super User
Super User
Posts: 7,971

Re: Compare every variable between two datasets

Should be a relatively simple two stage approach.  First identify matching variables, i.e. use sashelp.vcolumn merge where name in both datasets.  Then with that list build your proc compare.  Xia Keshan brings up a very good point however, if you are comparing just one variable against another you would need a set of common identifier variables.  (Note I haven't checked it, this is just to give idea):

proc sql;

     create table LOOP as

     select     distinct     COALESCE(A.NAME,B.NAME) as NAME

     from       (select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="datasetname") A,

                    (select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="comparedatasetname") B

     where     A.NAME=B.NAME;

quit;

data _null_;

     set loop; /* For each name create a proc compare */

     call execute('ods pdf file="compare_of_'||strip(name)||'.rtf";

                          proc compare data=....;run;

                          ods pdf close;');

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 6273 views
  • 0 likes
  • 4 in conversation