BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rcclark2
Calcite | Level 5

With PROC COMPARE, I noticed that there is the ability to compare variables across datasets with different names using the VAR and WITH statements.

I need to compare a table with about 1,400 variables.  My base table has just 20 variables.

I want to see if any of the 20 variables in my base table have the same observations as one of the 1400 variables in the comparison table.

I do not know which names I should place in the WITH statement in advance.  I was wondering if there were options for PROC COMPARE that enabled you to compare the observations of the 20 variables with all 1400 variables?  And if there are, is there a better approach to finding variables from your base dataset that correspond to another variable (with a different name) in another dataset?

1 ACCEPTED SOLUTION

Accepted Solutions
FriedEgg
SAS Employee

Yes, you can compare a variable with more than one other variable.  I may have misread your question at first (still not 100% clear).  Assuming the following is a scaled down version of your problem:

options mprint;

data a;

input id a b c;

cards;

1 1.1 1.2 1.3

2 2.1 2.2 2.3

3 3.1 3.2 3.3

;

data b;

input idx abc bcd cde def efg fgh;

cards;

1 1.1 1.2 1.3 1.4 1.5 1.6

2 2.1 2.2 2.3 2.4 2.5 2.6

3 3.1 3.2 3.3 2.4 2.5 2.6

;

run;

%macro crazy_compare();

%let a=%sysfunc(open(work.a));

%let b=%sysfunc(open(work.b));

proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));

var %do i=1 %to %sysfunc(attrn(&a., nvars));

       %do j=1 %to %sysfunc(attrn(&b., nvars));

          %sysfunc(varname(&a., &i.))

       %end;

    %end;

;

with %do i=1 %to %sysfunc(attrn(&a., nvars));

        %do j=1 %to %sysfunc(attrn(&b., nvars));

           %sysfunc(varname(&b., &j.))

        %end;

     %end;

;

run;

%let rc=%sysfunc(close(&a.));

%let rc=%sysfunc(close(&b.));

%mend;

%crazy_compare

data _null_;

set foobar;

put _var_ ' = ' _with_;

run;

id  = idx

a  = abc

b  = bcd

c  = cde

View solution in original post

7 REPLIES 7
FriedEgg
SAS Employee

The most simple answer to your question is no...  It can be done, but not with PROC COMPARE, and with such a comparison you should expect it to take a substantial amount of time to compute.

rcclark2
Calcite | Level 5

FriedEgg:

I suspect you're right.  It very well may take a long time.

I could create a Macro loop and cycle through all of the 20 x 1400 variable combinations, testing each of the 20 variables against every one of the 1400 variables.  I could duplicate the entry of the variable I want to compare 1400 times in the VAR statement, and using PROC Contents, pass a macro variable containing the variable names from the 1400 variable dataset, to the WITH statement:

From the SUGI paper:

http://analytics.ncsu.edu/sesug/2003/TU01-Reiss.pdf

The author suggests the following code for testing one variable against several:

proc compare base = base-data-set
      compare = compare-data-set
      nosummary
;

var base-variable base-variable;

with compare-variable-#1 compare-variable-#2;

title1 'Compare One Variable (base-variable)’;
title2 ‘with Two Variables (compare-variable-#1 and compare-variable-#2)';
run;

FriedEgg
SAS Employee

Yes, you can compare a variable with more than one other variable.  I may have misread your question at first (still not 100% clear).  Assuming the following is a scaled down version of your problem:

options mprint;

data a;

input id a b c;

cards;

1 1.1 1.2 1.3

2 2.1 2.2 2.3

3 3.1 3.2 3.3

;

data b;

input idx abc bcd cde def efg fgh;

cards;

1 1.1 1.2 1.3 1.4 1.5 1.6

2 2.1 2.2 2.3 2.4 2.5 2.6

3 3.1 3.2 3.3 2.4 2.5 2.6

;

run;

%macro crazy_compare();

%let a=%sysfunc(open(work.a));

%let b=%sysfunc(open(work.b));

proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));

var %do i=1 %to %sysfunc(attrn(&a., nvars));

       %do j=1 %to %sysfunc(attrn(&b., nvars));

          %sysfunc(varname(&a., &i.))

       %end;

    %end;

;

with %do i=1 %to %sysfunc(attrn(&a., nvars));

        %do j=1 %to %sysfunc(attrn(&b., nvars));

           %sysfunc(varname(&b., &j.))

        %end;

     %end;

;

run;

%let rc=%sysfunc(close(&a.));

%let rc=%sysfunc(close(&b.));

%mend;

%crazy_compare

data _null_;

set foobar;

put _var_ ' = ' _with_;

run;

id  = idx

a  = abc

b  = bcd

c  = cde

rcclark2
Calcite | Level 5

FriedEgg, this appears to be exactly what I was looking for.  Thank you very much.

FriedEgg
SAS Employee

Glad to help, going to take quite a while longer to run with 28,000 variables...

Ksharp
Super User

How do you define " the same observations " ?

Make an example to explain your question better .

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Should be a relatively simple two stage approach.  First identify matching variables, i.e. use sashelp.vcolumn merge where name in both datasets.  Then with that list build your proc compare.  Xia Keshan brings up a very good point however, if you are comparing just one variable against another you would need a set of common identifier variables.  (Note I haven't checked it, this is just to give idea):

proc sql;

     create table LOOP as

     select     distinct     COALESCE(A.NAME,B.NAME) as NAME

     from       (select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="datasetname") A,

                    (select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="comparedatasetname") B

     where     A.NAME=B.NAME;

quit;

data _null_;

     set loop; /* For each name create a proc compare */

     call execute('ods pdf file="compare_of_'||strip(name)||'.rtf";

                          proc compare data=....;run;

                          ods pdf close;');

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 9607 views
  • 0 likes
  • 4 in conversation