With PROC COMPARE, I noticed that there is the ability to compare variables across datasets with different names using the VAR and WITH statements.
I need to compare a table with about 1,400 variables. My base table has just 20 variables.
I want to see if any of the 20 variables in my base table have the same observations as one of the 1400 variables in the comparison table.
I do not know which names I should place in the WITH statement in advance. I was wondering if there were options for PROC COMPARE that enabled you to compare the observations of the 20 variables with all 1400 variables? And if there are, is there a better approach to finding variables from your base dataset that correspond to another variable (with a different name) in another dataset?
Yes, you can compare a variable with more than one other variable. I may have misread your question at first (still not 100% clear). Assuming the following is a scaled down version of your problem:
options mprint;
data a;
input id a b c;
cards;
1 1.1 1.2 1.3
2 2.1 2.2 2.3
3 3.1 3.2 3.3
;
data b;
input idx abc bcd cde def efg fgh;
cards;
1 1.1 1.2 1.3 1.4 1.5 1.6
2 2.1 2.2 2.3 2.4 2.5 2.6
3 3.1 3.2 3.3 2.4 2.5 2.6
;
run;
%macro crazy_compare();
%let a=%sysfunc(open(work.a));
%let b=%sysfunc(open(work.b));
proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));
var %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&a., &i.))
%end;
%end;
;
with %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&b., &j.))
%end;
%end;
;
run;
%let rc=%sysfunc(close(&a.));
%let rc=%sysfunc(close(&b.));
%mend;
%crazy_compare
data _null_;
set foobar;
put _var_ ' = ' _with_;
run;
id = idx
a = abc
b = bcd
c = cde
The most simple answer to your question is no... It can be done, but not with PROC COMPARE, and with such a comparison you should expect it to take a substantial amount of time to compute.
FriedEgg:
I suspect you're right. It very well may take a long time.
I could create a Macro loop and cycle through all of the 20 x 1400 variable combinations, testing each of the 20 variables against every one of the 1400 variables. I could duplicate the entry of the variable I want to compare 1400 times in the VAR statement, and using PROC Contents, pass a macro variable containing the variable names from the 1400 variable dataset, to the WITH statement:
From the SUGI paper:
http://analytics.ncsu.edu/sesug/2003/TU01-Reiss.pdf
The author suggests the following code for testing one variable against several:
proc compare base = base-data-set
compare = compare-data-set
nosummary
;
var base-variable base-variable;
with compare-variable-#1 compare-variable-#2;
title1 'Compare One Variable (base-variable)’;
title2 ‘with Two Variables (compare-variable-#1 and compare-variable-#2)';
run;
Yes, you can compare a variable with more than one other variable. I may have misread your question at first (still not 100% clear). Assuming the following is a scaled down version of your problem:
options mprint;
data a;
input id a b c;
cards;
1 1.1 1.2 1.3
2 2.1 2.2 2.3
3 3.1 3.2 3.3
;
data b;
input idx abc bcd cde def efg fgh;
cards;
1 1.1 1.2 1.3 1.4 1.5 1.6
2 2.1 2.2 2.3 2.4 2.5 2.6
3 3.1 3.2 3.3 2.4 2.5 2.6
;
run;
%macro crazy_compare();
%let a=%sysfunc(open(work.a));
%let b=%sysfunc(open(work.b));
proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));
var %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&a., &i.))
%end;
%end;
;
with %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&b., &j.))
%end;
%end;
;
run;
%let rc=%sysfunc(close(&a.));
%let rc=%sysfunc(close(&b.));
%mend;
%crazy_compare
data _null_;
set foobar;
put _var_ ' = ' _with_;
run;
id = idx
a = abc
b = bcd
c = cde
FriedEgg, this appears to be exactly what I was looking for. Thank you very much.
Glad to help, going to take quite a while longer to run with 28,000 variables...
How do you define " the same observations " ?
Make an example to explain your question better .
Should be a relatively simple two stage approach. First identify matching variables, i.e. use sashelp.vcolumn merge where name in both datasets. Then with that list build your proc compare. Xia Keshan brings up a very good point however, if you are comparing just one variable against another you would need a set of common identifier variables. (Note I haven't checked it, this is just to give idea):
proc sql;
create table LOOP as
select distinct COALESCE(A.NAME,B.NAME) as NAME
from (select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="datasetname") A,
(select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="comparedatasetname") B
where A.NAME=B.NAME;
quit;
data _null_;
set loop; /* For each name create a proc compare */
call execute('ods pdf file="compare_of_'||strip(name)||'.rtf";
proc compare data=....;run;
ods pdf close;');
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.