With PROC COMPARE, I noticed that there is the ability to compare variables across datasets with different names using the VAR and WITH statements.
I need to compare a table with about 1,400 variables. My base table has just 20 variables.
I want to see if any of the 20 variables in my base table have the same observations as one of the 1400 variables in the comparison table.
I do not know which names I should place in the WITH statement in advance. I was wondering if there were options for PROC COMPARE that enabled you to compare the observations of the 20 variables with all 1400 variables? And if there are, is there a better approach to finding variables from your base dataset that correspond to another variable (with a different name) in another dataset?
Yes, you can compare a variable with more than one other variable. I may have misread your question at first (still not 100% clear). Assuming the following is a scaled down version of your problem:
options mprint;
data a;
input id a b c;
cards;
1 1.1 1.2 1.3
2 2.1 2.2 2.3
3 3.1 3.2 3.3
;
data b;
input idx abc bcd cde def efg fgh;
cards;
1 1.1 1.2 1.3 1.4 1.5 1.6
2 2.1 2.2 2.3 2.4 2.5 2.6
3 3.1 3.2 3.3 2.4 2.5 2.6
;
run;
%macro crazy_compare();
%let a=%sysfunc(open(work.a));
%let b=%sysfunc(open(work.b));
proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));
var %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&a., &i.))
%end;
%end;
;
with %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&b., &j.))
%end;
%end;
;
run;
%let rc=%sysfunc(close(&a.));
%let rc=%sysfunc(close(&b.));
%mend;
%crazy_compare
data _null_;
set foobar;
put _var_ ' = ' _with_;
run;
id = idx
a = abc
b = bcd
c = cde
The most simple answer to your question is no... It can be done, but not with PROC COMPARE, and with such a comparison you should expect it to take a substantial amount of time to compute.
FriedEgg:
I suspect you're right. It very well may take a long time.
I could create a Macro loop and cycle through all of the 20 x 1400 variable combinations, testing each of the 20 variables against every one of the 1400 variables. I could duplicate the entry of the variable I want to compare 1400 times in the VAR statement, and using PROC Contents, pass a macro variable containing the variable names from the 1400 variable dataset, to the WITH statement:
From the SUGI paper:
http://analytics.ncsu.edu/sesug/2003/TU01-Reiss.pdf
The author suggests the following code for testing one variable against several:
proc compare base = base-data-set
compare = compare-data-set
nosummary
;
var base-variable base-variable;
with compare-variable-#1 compare-variable-#2;
title1 'Compare One Variable (base-variable)’;
title2 ‘with Two Variables (compare-variable-#1 and compare-variable-#2)';
run;
Yes, you can compare a variable with more than one other variable. I may have misread your question at first (still not 100% clear). Assuming the following is a scaled down version of your problem:
options mprint;
data a;
input id a b c;
cards;
1 1.1 1.2 1.3
2 2.1 2.2 2.3
3 3.1 3.2 3.3
;
data b;
input idx abc bcd cde def efg fgh;
cards;
1 1.1 1.2 1.3 1.4 1.5 1.6
2 2.1 2.2 2.3 2.4 2.5 2.6
3 3.1 3.2 3.3 2.4 2.5 2.6
;
run;
%macro crazy_compare();
%let a=%sysfunc(open(work.a));
%let b=%sysfunc(open(work.b));
proc compare base=a compare=b /*listequalvar novalues nodate*/ noprint outstat=foobar(where=(_type_='NDIF' and sum(_BASE_, _COMP_)=0));
var %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&a., &i.))
%end;
%end;
;
with %do i=1 %to %sysfunc(attrn(&a., nvars));
%do j=1 %to %sysfunc(attrn(&b., nvars));
%sysfunc(varname(&b., &j.))
%end;
%end;
;
run;
%let rc=%sysfunc(close(&a.));
%let rc=%sysfunc(close(&b.));
%mend;
%crazy_compare
data _null_;
set foobar;
put _var_ ' = ' _with_;
run;
id = idx
a = abc
b = bcd
c = cde
FriedEgg, this appears to be exactly what I was looking for. Thank you very much.
Glad to help, going to take quite a while longer to run with 28,000 variables...
How do you define " the same observations " ?
Make an example to explain your question better .
Should be a relatively simple two stage approach. First identify matching variables, i.e. use sashelp.vcolumn merge where name in both datasets. Then with that list build your proc compare. Xia Keshan brings up a very good point however, if you are comparing just one variable against another you would need a set of common identifier variables. (Note I haven't checked it, this is just to give idea):
proc sql;
create table LOOP as
select distinct COALESCE(A.NAME,B.NAME) as NAME
from (select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="datasetname") A,
(select * from SASHELP.VCOLUMN where LIBNAME="libname" and MEMNAME="comparedatasetname") B
where A.NAME=B.NAME;
quit;
data _null_;
set loop; /* For each name create a proc compare */
call execute('ods pdf file="compare_of_'||strip(name)||'.rtf";
proc compare data=....;run;
ods pdf close;');
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.