BookmarkSubscribeRSS Feed
chandler
Fluorite | Level 6
I'm trying to use a different data source for my reports. The alternate COMPARE dataset/table has the same observations as the old source BASE dataset/table, plus some additional observations, but the variable names are different. Some are easy to determine just by looking, but most are not intuitive at all. I want to map the two in a cross-reference, if you will, to point my code to the correct new variable names in the alternate dataset without altering the logic in the program. Is there an option in PROC COMPARE to show in two different datasets, variables having equal values but different names.

Here is my PROC COMPARE code below:

proc sort data=whse.acls_detail_all out=jesslibr.detail_all_base nodup; /* THIS IS THE BASE dataset */
by acct_num ;
run;
data jesslibr.bbacls_201005_test ;
length acct_num $ 18 ;
set jesslibr.bbacls_201005 ;
acct_num = substr(source_key_value,13,18) ;
format acct_num $18. ;
run;
proc sort data=jesslibr.bbacls_201005_test out=jesslibr.bbacls_201005_comp nodup ; /* THIS IS THE COMPARE dataset */
by acct_num ;
run;


/** Begin PROC COMPARE of the two sorted datasets. **/

ods listing close;
options linesize= 120 pagesize= 80;
proc compare base=jesslibr.detail_all_base
compare=jesslibr.bbacls_201005_comp LISTCOMPVAR BRIEFSUMMARY out=jesslibr.proc_compare_results_acls maxprint=10;
by acct_num;


title1 "Variables In May2010 dataset Acls_Detail_All of equal value but different name in May2010 dataset BBacls_201005" ;

run;
ods listing;
3 REPLIES 3
data_null__
Jade | Level 19
I don't know of a PROC COMPARE option for that but you can cause PROC COMPARE to compare a variable to a number of other variables using the VAR and WITH statements. For numeric variable you could look at summary statistics to help ID variables that are equal but have different name.

Here is example of comparing AGE to all the other numeric variables in a different data set to find a match.

[pre]
data class;
set sashelp.class;
rename age=age1 weight=age2 height=age3;
run;
proc compare base=sashelp.class compare=class listequalvar novalues;
var age age age;
with _numeric_;
run;
[/pre]

You could automate this by preparing lists of variables from each data set, using PROC TRANSPOSE or PROC CONTENTS and code gen calls to PROC COMPARE.
chandler
Fluorite | Level 6
Thanks Data _NULL_ ; I believe I follow your logic in the example you gave here, and I am trying to apply it to my situation, but the last comment you made about "automate this by preparing lists of variables from each dataset, using PROC TRANSPOSE or PROC CONTENS and code gen calls to PROC COMPARE", I can't visualize that. I need examples of how to do that. I have 877 variables and 40,236 observations in my BASE dataset and 145 variables and 64,364 observations in the COMPARE dataset.
data_null__
Jade | Level 19
Here is an example of a program that may work for you. I don't know what kind of performance you can expect with so many variables. This program compares each numeric variable in BASE with every other numeric variable in COMPARE.


[pre]
*** Create sample data;
data base;
set sashelp.class(in=in1) sashelp.class(in=in2);
byvar = cats(of in:);
run;

data compare;
set sashelp.class(in=in1) sashelp.class(in=in2);
byvar = cats(of in:);
rename age=agey;
rename weight=wt;
today = today();
if in2 and name eq: 'J' then do;
age + ranuni(1);
height = ranuni(1);
end;
run;

*** get a list of numeric variable names from BASE DATA;
proc transpose data=base(obs=0) out=baseVars;
var _numeric_;
run;
*** code gen VAR and WITH statements;
filename FT33F001 temp;
data _null_;
file FT33F001;
if 0 then set compare(keep=_numeric_);
array _n
  • _numeric_;
    cvar0 = dim(_n);
    do until(eof);
    set baseVars end=eof;
    do i = 1 to cvar0;
    put +3 'Var ' _name_ +(-1) ';';
    end;
    put +3 'with _numeric_;';
    end;
    run;
    *** compare;
    proc compare noprint base=base compare=compare novalues /*listequalvar note*/ OUTSTATS=stats;
    by NOTSORTED byvar;
    %inc FT33F001 / source2;
    run;

    *** this may be enough information to answer your question;
    proc print data=stats;
    where _TYPE_ eq 'NDIF';
    run;

    [/pre]

    The NDIF statistics is Number of OBS different in the _BASE_ column with %dif in the _COMP_ column.

    [pre]
    Obs byvar _VAR_ _WITH_ _TYPE_ _BASE_ _COMP_ _DIF_ _PCTDIF_

    9 10 Age agey NDIF 0 0.000 . .
    20 10 Age Height NDIF 19 100.000 . .
    31 10 Age wt NDIF 19 100.000 . .
    42 10 Age today NDIF 19 100.000 . .
    53 10 Height agey NDIF 19 100.000 . .
    64 10 Height Height NDIF 0 0.000 . .
    75 10 Height wt NDIF 19 100.000 . .
    86 10 Height today NDIF 19 100.000 . .
    97 10 Weight agey NDIF 19 100.000 . .
    108 10 Weight Height NDIF 19 100.000 . .
    119 10 Weight wt NDIF 0 0.000 . .
    130 10 Weight today NDIF 19 100.000 . .
    141 01 Age agey NDIF 7 36.842 . .
    152 01 Age Height NDIF 19 100.000 . .
    163 01 Age wt NDIF 19 100.000 . .
    174 01 Age today NDIF 19 100.000 . .
    185 01 Height agey NDIF 19 100.000 . .
    196 01 Height Height NDIF 7 36.842 . .
    207 01 Height wt NDIF 19 100.000 . .
    218 01 Height today NDIF 19 100.000 . .
    229 01 Weight agey NDIF 19 100.000 . .
    240 01 Weight Height NDIF 19 100.000 . .
    251 01 Weight wt NDIF 0 0.000 . .
    262 01 Weight today NDIF 19 100.000 . .
    [/pre]
  • SAS Innovate 2025: Save the Date

     SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

    Save the date!

    What is Bayesian Analysis?

    Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 3 replies
    • 1139 views
    • 0 likes
    • 2 in conversation