BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ucdcrush
Obsidian | Level 7

Hi all - here is the scenario.

 

I have two datasets of identical structure, about 300 fields long, and that can share a common "id" variable.

 

I'm trying to find out, in a concise way, which values (for the same variables) are different between these datasets, and what the differences are.

 

I have tried proc compare.

 

proc compare data=md5_imported
compare=md5_latest
outnoequal
out=md5_changes;
id acrfid;
run;

The output displayed on the screen is great, as it shows the ID variable, the name of the variable that was different, and the values in the two datasets of those variables. That's everything I need, except I need to use that in data somewhere, so having it on the screen in the text output table isn't going to work.

 

When I use the out= option, I get the table, but it's of course 300+ variables long, and I have to scroll over to find the column with an 'X' somewhere in the value to show me which field was different. Alas, it also does not tell me the value of that variable in either dataset 😞

 

What I'm ideally looking for is to obtain something like this, in a dataset:

 

ID     TableA                  TableB

1      Name-Daniel          Name-Daniele

 

where "name" would be the name of the variable where there was a difference, "-" would be hardcoded as a separator, and "Daniel" or "Daniele" would be the value of that variable in each of those tables.

 

I think I might be able to get there -- somehow -- transposing the output from the proc compare table, then joining back to the original tables, but I have no idea how, and I have to imagine someone has already figured out a (better) way..


Any advice is appreciated, thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this (assumes observation order and number match)?

%macro compare(ds1=,ds2=);
  %local i varnames nbvars;
  proc contents data=&ds1 out=CONT1 noprint; run;
  proc contents data=&ds2 out=CONT2 noprint; run;
  proc sql noprint;
    create table VARIABLES_IN_ONLY_ONE_TABLE as 
    (select NAME,'A' as TABLE from CONT1 except select NAME,'A' from CONT2)
     union
    (select NAME,'B' from CONT2 except select NAME,'B' from CONT1) ;
    select CONT1.NAME into :varnames separated by ' ' from CONT1, CONT2 where CONT1.NAME=CONT2.NAME;
  quit;
  %let nbvars=%sysfunc(countw(&varnames)); 
  data COMPARE;
    merge &ds1 (rename=(%do i=1%to&nbvars;%scan(&varnames,&i)=_%scan(&varnames,&i)%end;))
          &ds2;
   %do i=1%to&nbvars;
     if %scan(&varnames,&i) ne _%scan(&varnames,&i) then do;
       TABLEA=cats("%scan(&varnames,&i)-", %scan(&varnames,&i)); 
       TABLEB=cats("%scan(&varnames,&i)-",_%scan(&varnames,&i));
       output; 
     end;
   %end;
   keep TABLE:;
  run;
%mend;

data CLASS; 
  set SASHELP.CLASS; 
  drop SEX; 
  if _N_<3 then AGE =10  ;
  if _N_=1 then NAME='Jo';
run;

%compare(ds1=CLASS, ds2=SASHELP.CLASS);
Obs TABLEA TABLEB
1 Age-14 Age-10
2 Name-Alfred Name-Jo
3 Age-13 Age-10

 

 

 

View solution in original post

2 REPLIES 2
ChrisNZ
Tourmaline | Level 20

Like this (assumes observation order and number match)?

%macro compare(ds1=,ds2=);
  %local i varnames nbvars;
  proc contents data=&ds1 out=CONT1 noprint; run;
  proc contents data=&ds2 out=CONT2 noprint; run;
  proc sql noprint;
    create table VARIABLES_IN_ONLY_ONE_TABLE as 
    (select NAME,'A' as TABLE from CONT1 except select NAME,'A' from CONT2)
     union
    (select NAME,'B' from CONT2 except select NAME,'B' from CONT1) ;
    select CONT1.NAME into :varnames separated by ' ' from CONT1, CONT2 where CONT1.NAME=CONT2.NAME;
  quit;
  %let nbvars=%sysfunc(countw(&varnames)); 
  data COMPARE;
    merge &ds1 (rename=(%do i=1%to&nbvars;%scan(&varnames,&i)=_%scan(&varnames,&i)%end;))
          &ds2;
   %do i=1%to&nbvars;
     if %scan(&varnames,&i) ne _%scan(&varnames,&i) then do;
       TABLEA=cats("%scan(&varnames,&i)-", %scan(&varnames,&i)); 
       TABLEB=cats("%scan(&varnames,&i)-",_%scan(&varnames,&i));
       output; 
     end;
   %end;
   keep TABLE:;
  run;
%mend;

data CLASS; 
  set SASHELP.CLASS; 
  drop SEX; 
  if _N_<3 then AGE =10  ;
  if _N_=1 then NAME='Jo';
run;

%compare(ds1=CLASS, ds2=SASHELP.CLASS);
Obs TABLEA TABLEB
1 Age-14 Age-10
2 Name-Alfred Name-Jo
3 Age-13 Age-10

 

 

 

ucdcrush
Obsidian | Level 7

Hi Chris. Sorry for the late reply, but THANK YOU very much! That works perfectly! Much appreciated.

 

Dave

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 832 views
  • 0 likes
  • 2 in conversation