Hi all,
I am comparing variables within the same dataset. I am trying to get an output dataset that lists the variables being compared and the number of differences counted.
This is what I currently have in my syntax, and the problem is that it outputs the ndif for numeric vars only. I would like the NDIF for all vars and not just the numeric ones.
proc compare base=mydata maxprint=0 outstats=summary;
var x; with x_old;
var Y; with Y_old;
.......
....;
run;
data Ndiffs;
set summary;
if _TYPE_ not in ("NDIF") then delete;
run;
I can get an NDIF for character variables.
By any chance, are you only getting output for the variable in the LAST VAR and WITH statement?
I think Proc Compare really only wants to see one VAR and one WITH statement.
I get results in the results tab for all. But if I do an outstats dataset, then I only get outstats for the numerics.
@K_S wrote:
I get results in the results tab for all. But if I do an outstats dataset, then I only get outstats for the numerics.
From the documentation:
When you use the OUTSTATS= option, PROC COMPARE calculates the same summary statistics as the ALLSTATS option for each pair of numeric variables that are compared.
The emphasis on "numeric" is mine.
Proc compare is old and likely not getting many features added. So another approach is going to be needed.
Maybe something along these lines:
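data need;
   set mydata;
   /* Each array must hold variables of one type (all numeric or all character) */
   Array new x y z;
   Array old x_old y_old z_old;
   array dif difx dify difz ;   /* one 1/0 flag per compared pair */
   do i = 1 to dim(new);
      dif[i]= (new[i] ne old[i]);   /* 1 = values differ, 0 = equal */
   end;
   drop i;
run;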
proc means data=need sum maxdec=0;
   var dif:;
run;
The test dif[i]= (new[i] ne old[i]); will create 1/0 coded output with 1 meaning different.
Then use proc means (or summary) to sum the differences. You could create an output data set or use a report procedure to create human-readable summaries.
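If the goal is a data set rather than printed output, the sum can be written out and then transposed so there is one row per flag variable. This is only a sketch: the data set names ndif_wide and ndiffs and the renamed columns variable and ndif are made up for illustration.

```sas
/* Sum the 1/0 flags into a one-row data set instead of printing them */
proc means data=need noprint;
   var dif:;
   output out=ndif_wide (drop=_type_ _freq_) sum=;
run;

/* Flip to one row per flag: _NAME_ holds the flag name, COL1 the count */
proc transpose data=ndif_wide
               out=ndiffs (rename=(_name_=variable col1=ndif));
   var dif:;
run;
```

The result should resemble the NDIF rows kept from the OUTSTATS= data set, but built from flags that can come from character comparisons as well.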
Thanks for this. I have over 100 variables so this may not be very practical.
@K_S wrote:
Thanks for this. I have over 100 variables so this may not be very practical.
The data step ARRAY statement will accept variable lists.
So if the variables you want are adjacent, i.e. have sequential column numbers when reported in Proc Contents, you can use a list like:
array old var1 -- abc; The double dash (--) indicates the adjacency.
Or if you name variables with sequential numbers as a suffix:
array old x1 - x25 ; would use X1, x2, x3 ... x25.
Or if not all but most have suffixes you can mix a list and explicit names:
array old x x2-x15 x27 z1-z12; or such.
Or if all of a group of variable names start with the same character(s) you can use the : list builder.
Array old stem_: ; would use all variables that start with the characters stem_.
However, since this approach pairs the variables by position, you need to make sure the new and old lists resolve to matching variables in the same order.
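One way to check the pairing before running the full comparison is to print the resolved names side by side in the log. A minimal sketch, assuming the same placeholder lists as above (substitute your own variable lists); new_name, old_name, n_new and n_old are just scratch variables for the check.

```sas
data _null_;
   set mydata (obs=1);            /* one observation is enough to resolve the lists */
   array new x y z;
   array old x_old y_old z_old;
   length new_name old_name $ 32;
   n_new = dim(new);
   n_old = dim(old);
   if n_new ne n_old then put 'Array sizes differ: ' n_new= n_old=;
   do i = 1 to min(n_new, n_old);
      new_name = vname(new[i]);   /* name of the i-th variable in each list */
      old_name = vname(old[i]);
      put i= new_name= old_name=;
   end;
run;
```

Each log line shows which variable in new lines up with which variable in old, so a shifted or missing name is easy to spot.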
The output dif array could be defined as: array dif {50}; or however many pairs are involved. But you need to keep track of which pair is which, which could be done with the VNAME function to extract the names of the variables compared:
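data need;
   set mydata;
   Array new x y z;
   Array old x_old y_old z_old;
   length compared $ 65;   /* two 32-character names plus a separating blank */
   do i = 1 to dim(new);
      dif= (new[i] ne old[i]);   /* 1 = different, 0 = same */
      compared = catx(' ',vname(new[i]),vname(old[i]));
      output;   /* one row per pair for each input observation */
   end;
   keep compared dif;
run;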
proc means data=need sum maxdec=0 nway;
   class compared;
   var dif:;
run;
The nway option isn't needed for results output but likely would be if creating an output data set.
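For the output data set case, a sketch along these lines should give one row per compared pair with its count of differences. The names ndiffs and ndif are placeholders, not anything from the original post.

```sas
/* NWAY keeps only the rows for each value of COMPARED and
   drops the overall (_TYPE_=0) total row from the output */
proc means data=need noprint nway;
   class compared;
   var dif;
   output out=ndiffs (drop=_type_ _freq_) sum=ndif;
run;
```

Without nway, the output data set would also contain a grand-total row with a blank value of compared.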
But I don't see where this is any worse than listing 100 variables on Var and With statements in Proc Compare.
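One caveat with the array approach: all variables in a single array must be the same type, so a mix of numeric and character pairs needs separate arrays. A minimal sketch, assuming x and y are numeric and using hypothetical character variables name and city with _old counterparts:

```sas
data need;
   set mydata;
   array num_new x y;                 /* assumed numeric */
   array num_old x_old y_old;
   array chr_new name city;           /* hypothetical character variables */
   array chr_old name_old city_old;
   length compared $ 65;
   do i = 1 to dim(num_new);
      dif = (num_new[i] ne num_old[i]);
      compared = catx(' ', vname(num_new[i]), vname(num_old[i]));
      output;
   end;
   do i = 1 to dim(chr_new);
      dif = (chr_new[i] ne chr_old[i]);   /* trailing blanks are ignored */
      compared = catx(' ', vname(chr_new[i]), vname(chr_old[i]));
      output;
   end;
   keep compared dif;
run;
```

The same proc means step with class compared then counts differences for both kinds of variables.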
Thank you so much for the time you invested into this. Made my day!
