DATA Step, Macro, Functions and more

comparing variables in two datasets

Reply
Contributor
Posts: 57

comparing variables in two datasets

Scenario: I have two datasets: A and B. I would to know which variables they have in common, which variables are in A but not in B and vice versa.

Question: If you know any handy SAS program to accomplish this task, please share with me.

Thanks in advance.

Super User
Posts: 10,466

Re: comparing variables in two datasets

Proc compare.

Basic syntax Proc compare base=<datasetname> compare=<datasetname>; run;

Super User
Posts: 5,072

Re: comparing variables in two datasets

Here's a variation you may like.  Extending an earlier suggestion:

proc compare data=data1 (obs=0) compare=data2 (obs=0);

run;

That way, you get all the structural differences (not just variable names) but you don't get all the detail about which data values are different.  For example, if the same variable name appears in both data sets, but is defined as character in one but numeric in the other, you may want to know that.

Good luck.

PROC Star
Posts: 7,356

Re: comparing variables in two datasets

Here is one way, although I'm sure there are many:

/*create some test data*/

data have1;

  set sashelp.class;

  retain x y z (0,0,0);

run;

data have2;

  set sashelp.class;

  retain a b c (0,0,0);

run;

proc sql;

  create table names1 as

    select name

      from dictionary.columns

        where libname="WORK" and

              memname="HAVE1"

          order by name

  ;

  create table names2 as

    select name

      from dictionary.columns

        where libname="WORK" and

              memname="HAVE2"

          order by name

  ;

quit;

data want;

  merge names1 (in=in1) names2 (in=in2);

  by name;

  if in1 and in2 then type="in both";

  else if in1 then type="in 1";

  else if in2 then type="in 2";

run;

proc print data=want;

run;

Super User
Posts: 9,662

Re: comparing variables in two datasets

data have1;
  set sashelp.class;
  retain x y z (0,0,0);
run;

data have2;
  set sashelp.class;
  retain a b c (0,0,0);
run;

proc sql;
create table common_var as
 select name from dictionary.columns where libname='WORK' and memname='HAVE1'
 intersect
 select name from dictionary.columns where libname='WORK' and memname='HAVE2';

create table in_a as
 select name from dictionary.columns where libname='WORK' and memname='HAVE1'
 except
 select name from dictionary.columns where libname='WORK' and memname='HAVE2';

create table in_b as
 select name from dictionary.columns where libname='WORK' and memname='HAVE2'
 except
 select name from dictionary.columns where libname='WORK' and memname='HAVE1';

 quit;


Ksharp

Respected Advisor
Posts: 3,124

Re: comparing variables in two datasets

Of course, like ballardw suggested, proc compare will give you whole-9-yard comparison. But if what you want is just variable names, here is an another approach:

data h1;

set sashelp.class;

rename age=h1_age;

run;

data h2;

set h1;

rename h1_age=h2_age;

run;

proc contents data=h1 out=h1out (keep=name);

run;

proc contents data=h2 out=h2out(keep=name);

run;

proc sort data=h1out;

by name;

run;

proc sort data=h2out;

by name;

run;

data common h1only h2only;

merge h1out(in=h1) h2out(in=h2);

by name;

if h1 and h2 then output common;

else if h1 and not h2 then output h1only;

else output h2only;

run;

Haikuo

Contributor
Posts: 45

Re: comparing variables in two datasets

My favorite approach is also SQL/Dictionary Tables , but for the sake of variety, one could also use the approach below, again relying on Proc Contents:

proc contents data=sashelp.class (keep=weight age sex) out=X (keep=name) noprint;
run;
proc contents data=sashelp.class (rename=(name=tag weight=mass)) out=Y (keep=name) noprint;
run;

data XnotinY YnotinX;

     if _N_=1 then do;
     declare hash hx (dataset:"X");
          hx.definekey("name");
          hx.definedata("name");
          hx.definedone();

     declare hash hy (dataset:"Y");
          hy.definekey("name");
          hy.definedata("name");
          hy.definedone();
     end;


     do until(xdone);
          set X end=xdone;;
          if hy.check() ne 0 then output XnotinY;
          call missing(name);
     end;

     do until(ydone);
          set Y end=ydone;
          if hx.check() ne 0 then output YnotinX;
          call missing(name);
          end;

run;


Ask a Question
Discussion stats
  • 6 replies
  • 386 views
  • 1 like
  • 7 in conversation