doesper
Obsidian | Level 7

I have two datasets with over 1,000 character variables in common between them.  I would like to get a simple count for each of these character variables showing the number of observations that match and the number of observations that don't match.  Something like this:

 

Variable  Match  NoMatch
   v1      100      10
   v2       50      60
   v3        0     110
    .
    .
    .

 

If I could output the match-nomatch counts to a SAS dataset that would be even better.

 

The datasets also have a lot of numeric variables in common between them.  I already was able to do a suitable comparison using PROC COMPARE for the numeric variables as follows:

 

proc compare base=ds_old compare=ds_new outstats=ds_stats nosummary allstats novalues nomiss;
   id idnum;
run;

 

I want to do something similar for all the character variables.

 

Thanks much,

 

Dave

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

You could do it all with one proc compare, combined with a proc freq to analyze the results. e.g.,

 

data one;
  set sashelp.class;
  idnum=_n_;
run;

data two;
  retain height sex weight name age;
  set sashelp.class;
  idnum=_n_;
  if mod(_n_,2) then do;
    name='Ralph';
    sex='N';
    age=6;
    height=12;
    weight=74;
  end;
run;

proc compare base=one compare=two out=ds nosummary outdif stats nomiss;
   id idnum;
run;

data ds;
  set ds;
  array chars _character_;
  array nums _numeric_;
  do over chars;
    if index(chars,'X') then chars='0';
    else chars='1';
  end;
  do over nums;
    if nums=0 then nums=1;
    else nums=0;
  end;
run;

proc freq data=ds;
  tables name--weight;
run;

Of course, you could add two formats if you want the output to be more descriptive.
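
The two formats mentioned above might look like the following. This is only a minimal sketch: the format names matchn and $matchc are made up for illustration, and it assumes the flags in ds are stored as the left-aligned strings '0' and '1' for character variables and as 0 and 1 for numeric variables.

```sas
/* Hypothetical format names; one numeric format and one character format */
proc format;
   value  matchn   0  = 'No match'   1  = 'Match';
   value $matchc  '0' = 'No match'  '1' = 'Match';
run;

/* Apply the formats so PROC FREQ reports labels instead of 0/1 flags */
proc freq data=ds;
   tables name--weight;
   format name sex $matchc. age height weight matchn.;
run;
```

PROC FREQ groups on formatted values, so each table should then show rows labeled Match and No match rather than 1 and 0.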

 

Art, CEO, AnalystFinder.com

 


4 REPLIES
Reeza
Super User

PROC COMPARE doesn't limit to numeric variables. 
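
As a minimal sketch of that point, the VAR statement can restrict PROC COMPARE to character variables. The data set names base_demo and comp_demo and the changed value are made up for illustration; idnum is simply a row-number ID like the one in the question.

```sas
/* Hypothetical demo data: sashelp.class plus a row-number ID */
data base_demo;
   set sashelp.class;
   idnum = _n_;
run;

data comp_demo;
   set base_demo;
   if _n_ = 2 then name = 'Alicia';   /* introduce one character difference */
run;

/* VAR _character_ limits the comparison to character variables */
proc compare base=base_demo compare=comp_demo;
   id idnum;
   var _character_;
run;
```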

 

I think it's best if you post sample data that we can work with and expected output that matches your sample data. 

 

 

Ksharp
Super User
Can you post some sample data to describe your problem?
This is easy to do with IML code.


data old;
 set sashelp.class;
run;
data new;
 set sashelp.class;
 if _n_=1 then sex='X';
 if _n_=4 then name='KSharp';
run;
proc iml;
use old nobs nobs;
read all var _char_ into old[c=vname];
close;

use new;
read all var _char_ into new;
close;
var=t(vname);
match=t((old=new)[+,]);
not_match=t(nobs-match);

create want var {var match not_match};
append;
close;
quit;

doesper
Obsidian | Level 7

Thanks to Reeza, Ksharp, and Art297 for responding.  With regard to Ksharp's suggestion about using IML: I wish we had it, but we don't.  Perhaps IML will someday be included with SAS Foundation?  Here's hoping...

 

In the end, I went with a variant of what Art297 suggested.  Here's what I did.  Not elegant, but it does get the job done.

 

Thanks to all,

 

Dave

 

proc compare base=ds_old compare=ds_new out=ds_stats2 outdif noprint;
   id idnum;
   var _character_;
run;

data ds_stats2b;
   set ds_stats2;
   array chars _character_;
   do over chars;
      if index(chars,'X') then chars = '0';
      else                     chars = '1';
   end;
run;

ods output OneWayFreqs = ds_stats2c;
proc freq data=ds_stats2b;
  tables _character_;
run;
ods output close;

data ds_stats2d;
   set ds_stats2c;
   where round(cumpercent,.01) < 100.00;
   element = substr(table,7);
   pctdiff = percent;
   keep element pctdiff;
run;

proc sort data=ds_stats2d;
   by descending pctdiff;
run;

proc print data=ds_stats2d (obs=2000);
   title 'ds_stats2d';
run;
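
For anyone who wants the Variable / Match / NoMatch layout from the original question as a data set, without IML, the OUTDIF output can also be reshaped and counted directly. This is a rough sketch rather than a tested production solution: the data set names old, new, diffs, long, and want are made up, the demo data is sashelp.class with two values changed, and it assumes the ID variable is numeric so that it is not picked up by _character_.

```sas
/* Hypothetical demo data: two character values changed in the copy */
data old;
   set sashelp.class;
   idnum = _n_;
run;

data new;
   set old;
   if _n_ = 1 then sex  = 'X';
   if _n_ = 4 then name = 'KSharp';
run;

/* OUTDIF writes one DIF row per matched ID; in character columns an 'X'
   marks each position that differs and '.' marks positions that agree */
proc compare base=old compare=new out=diffs outdif noprint;
   id idnum;
   var _character_;
run;

/* One row per observation and variable, flagged 1 if the values differ */
data long (keep=variable differs);
   set diffs (drop=_type_ _obs_);
   array chars {*} _character_;   /* only the compared character variables */
   length variable $32;           /* declared after the array, so not in it */
   do i = 1 to dim(chars);
      variable = vname(chars{i});
      differs  = (index(chars{i}, 'X') > 0);
      output;
   end;
run;

/* Collapse to one row per variable with match and mismatch counts */
proc sql;
   create table want as
   select variable,
          sum(differs = 0) as match,
          sum(differs = 1) as nomatch
   from long
   group by variable;
quit;
```

With this demo data, name and sex should each show 18 matches and 1 mismatch. The long data set holds one row per observation per variable, so with over 1,000 variables it can get large; counting inside the DATA step would avoid that at the cost of more code.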
