I have two datasets with over 1,000 character variables in common between them. I would like to get a simple count for each of these character variables showing the number of observations that match and the number of observations that don't match. Something like this:
Variable Match NoMatch
v1 100 10
v2 50 60
v3 0 110
.
.
.
If I could output the match-nomatch counts to a SAS dataset that would be even better.
The datasets also have a lot of numeric variables in common between them. I already was able to do a suitable comparison using PROC COMPARE for the numeric variables as follows:
proc compare base=ds_old compare=ds_new outstats=ds_stats nosummary allstats novalues nomiss;
id idnum;
run;
I want to do something similar for all the character variables.
Thanks much,
Dave
You could do it all with one proc compare, combined with a proc freq to analyze the results. e.g.,
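/* Base dataset with an ID variable */
data one;
  set sashelp.class;
  idnum=_n_;
run;

/* Comparison dataset: different variable order, every other row altered */
data two;
  retain height sex weight name age;
  set sashelp.class;
  idnum=_n_;
  if mod(_n_,2) then do;
    name='Ralph';
    sex='N';
    age=6;
    height=12;
    weight=74;
  end;
run;

proc compare base=one compare=two out=ds nosummary outdif stats nomiss;
  id idnum;
run;

/* Recode the differences: 1 = match, 0 = no match */
data ds;
  set ds;
  array chars _character_;
  array nums _numeric_;
  do over chars;
    /* OUTDIF marks unequal characters with X */
    if index(chars,'X') then chars='0';
    else chars='1';
  end;
  do over nums;
    /* a numeric difference of 0 means the values match */
    if nums=0 then nums=1;
    else nums=0;
  end;
run;

proc freq data=ds;
  tables name--weight;
run;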
Of course, you could add two formats if you want the output to be more descriptive.
Art, CEO, AnalystFinder.com
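A minimal sketch of the two formats mentioned above, one for the character flags and one for the numeric flags. It assumes the recode step has stored the flags as the character values '0'/'1' and the numeric values 0/1 in ds; the format names $matchc and matchn and the labels are hypothetical.

```sas
/* Hypothetical format names; labels are illustrative only */
proc format;
  value $matchc '0' = 'No match'
                '1' = 'Match';
  value matchn   0  = 'No match'
                 1  = 'Match';
run;

/* Same PROC FREQ as above, with the formats applied to the flag variables */
proc freq data=ds;
  tables name--weight;
  format name sex $matchc. age height weight matchn.;
run;
```

PROC FREQ groups on formatted values, so each table then shows "Match" and "No match" instead of 1 and 0.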
PROC COMPARE isn't limited to numeric variables.
I think it's best if you post sample data that we can work with and expected output that matches your sample data.
Can you post some sample data to describe your problem? This is easy to do with IML code.
data old;
  set sashelp.class;
run;

data new;
  set sashelp.class;
  if _n_=1 then sex='X';
  if _n_=4 then name='KSharp';
run;

proc iml;
  /* read the character variables of both datasets into matrices */
  use old nobs nobs;
  read all var _char_ into old[c=vname];
  close;
  use new;
  read all var _char_ into new;
  close;

  /* column sums of the elementwise comparison give the match counts */
  var=t(vname);
  match=t((old=new)[+,]);
  not_match=t(nobs-match);

  create want var {var match not_match};
  append;
  close;
quit;
Thanks to Reeza, Ksharp, and Art297 for responding. With regard to Ksharp's suggestion about using IML: I wish we had it, but we don't. Perhaps IML will someday be included with SAS Foundation? Here's hoping...
In the end, I went with a variant of what Art297 suggested. Here's what I did. Not elegant, but it does get the job done.
Thanks to all,
Dave
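proc compare base=ds_old compare=ds_new out=ds_stats2 outdif noprint;
  id idnum;
  var _character_;
run;

/* Recode the differences: '1' = match, '0' = no match */
data ds_stats2b;
  set ds_stats2;
  array chars _character_;
  do over chars;
    if index(chars,'X') then chars = '0';
    else chars = '1';
  end;
run;

ods output OneWayFreqs = ds_stats2c;
proc freq data=ds_stats2b;
  tables _character_;
run;
ods output close;

/* Keep the '0' (no match) row for each variable that has both levels. */
/* Variables with a single level (all match, or all no match) have     */
/* cumpercent = 100 on their only row and are dropped by this filter.  */
data ds_stats2d;
  set ds_stats2c;
  where round(cumpercent,.01) < 100.00;
  element = substr(table,7);   /* strip the leading "Table " */
  pctdiff = percent;
  keep element pctdiff;
run;

proc sort data=ds_stats2d;
  by descending pctdiff;
run;

proc print data=ds_stats2d (obs=2000);
  title 'ds_stats2d';
run;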
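For the Variable / Match / NoMatch layout asked for in the original question, a Base SAS sketch along the following lines may work without IML. The sample data are illustrative only (sashelp.class with two character values altered), and the dataset and variable names ds_diff, ds_long, ds_counts, variable, and is_match are hypothetical. It assumes the ID variable is numeric, so that it is not picked up by _character_.

```sas
/* Illustrative data only: sashelp.class with two character values altered */
data ds_old;
  set sashelp.class;
  idnum = _n_;
run;

data ds_new;
  set sashelp.class;
  idnum = _n_;
  if _n_ = 1 then sex = 'X';
  if _n_ = 4 then name = 'KSharp';
run;

/* OUTDIF writes one difference row per matched ID; in character      */
/* variables, X marks an unequal character and a period an equal one  */
proc compare base=ds_old compare=ds_new out=ds_diff outdif noprint;
  id idnum;
  var _character_;
run;

/* One row per observation and variable, with a 0/1 match flag */
data ds_long (keep=variable is_match);
  set ds_diff;
  array chars{*} _character_;   /* also contains _TYPE_, skipped below */
  length variable $32;          /* declared after the array, so not in it */
  do i = 1 to dim(chars);
    variable = vname(chars{i});
    if upcase(variable) ne '_TYPE_' then do;
      is_match = (index(chars{i}, 'X') = 0);
      output;
    end;
  end;
run;

/* Collapse to one row per variable with match and no-match counts */
proc sql;
  create table ds_counts as
  select variable,
         sum(is_match)            as Match,
         count(*) - sum(is_match) as NoMatch
  from ds_long
  group by variable;
quit;
```

Unlike a filter on cumulative percent, this keeps variables that match on every observation or on none. The long intermediate dataset has one row per observation per variable, so with 1,000+ variables it can get large.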