12-27-2011 01:13 PM
Can someone please give me some advice?
I have two datasets that contain unique records based on a unique key (record key). Data from one may not be in the other or the unique key may be there but only some value may have changed. I have three values to compare (reckey, caseload, and casedesc). I need to find all observations from the base dataset that no longer exist in the compare dataset as well as those new in the compare dataset that may not exist in the base dataset. I also need to find all observations where the observation exists in both datasets but some combination of the three values has changed.
I have experimented with proc compare but can't get it to give me what I want. I'd like a simple list of all observations that are either different or non-matched in either dataset. I don't need the summaries, just the actual observations.
I am hoping to use this as a learning experience with proc compare and I'm hoping someone can point me in the right direction.
12-27-2011 04:45 PM
Proc compare is not the best tool for indentifying miss matches. It is useful for looking for changes in observations that are in both datasets.
If you just have a couple of variables to compare then just do it yourself in a data step.
merge one (in=in1 rename=(caseload=caseload1 casedesc=casedesc1))
two (in=in2 rename=(caseload=caseload2 casedesc=casedesc2))
by reckey ;
length match $50 ;
if not in1 then match='IN ONE ONLY';
else if not in2 then match='IN TWO ONLY';
if caseload1 ne caseload2 then match='CASELOAD MISMATCH';
if casedesc1 ne casedesc2 then match=catx(', ',match,'CASEDESC MISMATCH));