DATA Step, Macro, Functions and more

How to get position Of Miss mattch

Reply
Contributor
Posts: 52

How to get position Of Miss mattch

Hi All,

I have n number of variables in one dataset and same structure like in another dataset. Now I want to compare those two dataset with Respective variables and need to find position of the miss-matches.

Below example data looks like original data.

data test;

  Input name $ surname $;

  datalines;

ramesh Babu

Dishant parikh

Disha.N Khanna

;

Run;

data test1;

  Input name1 $ Surname1 $;

  datalines;

RamesH BABU

DISHANT PARIKH

disha Khanna

;

Run;

Now I need To Output like below.

Name Name1 Char Surname Surname1 Char1

ramesh RamesH 1,6 Babu BABU 2,3,4

Dishant DISHANT 2,3,4,5,6,7 parikh PARIKH 1,2,3,4,5,6

Disha.N disha 1,6,7 Khanna Khanna

So In Output Respective Variables Miss-Matches Position Need to be show As like Above.Your help will Mean to be lot.

Thanks In advance.

Regards,

Dishant Parikh

Super User
Posts: 5,257

Re: How to get position Of Miss mattch

You might want to take a look into PROC COMPARE.

Data never sleeps
Contributor
Posts: 52

Re: How to get position Of Miss mattch

Hi Linush,

Thanks For Giving Time To look my problem. Initially Tried with PROC COMAPRE But Didn't Achieve it. Could You Please Help Me..

Super User
Posts: 5,257

Re: How to get position Of Miss mattch

Looks a bit complicated output. What do you intend to do with the result?

Probably you need to do some do-loops wtih some substr() comparisons.

Data never sleeps
Contributor
Posts: 52

Re: How to get position Of Miss mattch

My Criteria is like there is Thousand Of data for comparison need to figure out in which data and on which position Data is not matching.(so Miss-match position required if their is 100 Miss-match in One string All position Should to display as i explain above).

So can you please Give Some Suggestion With Code which Achieve to target So I can it Get Exactly.

Super User
Super User
Posts: 7,401

Re: How to get position Of Miss mattch

There is no built in function to do what you require.  Proc compare can give you an output dataset of difference's by similar variables and you could post manipulate that output.  However, looking at your example and text I don't think that is what you are trying to do.  For instance, check one seems to match ramesh with RamesH, both of which are distinct character strings.

What I would suggest is to look at your data and identify the kinds of checks/merges you want to achieve.  For instance in the previous example, maybe you want to upcase both ramesh and RamesH as a key merge variable, you could then merge these two and compare the original strings with a small do loop (though note that it wouldn't work for some unless you added other logic in).

So some quick and dirty code for variable 1:

proc sql;

  create table WANT as

  select  COALESCE(upcase(A.NAME),upcase(B.NAME1)) as MERGE_VAR,

          A.NAME,

          B.NAME1

  from    WORK.TEST A

  full join WORK.TEST1 B

  on      upcase(A.NAME)=upcase(B.NAME1);

quit;

data WANT;

  set WANT;

  attrib diffs format=$2000.;

  do i=1 to 2000;

    if substr(name,i,1) ne substr(name1,i,1) then diffs=catx(",",strip(diffs),strip(put(i,best.)));

  end;

run;

Respected Advisor
Posts: 3,777

Re: How to get position Of Miss mattch

Seems like PROC COMPARE is the most logical.  It is easier if you variables are named the same then you don't need VAR/WITH.  Either way get a nice output that works nicely for any number of variables.  You can derive the column numbers of all differences from the Xs with FINDC but I'm not sure what that will provide that is more not easily show by the DIF row.  Explain better what your goal is, and not just I need to the column numbers.

data test;
  Input name $ surname $;
  datalines;
ramesh Babu
Dishant parikh
Disha.N Khanna
RamesH BABU
;
Run;

data test1;
  Input name1 $ Surname1 $;
  datalines;
RamesH BABU
DISHANT PARIKH
disha Khanna
RamesH BABU
;
Run;

proc compare base=test compare=test1 noprint out=comp outbase outcomp outdiff;
  
var name surname;
   with name1 surname1;
   run;
proc print;
  
run;

Obs    _TYPE_     _OBS_    name        surname

  
1    BASE         1      ramesh      Babu   
  
2    COMPARE      1      RamesH      BABU   
  
3    DIF          1      X....X..    .XXX....
  
4    BASE         2      Dishant     parikh 
  
5    COMPARE      2      DISHANT     PARIKH 
  
6    DIF          2      .XXXXXX.    XXXXXX..
  
7    BASE         3      Disha.N     Khanna 
  
8    COMPARE      3      disha       Khanna 
  
9    DIF          3      X....XX.    ........
 
10    BASE         4      RamesH      BABU   
 
11    COMPARE      4      RamesH      BABU   
 
12    DIF          4      ........    ........
Occasional Contributor
Posts: 6

Re: How to get position Of Miss mattch

Hi Dishant,

Here is a possible solution based on your data assuming that the order in both dataset is the same:

data test;

  Input name $ surname $;

  datalines;

ramesh Babu

Dishant parikh

Disha.N Khanna

;Run;

data test1;

  Input name1 $ surname1 $;

  datalines;

RamesH BABU

DISHANT PARIKH

disha Khanna

;Run;

*length of variables name and surname based on your data and can be obtained from sashelp.vcolumn;

%let nlen =8;

%let slen =8;

data compare(drop=i);

merge test test1; 

length pos1 pos2  $32.;

do i=1 to &nlen.;

if char(name,i) ^= char(name1,i) then pos1 = catx(',', pos1,i);

end;

do i=1 to &slen.;

if char(surname,i) ^= char(surname1,i) then pos2 = catx(',', pos2,i);

end;

run;

BR

Contributor
Posts: 52

Re: How to get position Of Miss mattch

Hi All,

Thanks For Your help And Sharing Your Logic But In My dataset their is 50 Variables are present.so each variable we need to give condition as per your logic so is their any logic we just pass to variables in macro it will do compare and give the miss-match position like below.

%COMPARE(NAME);

%COMPARE1(NAME1);

Name Name1 Char

ramesh RamesH 1,6

Dishant DISHANT 2,3,4,5,6,7

Disha.N disha 1,6,7




Super User
Super User
Posts: 7,401

Re: How to get position Of Miss mattch

Hi,

This goes back to my post earlier on.  You make no mention of whether each variable you want to compare will follow the exact same logic, for instance you show two name variables, what if there is a date or an age variable?  If they are all exactly the same logic then you could wrap any of the above solutions up in a macro and use that, get a list of your variables from SASHELP.VCOLUMNS, and call the macro with call execute('%your_macro('||variable||','||variable||'1);');

However if your data follows different logic per group of variables, then you would need to identify what needs to be checked and how.  Maybe an example is Edit Check programming, where a DCM would have various different responses included, and checks are written to ensure dates are correct, in the past, within windows etc. which wouldn't be the same type of check that ensures drug coding matches.

Super Contributor
Posts: 282

Re: How to get position Of Miss mattch

Hi,

I have added an id variable and come up with the following:

data test;

  Input id name $ surname $;

  datalines;

1 ramesh Babu

2 Dishant parikh

3 Disha.N Khanna

;

Run;

data test1;

  Input id name1 $ Surname1 $;

  datalines;

1 RamesH BABU

2 DISHANT PARIKH

3 disha Khanna

;

Run;

data want(keep=name name1 surname surname1 char char1);

  merge test

        test1;

  by id;

  length str1 $30

         str2 $30

  diff $60

  ;

  str1=name;

  str2=name1;

  link diffpos;

  char=diff;

  str1=surname;

  str2=surname1;

  link diffpos;

  char1=diff;

return;

diffpos:

  position=0;

  diff='';

  len1=length(str1);

  len2=length(str2);

  do until(position ge len1 or position ge len2);

    position=position+1;

  if substr(str1,position,1) ne substr(str2,position,1) then

   diff=catx(',',diff,position);

  end;

  if len1 ne len2 then

  do;

    if len1 gt len2 then

      limit=len1;

  else

      limit=len2;

    do i=position+1 to limit;

      diff=catx(',',diff,i);

    end;

  end;

return;

run;

Regards,

Amir.

Respected Advisor
Posts: 3,124

Re: How to get position Of Miss mattch

The following snippet bases on the assumption that 'test' and 'test1' have a one-one match from top down.

data test;

Input name $ surname $;

datalines;

ramesh Babu

Dishant parikh

  1. Disha.N Khanna

;

Run;

data test1;

Input name1 $ Surname1 $;

datalines;

RamesH BABU

DISHANT PARIKH

disha Khanna

;

Run;

data compared;

merge test test1;

length dif_name dif_surname $ 50;

do _n_=1 to max(lengthn(name),lengthn(name1));

if char(name,_n_) ne char(name1,_n_) then dif_name=catx(',',dif_name,_n_);

end;

do _n_=1 to max(lengthn(surname),lengthn(surname1));

if char(surname,_n_) ne char(surname1,_n_) then dif_surname=catx(',',dif_surname,_n_);

end;

run;

Haikuo

Ask a Question
Discussion stats
  • 11 replies
  • 412 views
  • 0 likes
  • 7 in conversation