Hi All,
I have n number of variables in one dataset and same structure like in another dataset. Now I want to compare those two dataset with Respective variables and need to find position of the miss-matches.
Below example data looks like original data.
data test;
Input name $ surname $;
datalines;
ramesh Babu
Dishant parikh
Disha.N Khanna
;
Run;
data test1;
Input name1 $ Surname1 $;
datalines;
RamesH BABU
DISHANT PARIKH
disha Khanna
;
Run;
Now I need To Output like below.
Name Name1 Char Surname Surname1 Char1
ramesh RamesH 1,6 Babu BABU 2,3,4
Dishant DISHANT 2,3,4,5,6,7 parikh PARIKH 1,2,3,4,5,6
Disha.N disha 1,6,7 Khanna Khanna
So In Output Respective Variables Miss-Matches Position Need to be show As like Above.Your help will Mean to be lot.
Thanks In advance.
Regards,
Dishant Parikh
You might want to take a look into PROC COMPARE.
Hi Linush,
Thanks For Giving Time To look my problem. Initially Tried with PROC COMAPRE But Didn't Achieve it. Could You Please Help Me..
Looks a bit complicated output. What do you intend to do with the result?
Probably you need to do some do-loops wtih some substr() comparisons.
My Criteria is like there is Thousand Of data for comparison need to figure out in which data and on which position Data is not matching.(so Miss-match position required if their is 100 Miss-match in One string All position Should to display as i explain above).
So can you please Give Some Suggestion With Code which Achieve to target So I can it Get Exactly.
There is no built in function to do what you require. Proc compare can give you an output dataset of difference's by similar variables and you could post manipulate that output. However, looking at your example and text I don't think that is what you are trying to do. For instance, check one seems to match ramesh with RamesH, both of which are distinct character strings.
What I would suggest is to look at your data and identify the kinds of checks/merges you want to achieve. For instance in the previous example, maybe you want to upcase both ramesh and RamesH as a key merge variable, you could then merge these two and compare the original strings with a small do loop (though note that it wouldn't work for some unless you added other logic in).
So some quick and dirty code for variable 1:
proc sql;
create table WANT as
select COALESCE(upcase(A.NAME),upcase(B.NAME1)) as MERGE_VAR,
A.NAME,
B.NAME1
from WORK.TEST A
full join WORK.TEST1 B
on upcase(A.NAME)=upcase(B.NAME1);
quit;
data WANT;
set WANT;
attrib diffs format=$2000.;
do i=1 to 2000;
if substr(name,i,1) ne substr(name1,i,1) then diffs=catx(",",strip(diffs),strip(put(i,best.)));
end;
run;
Seems like PROC COMPARE is the most logical. It is easier if you variables are named the same then you don't need VAR/WITH. Either way get a nice output that works nicely for any number of variables. You can derive the column numbers of all differences from the Xs with FINDC but I'm not sure what that will provide that is more not easily show by the DIF row. Explain better what your goal is, and not just I need to the column numbers.
Hi Dishant,
Here is a possible solution based on your data assuming that the order in both dataset is the same:
data test;
Input name $ surname $;
datalines;
ramesh Babu
Dishant parikh
Disha.N Khanna
;Run;
data test1;
Input name1 $ surname1 $;
datalines;
RamesH BABU
DISHANT PARIKH
disha Khanna
;Run;
*length of variables name and surname based on your data and can be obtained from sashelp.vcolumn;
%let nlen =8;
%let slen =8;
data compare(drop=i);
merge test test1;
length pos1 pos2 $32.;
do i=1 to &nlen.;
if char(name,i) ^= char(name1,i) then pos1 = catx(',', pos1,i);
end;
do i=1 to &slen.;
if char(surname,i) ^= char(surname1,i) then pos2 = catx(',', pos2,i);
end;
run;
BR
Hi All,
Thanks For Your help And Sharing Your Logic But In My dataset their is 50 Variables are present.so each variable we need to give condition as per your logic so is their any logic we just pass to variables in macro it will do compare and give the miss-match position like below.
%COMPARE(NAME);
%COMPARE1(NAME1);
Name Name1 Char
ramesh RamesH 1,6
Dishant DISHANT 2,3,4,5,6,7
Disha.N disha 1,6,7
Hi,
This goes back to my post earlier on. You make no mention of whether each variable you want to compare will follow the exact same logic, for instance you show two name variables, what if there is a date or an age variable? If they are all exactly the same logic then you could wrap any of the above solutions up in a macro and use that, get a list of your variables from SASHELP.VCOLUMNS, and call the macro with call execute('%your_macro('||variable||','||variable||'1);');
However if your data follows different logic per group of variables, then you would need to identify what needs to be checked and how. Maybe an example is Edit Check programming, where a DCM would have various different responses included, and checks are written to ensure dates are correct, in the past, within windows etc. which wouldn't be the same type of check that ensures drug coding matches.
Hi,
I have added an id variable and come up with the following:
data test;
Input id name $ surname $;
datalines;
1 ramesh Babu
2 Dishant parikh
3 Disha.N Khanna
;
Run;
data test1;
Input id name1 $ Surname1 $;
datalines;
1 RamesH BABU
2 DISHANT PARIKH
3 disha Khanna
;
Run;
data want(keep=name name1 surname surname1 char char1);
merge test
test1;
by id;
length str1 $30
str2 $30
diff $60
;
str1=name;
str2=name1;
link diffpos;
char=diff;
str1=surname;
str2=surname1;
link diffpos;
char1=diff;
return;
diffpos:
position=0;
diff='';
len1=length(str1);
len2=length(str2);
do until(position ge len1 or position ge len2);
position=position+1;
if substr(str1,position,1) ne substr(str2,position,1) then
diff=catx(',',diff,position);
end;
if len1 ne len2 then
do;
if len1 gt len2 then
limit=len1;
else
limit=len2;
do i=position+1 to limit;
diff=catx(',',diff,i);
end;
end;
return;
run;
Regards,
Amir.
The following snippet bases on the assumption that 'test' and 'test1' have a one-one match from top down.
data test;
Input name $ surname $;
datalines;
ramesh Babu
Dishant parikh
;
Run;
data test1;
Input name1 $ Surname1 $;
datalines;
RamesH BABU
DISHANT PARIKH
disha Khanna
;
Run;
data compared;
merge test test1;
length dif_name dif_surname $ 50;
do _n_=1 to max(lengthn(name),lengthn(name1));
if char(name,_n_) ne char(name1,_n_) then dif_name=catx(',',dif_name,_n_);
end;
do _n_=1 to max(lengthn(surname),lengthn(surname1));
if char(surname,_n_) ne char(surname1,_n_) then dif_surname=catx(',',dif_surname,_n_);
end;
run;
Haikuo
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.