BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nav
Fluorite | Level 6 Nav
Fluorite | Level 6

Hi

 

I  need to generate a mismatch report by comparing  three variable values (one set) from one dataset with  2 sets  of variable values in a different dataset.

 

For example Dataset one has var1 var2 var3 etc. and dataset two has  var1 var2 var3  var4 var5 var6 etc.

 

I need to check whether  (one.var1 one.var2 and one. var3 )matches with( two. var1,two.var2 two.var3) or (two. var4, two.var5, two.var6) and generate a mismatch report having only mismatches(fallouts). of one.var1 onevar2 and one var3

 

I was wondering what could be the best way(efficient way) to program this scenaro.

 

Thanks in advance!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Nav
Fluorite | Level 6 Nav
Fluorite | Level 6

NAV@ ASTOUNDING,

 

Thank you !  

 

  Simple and smart solution. 

  Seems I am thinking too much and making it more complicated . 

 

I have tried and compared with my other program output where I have used Except operator.

GOT the same results as your program . 

 

Thank you very much everyone  for all your valuable time and  suggestions .

  

   

 

KIND REGARDS

NAV

View solution in original post

11 REPLIES 11
art297
Opal | Level 21

How many records in each dataset and is there an ID variable or, if not, are both datasets' records in the same order?

 

Art, CEO, AnalystFinder.com

art297
Opal | Level 21

Depending upon your answer, and whether the report has to be detailed (and, if so, to what level of specificity), you might be able to get by with something as simple as:

 

data one;
  input var1-var3;
  cards;
1 1 1
2 1 2
1 2 1
;
data two;
  input var1-var6;
  cards;
1 1 1 1 1 1
2 1 2 2 2 2
1 2 1 1 2 2
;

data both (keep=var1-var3 match);
  set one;
  set two (rename=(var1=_var1 var2=_var2 var3=_var3));
  array one(3) var1-var3;
  array twoa(3) _var1-_var3;
  array twob(3)  var4-var6;
  do i=1 to 3;
    if one(i) ne twoa(i) or one(i) ne twob(i) then match='mismatch';
  end;
run;

Art, CEO, AnalystFinder.com

 

Nav
Fluorite | Level 6 Nav
Fluorite | Level 6
Thank you very much!!
Will try this and let you know. I need only mismatches of dataset one. with var1 var2 and var3 .along with comments column where I need to specify which variable/variables not found in dataset two .
Nav
Fluorite | Level 6 Nav
Fluorite | Level 6

Hi ,

 

Thanks art297!!

There are no ID variables In my dataset  and order is also not same.

 

What I need to check is whether the value of ( one.var1 ||one. var2||one.var3) eq (two.var1||two.var2||two.var3) OR

( one.var1 ||one. var2||one.var3) eq ( two.var4 ||two. var5||two.var6) .

Right now I have 150 obs in data one and  data two is master dataset which has 500 obs .

Note: one.var3 two.var3 two.var6  might have missing values .

I have used proc sql with except operator but not sure whether my approach is right.

 

Thanks

NAV

 

 

 

 

 

 

art297
Opal | Level 21

To insure we're all on the same page, consider the following two datasets:

data one;
  input var1-var3;
  cards;
1 1 1
2 1 2
1 2 1
;
data two;
  input var1-var6;
  cards;
1 1 1 1 1 1
2 1 2 2 2 2
1 2 1 1 2 2
1 1 1 1 1 1
2 1 2 2 4 2
1 2 1 1 2 2
1 1 1 1 1 1
2 1 2 2 2 1
1 2 1 1 2 3
;

Given those two data sets, show us what you want your output to look like.

 

Art, CEO, AnalystFinder.com

 

Nav
Fluorite | Level 6 Nav
Fluorite | Level 6

Hi , 

 Sorry @art297 I should have given  dummy data  to avoid confusion.

 

Astounding
PROC Star

Here's a simple way:

 

data checklist (keep=var1-var3);

set two (keep=var1-var6);

output;

var1=var4;

var2=var5;

var3=var6;

output;

run;

 

proc sort data=checklist nodupkey;

by var1 var2 var3;

run;

 

proc sort data=one;

by var1 var2 var3;

run;

 

data mismatch;

merge one checklist (in=on_the_master);

by var1 var2 var3;

if on_the_master=0;

run;

 

Your data sets are small enough that "easiest to follow" is more important than "speediest to run".

art297
Opal | Level 21

@Nav: One aspect of your approach surprizes me as, earlier, you said that dataset one was the smaller of the two. Is it the one that has the desired entries and are their duplicates and/or conflicts within its entries?

 

Also, is case relevant or irrelevant?

 

Art, CEO, AnalystFinder.com

 

Nav
Fluorite | Level 6 Nav
Fluorite | Level 6

Hi @art297

 

Yes Dataset one is smaller  .In the example I have provided I didn't create lots of observations for dataset two, however  Dataset two has lots of observations .

Each observation of dataset one( var1 var2 var3) should be compared with two sets in data two  and if there is mismatch then 

only mismatched observations of dataset one should be in  the  mismatch dataset. 

 

Yes  it is Case relevant !!

 

Thank you very much!!

 

 

 

 

Nav
Fluorite | Level 6 Nav
Fluorite | Level 6

NAV@ ASTOUNDING,

 

Thank you !  

 

  Simple and smart solution. 

  Seems I am thinking too much and making it more complicated . 

 

I have tried and compared with my other program output where I have used Except operator.

GOT the same results as your program . 

 

Thank you very much everyone  for all your valuable time and  suggestions .

  

   

 

KIND REGARDS

NAV

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1348 views
  • 0 likes
  • 4 in conversation