Find the differences between two datasets and combine

Reply
Contributor
Posts: 55

Find the differences between two datasets and combine

Hi,

I have the below two datasets (old and new) and need to find the differences between them considering "ID" as matching column in both. Old dataset should be considered as Base dataset. I want output as attached "Desired output.xls". Please help.

--Base dataset.

data one;

input id type$ batch_no$ dose $ caps;

datalines;

11 Base 101-101 120MG 10

11 Base 101-102 240MG 12

12 Base 101-111 120MG 13

12 Base 101-111 240MG 14

;

run;

--New dataset.

data two;

input name $ id type$ batch_no$ dose $ caps;

datalines;

ABCD123 11 Base 101-101 120MG 10

ABCD123 12 Base 101-112 120MG 10

ABCD123 12 Base 101-123 240MG 12

;

run;

Super Contributor
Posts: 305

Re: Find the differences between two datasets and combine

Hello,

There are some cases where the below code will not be enough and shall be  (for example , if you have in BASE:

11 Base 101-101 120MG 10

11 Base 101-102 240MG 12

11 Base 101-103 240MG 12

11 Base 101-104 240MG 12

and in COMPARE

ABCD123 11 Base 101-101 120MG 10

ABCD123 11 Base 101-103 120MG 10

ABCD123 11 Base 101-104 120MG 10 )

data base;
input id base_type $ base_batch_no $ base_dose $ base_caps;
datalines;
11 Base 101-101 120MG 10
11 Base 101-101 120MG 10
11 Base 101-102 240MG 12
12 Base 101-111 120MG 13
12 Base 101-111 240MG 14
;
run;

data compare;
input new_name $ id new_type $ new_batch_no $ new_dose $ new_caps;
datalines;
ABCD123 11 Base 101-101 120MG 10
ABCD123 12 Base 101-112 120MG 10
ABCD123 12 Base 101-112 120MG 10
ABCD123 12 Base 101-123 240MG 12
;
run;

proc sort data=base;
by id base_batch_no;
run;

proc sort data=compare;
by id new_batch_no;
run;

data want;
merge base compare ;
by id ;

if base_batch_no eq new_batch_no then do;
call missing( of new: );
flag_delete=1;
end;

if missing(flag_delete);


drop flag_delete;
run;

Contributor
Posts: 55

Re: Find the differences between two datasets and combine

Hi Loko, Thanks for the code however, it's not meeting the requirement as batch_no is considered as matching condition wherein only "ID" to be considered as matching condition.

Super Contributor
Posts: 305

Re: Find the differences between two datasets and combine

Hello,

If  batch_no is considered as matching condition you will never get the combination:

12 Base 101-111 ABCD123 12 Base 101-112 120MG 10

That's the reason it has seemed to me more logical to match only by ID.

Grand Advisor
Posts: 9,571

Re: Find the differences between two datasets and combine

If I understood what you mean.

data one;
input id type$ batch_no$ dose $ caps;
datalines;
11 Base 101-101 120MG 10
11 Base 101-102 240MG 12
12 Base 101-111 120MG 13
12 Base 101-111 240MG 14
;
run;
 
data two;
input name $ id type$ batch_no$ dose $ caps;
datalines;
ABCD123 11 Base 101-101 120MG 10
ABCD123 12 Base 101-112 120MG 10
ABCD123 12 Base 101-123 240MG 12
;
run;

proc sql;
create table old as
 select id as _id,id as old_id, type as old_type, batch_no as old_batch_no,dose as old_dose, caps as old_caps
  from one
  except
 select id as _id,* from two(drop=name);

 create table new as
 select id as _id,id as new_id, type as new_type, batch_no as new_batch_no,dose as new_dose, caps as new_caps
  from two
   except
 select id as _id,* from one ;

quit;
data want;
 merge old new two(keep=id name rename=(id=_id));
 by _id;
 drop _id;
run;

Xia Keshan

Ask a Question
Discussion stats
  • 4 replies
  • 413 views
  • 0 likes
  • 3 in conversation