DATA Step, Macro, Functions and more

compare two datasets

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

compare two datasets

Hi SAS Users,

 

SAS dataset 1:

 

data have1;

input id var $ sex $;

datalines;

10 bala m

10 raju m

11 chak f

11 sudh m

12 keer f

;

run;

 

 

SAS dataset 2:

 

data have2;

input id var $ sex $ sal;

datalines;

10 balaa m 200

11 raju f 100

12 chak m 400

11 sudh m 500

12 keer m 350

;

run;

 

 

we need to compare have1 SAS dataset with have2 SAS dataset and output should contain unequal data (if any new variable is recorded that also we need to report in the output for e.g. sal is the extra variable here).

 

 

Kind Regards,

Raju


Accepted Solutions
Solution
‎08-30-2016 02:29 AM
Regular Contributor
Posts: 242

Re: compare two datasets

 

proc contents data=have2 out=contents2 noprint;
run;

proc sql noprint;
select name into :compare_col separated by ' '
from contents2
where varnum>3;
quit;
%put &compare_col ;

This is similar to what you have written below. But it can fail if key column lie after varnum >3

 

Do refer to @Reeza post.

 

If you think you got answer to your problem. Please mark it as solution. 

View solution in original post


All Replies
Regular Contributor
Posts: 242

Re: compare two datasets

[ Edited ]
data have1;
input id var $ sex $;
datalines;
10 bala m
10 raju m
11 chak f
11 sudh m
12 keer f
;
run;
 
 

 
data have2;
input id var $ sex $ sal ;
datalines;
10 balaa m 200
11 raju f 100
12 chak m 400
11 sudh m 500
12 keer m 350
;
run;
proc sort data=have1 out=have1_sorted;
by id;run;
proc sort data=have2 out=have2_sorted;;
by id;run;

proc compare base=have1_sorted compare=have2_sorted;
run;

 

If input dataset are sorted then no need to add proc sort code.

It will generate a report that can be saved as pdf file. 

 

Do check the proc compare it has much more options to use. I have given example of the most simplest.

 

Contributor
Posts: 28

Re: compare two datasets

Hi Rahul,

 

Thank you for your response.

 

Even I tried this one, but how to report it in Excel.

 

for the following code the output should look like the attached screenshot in Excelsheet.

 

data have1;
input id var $ sex $;
datalines;
10 bala m
11 chak f
12 keer f
;
run;

data have2;
input id var $ sex $;
datalines;
10 balaa m
11 sudh m
12 chak m
;
run;

Contributor
Posts: 28

Re: compare two datasets

Kindly find the attched screenshot.


Compare.PNG
Super User
Posts: 17,912

Re: compare two datasets

Use a SQL merge with a case statement to check for your conditions. 

 

Use proc export to get data to excel. 

Super User
Posts: 17,912

Re: compare two datasets

Use a SQL merge with a case statement to check for your conditions. 

 

Use proc export to get data to excel. 

Regular Contributor
Posts: 242

Re: compare two datasets


proc sort data=have1 out=have1_sorted;
by id;run;
proc sort data=have2 out=have2_sorted;;
by id;run;

proc transpose data=have1 out=have1_trans (rename= (_name_=col_name));
by id;
var var sex ;
run;

proc transpose data=have2 out=have2_trans(rename= (_name_=col_name)) ;
by id;
var var sex sal;
run;

proc sql;
create table compare_table
as
select coalesce(a.id,b.id) as ID,coalesce(a.col_name,b.col_name) as var
, a.col1 as old_val
,b.col1 as new_val
,case when a.col1 <> b.col1 and a.id= b.id then 'Modified'
      when a.col1 <> b.col1 and a.id=. then 'Added'
      when a.col1 <> b.col1 and b.id=. then 'Deleted' end as remark 

from have1_trans a
full join have2_trans b
on a.id = b.id and a.col_name = b.col_name;
quit;

You can then export the final table to xls using proc export.

Regular Contributor
Posts: 242

Re: compare two datasets

For testing purpose, I have kept all the rows. You can filter the rows from output dataset.

Contributor
Posts: 28

Re: compare two datasets

Hi Rahul & SAS Users,

 

data have1;

input id var $ record sex $ sal fold $ rate;

datalines;

10 bala 1 m 200 v1 5

10 bala 2 m 300 v1 4

10 bala 3 f 200 a2 6

11 sri 1 m 204 d2 2

11 sri 2 f 408 d4 3

12 koti 1 m 490 v3 4

12 koti 3 f 300 b3 5

12 koti 5 m 203 n4 3

;

run;

 

data have2;

input id var $ record sex $ sal fold $ rate;

datalines;

10 bala 1 f 220 v1 5

10 bala 2 m 300 v1 4

10 bala 3 m 230 a2 7

11 sri 1 m 204 v2 2

11 sri 2 m 468 d4 3

12 koti 1 f 491 n3 4

12 koti 3 m 300 b3 5

12 koti 5 f 203 n4 3

12 koti 6 m 213 n4 3

;

run;

 

now by keeping id, var and record as key coloumns (for e.g. (10 bala 1) record of have1 SAS dataset should compare with (10 bala 1) record of have2 SAS dataset) we need to compare rest of all variables and the output should be look like the screenshot attached which shown earlier.

 

if any new record comes (for e.g. (12 koti) is having two records in have1 SAS dataset but in have2 (12 koti 6) added so in this case it should show like new record added.)

 

Thank you Rahul for your quick response and patience.

 

Kind Regards,

Raju

Regular Contributor
Posts: 242

Re: compare two datasets

[ Edited ]
data have1;
input id var $ record sex $ sal fold $ rate;
datalines;
10 bala 1 m 200 v1 5
10 bala 2 m 300 v1 4
10 bala 3 f 200 a2 6
11 sri 1 m 204 d2 2
11 sri 2 f 408 d4 3
12 koti 1 m 490 v3 4
12 koti 3 f 300 b3 5
12 koti 5 m 203 n4 3
;
run;
 
data have2;
input id var $ record sex $ sal fold $ rate;
datalines;
10 bala 1 f 220 v1 5
10 bala 2 m 300 v1 4
10 bala 3 m 230 a2 7
11 sri 1 m 204 v2 2
11 sri 2 m 468 d4 3
12 koti 1 f 491 n3 4
12 koti 3 m 300 b3 5
12 koti 5 f 203 n4 3
12 koti 6 m 213 n4 3
;
run;

%let key_col=id var  record ;
%let compare_col=sex sal fold rate;

%let key_col_cnt=%sysfunc(countw(&key_col));
%macro select_col;
%do i=1 %to &key_col_cnt;
%let col=%sysfunc(scan( &key_col,&i));
coalesce(a.&col,b.&col) as &col,
%end;
%mend;



%macro join_cond;
%do i=1 %to &key_col_cnt;
%let col=%sysfunc(scan( &key_col,&i));
 a.&col =b.&col and 
%end;
%mend;


proc sort data=have1 out=have1_sorted;
by &key_col ;run;
proc sort data=have2 out=have2_sorted;;
by &key_col ;run;

proc transpose data=have1 out=have1_trans (rename= (_name_=col_name));
by &key_col ;
var  &compare_col ;
run;

proc transpose data=have2 out=have2_trans(rename= (_name_=col_name)) ;
by &key_col ;
var &compare_col;
run;

proc sql;
create table compare_table
as
select %select_col
coalesce(a.col_name,b.col_name) as variable_name
,strip(a.col1) as old_val
,strip(b.col1) as new_val
,case when strip(a.col1) <> strip(b.col1) and a.id= b.id and a.record =b.record then 'Modified'
      when strip(a.col1) <> strip(b.col1) and a.id=. then 'Added'
      when strip(a.col1) <> strip(b.col1) and b.id=. then 'Deleted' end as remark 

from have1_trans a
full join have2_trans b
on %join_cond
 a.col_name = b.col_name;
quit;


Automated the above code using two macro variable key_col and compare_col.

User need to supply value to these two macro variables and proc sql changes would be handled itself.

 

%let key_col=id var  record ;
%let compare_col=sex sal fold rate;

 

Here there is an assumption key_col and compare_col   columns should be present both in have1 and have2. 

Contributor
Posts: 28

Re: compare two datasets

You are awesome Rahul.

 

as you said user will supply the key coloums for key_col macro but can we a code like to take rest all coloums as compare_col.

Regular Contributor
Posts: 242

Re: compare two datasets

Surely, you can take it as an excerise yourself. List all the columns using proc dataset and remove key_col. The left would be compare_col list.

Contributor
Posts: 28

Re: compare two datasets

Hi Rahul,

 

Thank you for giving me a chance to learn.

 

I have tried the following code and identified the other coloumns. now how to give all these in compare_col macro.

 

proc contents data=have2 out=contents2 noprint;
run;

proc sort data=contents2;
by varnum;
where varnum > 3;
run;

 

Kind Regards,
Raju


assign.PNG
Super User
Posts: 17,912

Re: compare two datasets

Please see the recently update macro example section to find the solution to your problem. 

 

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 

 

Solution
‎08-30-2016 02:29 AM
Regular Contributor
Posts: 242

Re: compare two datasets

 

proc contents data=have2 out=contents2 noprint;
run;

proc sql noprint;
select name into :compare_col separated by ' '
from contents2
where varnum>3;
quit;
%put &compare_col ;

This is similar to what you have written below. But it can fail if key column lie after varnum >3

 

Do refer to @Reeza post.

 

If you think you got answer to your problem. Please mark it as solution. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 25 replies
  • 656 views
  • 10 likes
  • 4 in conversation