BookmarkSubscribeRSS Feed
sunilreddy
Fluorite | Level 6

Could you please help me on below

Want to compare two datasets(new1 and old1) on multiple columns, also has unique key variable on both datasets

if Unique key variable and all remaining columns are matched in two datasets (new1 and old1) then write status as "Equal"
if Unique key variable is existing in new1 and not existing in old1  then write status as "New"
if Unique key variable is not existing in new1 and existing in old1 on key variable then write status as "Deleted"
if Unique key variable is matched and all remaining columns are not matched on both new1 and old1 datasets then write status as "Updated"

5 REPLIES 5
ballardw
Super User

What do you mean by compare "on multiple columns"? Compare the values of some, any  or all variables other than the key for each matching key? or only for those in New1 or only in Old1? Do you want to create a data set with the "Equal" "New" or "Deleted" values?

I would start by sorting each dataset on the key variable.

Then

Proc compare base=old1 compare=new1; Run;

There are options for compare that allow selecting results from the base or the compare dataset, but this should get you started as to what is going on in the data.

Ron_MacroMaven
Lapis Lazuli | Level 10

The way I read your Q

you are looking to identify the differences between two snapshots.

This macro handles that problem:

http://www.sascommunity.org/wiki/Macro_Extract

Ron Fehd  transaction maven

UrvishShah
Fluorite | Level 6

Hi,

The way you have presanted your query, possible solution is as follow...I hope this works...

In this example, i have created dataset containing households with sales and units...that is table_A contains hhd,sales and units for one period and table_B contains hhd,sales and units for second period...and compare both period households and assign your mentioned conditions...In my code, HHD is unique key in both the datasets...

data table_A;
   input hhd $ sales units;
   cards;
A 10 4
B 20 5
C 30 2
D 40 3
F 60 1
;

data table_B;
   input hhd $ sales units;
   cards;
A 10 4
B 20 5
C 30 2
D 40 3
E 50 4
D 60 .
;

proc sort data = table_A;
   by hhd;
run;

proc sort data = table_B;
   by hhd;
run;

%macro compare;

proc contents data = table_A out = list_A(keep = name) noprint;
run;

proc contents data = table_B out = list_B(keep = name) noprint;
run;

proc sql noprint;
  select count(name) into :count_A
  from list_A;

  select count(name) into :count_B
  from list_B;

  select name into :name_A separated by " "
  from list_A;

  select name into :name_B separated by " "
  from list_B;
quit;

proc datasets nolist;
   modify table_B;
   rename %do i = 1 %to &count_B.;
              %scan(&name_B.,&i.) = %scan(&name_B.,&i.)_Y2
    %end;;
quit;

proc datasets nolist;
   modify table_A;
   rename %do i = 1 %to &count_A.;
              %scan(&name_A.,&i.) = %scan(&name_A.,&i.)_Y1
    %end;;
quit;

data both(drop = _temp: in_A in_B);
   retain hhd sales_Y1 units_Y1 sales_Y2 units_Y2 item;
   length item $10.;
   merge table_A(in = a rename = (hhd_Y1 = hhd))
         table_B(in = b rename = (hhd_Y2 = hhd));
   by hhd;
   in_A = a;
   in_B = b;

   %do i = 2 %to &count_B.;

       _temp&i. = %scan(&name_B.,&i.)_Y2 - %scan(&name_A.,&i.)_Y1;

    if (in_A = 1 and in_B = 1) and _temp&i. = 0 then item = "Equal";
    if in_a = 0 and in_B = 1 then item = "New";
    if in_a = 1 and in_B = 0 then item = "Deleted";
    if (in_a = 1 and in_B = 1) and (_temp&i. NE 0 or _temp&i. = .) then item = "Updated";

   %end;

run;

%mend;

%compare;

-Urvish

Haikuo
Onyx | Level 15

Here is another approach using Metadata, raw input borrowed from Urvish:

data table_A;

  input hhd $ sales units;

  cards;

A 10 4

B 20 5

C 30 2

D 40 3

F 60 1

;

data table_B;

  input hhd $ sales units;

  cards;

A 10 4

B 20 5

C 30 2

D 40 3

E 50 4

D 60 .

;

proc sort data = table_A;

  by hhd;

run;

proc sort data = table_B;

  by hhd;

run;

proc sql NOPRINT;

  select cats(name,'=','new_',name) into :ren separated by ' '

from dictionary.columns

where libname='WORK'

AND MEMNAME='TABLE_A'

AND UPCASE(NAME) NE 'HHD'

;

SELECT CATS(NAME,'=','NEW_',NAME) INTO :COMP SEPARATED BY ' AND '

from dictionary.columns

where libname='WORK'

AND MEMNAME='TABLE_A'

AND UPCASE(NAME) NE 'HHD'

;

QUIT;

data want;

  merge TABLE_A(IN=OLD) TABLE_B(IN=NEW RENAME=(&REN));

  BY HHD;

LENGTH STATUS $8.;

IF OLD AND NEW THEN DO;

IF &COMP THEN STATUS='EQUAL';

ELSE STATUS='UPDATED';

END;

ELSE IF OLD AND NOT NEW THEN STATUS='DELETED';

ELSE IF NEW AND NOT OLD THEN STATUS='NEW';

RUN;

Haikuo

UrvishShah
Fluorite | Level 6

Hi Haikuo,

Your code is also meeting the requirement...Good one...

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 850 views
  • 1 like
  • 5 in conversation