DATA Step, Macro, Functions and more

base sas

Reply
Frequent Contributor
Posts: 115

base sas

Could you please help me on below

Want to compare two datasets(new1 and old1) on multiple columns, also has unique key variable on both datasets

if Unique key variable and all remaining columns are matched in two datasets (new1 and old1) then write status as "Equal"
if Unique key variable is existing in new1 and not existing in old1  then write status as "New"
if Unique key variable is not existing in new1 and existing in old1 on key variable then write status as "Deleted"
if Unique key variable is matched and all remaining columns are not matched on both new1 and old1 datasets then write status as "Updated"

Super User
Posts: 11,343

Re: base sas

Posted in reply to sunilreddy

What do you mean by compare "on multiple columns"? Compare the values of some, any  or all variables other than the key for each matching key? or only for those in New1 or only in Old1? Do you want to create a data set with the "Equal" "New" or "Deleted" values?

I would start by sorting each dataset on the key variable.

Then

Proc compare base=old1 compare=new1; Run;

There are options for compare that allow selecting results from the base or the compare dataset, but this should get you started as to what is going on in the data.

Regular Contributor
Posts: 227

Re: base sas

Posted in reply to sunilreddy

The way I read your Q

you are looking to identify the differences between two snapshots.

This macro handles that problem:

http://www.sascommunity.org/wiki/Macro_Extract

Ron Fehd  transaction maven

Regular Contributor
Posts: 195

Re: base sas

Posted in reply to sunilreddy

Hi,

The way you have presanted your query, possible solution is as follow...I hope this works...

In this example, i have created dataset containing households with sales and units...that is table_A contains hhd,sales and units for one period and table_B contains hhd,sales and units for second period...and compare both period households and assign your mentioned conditions...In my code, HHD is unique key in both the datasets...

data table_A;
   input hhd $ sales units;
   cards;
A 10 4
B 20 5
C 30 2
D 40 3
F 60 1
;

data table_B;
   input hhd $ sales units;
   cards;
A 10 4
B 20 5
C 30 2
D 40 3
E 50 4
D 60 .
;

proc sort data = table_A;
   by hhd;
run;

proc sort data = table_B;
   by hhd;
run;

%macro compare;

proc contents data = table_A out = list_A(keep = name) noprint;
run;

proc contents data = table_B out = list_B(keep = name) noprint;
run;

proc sql noprint;
  select count(name) into :count_A
  from list_A;

  select count(name) into :count_B
  from list_B;

  select name into :name_A separated by " "
  from list_A;

  select name into :name_B separated by " "
  from list_B;
quit;

proc datasets nolist;
   modify table_B;
   rename %do i = 1 %to &count_B.;
              %scan(&name_B.,&i.) = %scan(&name_B.,&i.)_Y2
    %end;;
quit;

proc datasets nolist;
   modify table_A;
   rename %do i = 1 %to &count_A.;
              %scan(&name_A.,&i.) = %scan(&name_A.,&i.)_Y1
    %end;;
quit;

data both(drop = _temp: in_A in_B);
   retain hhd sales_Y1 units_Y1 sales_Y2 units_Y2 item;
   length item $10.;
   merge table_A(in = a rename = (hhd_Y1 = hhd))
         table_B(in = b rename = (hhd_Y2 = hhd));
   by hhd;
   in_A = a;
   in_B = b;

   %do i = 2 %to &count_B.;

       _temp&i. = %scan(&name_B.,&i.)_Y2 - %scan(&name_A.,&i.)_Y1;

    if (in_A = 1 and in_B = 1) and _temp&i. = 0 then item = "Equal";
    if in_a = 0 and in_B = 1 then item = "New";
    if in_a = 1 and in_B = 0 then item = "Deleted";
    if (in_a = 1 and in_B = 1) and (_temp&i. NE 0 or _temp&i. = .) then item = "Updated";

   %end;

run;

%mend;

%compare;

-Urvish

Respected Advisor
Posts: 3,156

Re: base sas

Posted in reply to UrvishShah

Here is another approach using Metadata, raw input borrowed from Urvish:

data table_A;

  input hhd $ sales units;

  cards;

A 10 4

B 20 5

C 30 2

D 40 3

F 60 1

;

data table_B;

  input hhd $ sales units;

  cards;

A 10 4

B 20 5

C 30 2

D 40 3

E 50 4

D 60 .

;

proc sort data = table_A;

  by hhd;

run;

proc sort data = table_B;

  by hhd;

run;

proc sql NOPRINT;

  select cats(name,'=','new_',name) into :ren separated by ' '

from dictionary.columns

where libname='WORK'

AND MEMNAME='TABLE_A'

AND UPCASE(NAME) NE 'HHD'

;

SELECT CATS(NAME,'=','NEW_',NAME) INTO :COMP SEPARATED BY ' AND '

from dictionary.columns

where libname='WORK'

AND MEMNAME='TABLE_A'

AND UPCASE(NAME) NE 'HHD'

;

QUIT;

data want;

  merge TABLE_A(IN=OLD) TABLE_B(IN=NEW RENAME=(&REN));

  BY HHD;

LENGTH STATUS $8.;

IF OLD AND NEW THEN DO;

IF &COMP THEN STATUS='EQUAL';

ELSE STATUS='UPDATED';

END;

ELSE IF OLD AND NOT NEW THEN STATUS='DELETED';

ELSE IF NEW AND NOT OLD THEN STATUS='NEW';

RUN;

Haikuo

Regular Contributor
Posts: 195

Re: base sas

Hi Haikuo,

Your code is also meeting the requirement...Good one...

Ask a Question
Discussion stats
  • 5 replies
  • 237 views
  • 1 like
  • 5 in conversation