Re: output incremental data changes

wkossack_nspirehealth_com · Posted 02-24-2014 04:00 PM

I have a dataset that new data and changes are added to over time. Each month a copy is saved

I'm looking for a way to compare the current data to the last monthly copy and output an incremental file that is loaded into a database

For example, if a new row is added to the data, it needs to be output. Also if a value changes that was previously loaded the entire row needs to be output

key fields are subject and visit and test_number. There are 40 other data values both text and numeric

I've been working with proc compare but run into a problem. Several variables need to be left out of the compare but included in the incremental file but not used to compare. Short of rejoining the output file back to the current data how do I create an incremental file with the non-compare variables?

DBailey · Posted 02-24-2014 04:16 PM

proc sql;

create table work.changes as

select

t1.*

from

have t1

left outer join prior_have t2

on t1.visit=t2.visit and t1.test_number=t2.test_number

where

t2.test_number is null /*newly added records*/

or ( /*+changed records*/

t1.col1 ne t2.col1

or t1.col2 ne t2.col2

.

);

quit;

wkossack_nspirehealth_com · Posted 02-24-2014 04:23 PM

I thought of something like that except there are a large number of columns to compare

ballardw · Posted 02-24-2014 05:22 PM

If the order of the data isn't different, i.e. the changes are at the end, OR they can be matched by some subset of varaiables to uniquely identify each record (sort both datasets accordingly) then you may be able to use Proc Compare with the LISTCOMPOBS option.

pradeepalankar · Posted 02-25-2014 08:40 AM

hi,

didn't understood your query completely in first place, based on what i understood is the below solution:

if you need changes/new records which are there in new table and not in old one,

/*1. compared based on some key columns.*/

if this so then try this:

data previous_table_temp;

set previous_table;

compare_value1=md5(cats(key_column1,.....,key_columnN));

run;

data current_table_temp;

set current_table;

compare_value2=md5(cats(key_column1,.....,key_columnN));

run;

proc sql;

create table incremantal_copy(drop=compare_value2) as

select * from current_table_temp where compare_value2 not in (select compare_value1 from previous_table_temp);

quit;

if there are no such thing like "Key_Columns" over which comparison should be done then simple EXCEPT will give you the result:

proc sql;

create table incremantal_copy as

select * from current_table

except

select * from previous_table;

quit;

check if this helps...and if you need something else please explain.

wkossack_nspirehealth_com · Posted 02-25-2014 04:22 PM

I like it

I've coded this and given it to someone to more completely check the output

pradeepalankar · Posted 02-26-2014 04:29 AM

Ok, let us know if it helps

wkossack_nspirehealth_com · Posted 04-08-2014 10:58 AM

I have not heard complaints from the person needing the code change. My assumption is that this worked

thanks

Patrick · Posted 02-26-2014 06:10 AM

You could create a digest value over all your columns for which you want to have some sort of change tracking.

Below code looks a bit complicated but I've created it this way to reduce data volumes transferred between the database and SAS. Even more efficient would be to maintain an XREF table for already loaded data on the SAS side. But this would require even more coding.

If there is a lot of change then a more efficient approach would be to load your whole source dataset into a temporary table in the database and then have all the processing there. The main idea of creating a digest value and store it with the data remains though.

For the code to perform you will need an index over your key for the table in the database.

libname dbref (work);
libname sasref (work);

data dbref.old_month_in_db;
key=_n_;
set sashelp.class;
digest_value = put(MD5(catx('|',name,age,height)), $hex32.);
run;

data sasref.new_month_in_SAS;
key=_n_;
set sashelp.class end=last;
if name in ('Henry','James') then age=16;
if name='Alfred' then height=90;
if name='Louise' then weight=60;
output;
if last then
    do;
      key+1;
      name='New Guy';
      age=12;
      output;
    end;
run;

data sasref.x_ref_new_month_in_SAS (keep=key digest_value);
set sasref.new_month_in_SAS(keep=key name age height);
digest_value = put(MD5(catx('|',name,age,height)), $hex32.);
run;

proc datasets lib=dbref nolist nowarn;
delete x_ref_new_month_in_db_temp_table;
run;
append data=sasref.x_ref_new_month_in_SAS out=dbref.x_ref_new_month_in_db_temp_table;
run;
run;

proc sql;
create table sasref.change_or_new_rec_in_sas as
    select
      new.key,
      case when missing(old.key) then '1' else '0' end as _New_Flg length=1
    from dbref.x_ref_new_month_in_db_temp_table as new left join dbref.old_month_in_db as old
      on new.key=old.key
    having missing(old.key) or new.digest_value ne old.digest_value
    ;
quit;

data sasref.new_rec sasref.change_rec;
set sasref.new_month_in_SAS;
if _n_=1 then
    do;
      declare hash h1(dataset:'sasref.change_or_new_rec_in_sas');
      _rc=h1.defineKey('key');
      _rc=h1.defineData('_New_Flg');
      _rc=h1.defineDone();
      drop _rc _New_Flg;
    end;

if h1.find()=0 then
    do;
      if _New_Flg='1' then output sasref.new_rec;
      else if _New_Flg='0' then output sasref.change_rec;
    end;
run;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away