BookmarkSubscribeRSS Feed
wkossack_nspirehealth_com
Calcite | Level 5

I have a dataset that new data and changes are added to over time.  Each month a copy is saved

I'm looking for a way to compare the current data to the last monthly copy and output an incremental file that is loaded into a database

For example, if a new row is added to the data, it needs to be output.  Also if a value changes that was previously loaded the entire row needs to be output

key fields are subject and visit and test_number.  There are 40 other data values both text and numeric

I've been working with proc compare but run into a problem.  Several variables need to be left out of the compare but included in the incremental file but not used to compare.  Short of rejoining the output file back to the current data how do I create an incremental file with the non-compare variables?

8 REPLIES 8
DBailey
Lapis Lazuli | Level 10

proc sql;

create table work.changes as

select

     t1.*

from

     have t1

     left outer join prior_have t2

          on t1.visit=t2.visit and t1.test_number=t2.test_number

where

     t2.test_number is null /*newly added records*/

     or (  /*+changed records*/

          t1.col1 ne t2.col1

          or t1.col2 ne t2.col2

          .

          .

          .

          );

quit;

wkossack_nspirehealth_com
Calcite | Level 5

I thought of something like that except there are a large number of columns to compare

ballardw
Super User

If the order of the data isn't different, i.e. the changes are at the end, OR they can be matched by some subset of varaiables to uniquely identify each record (sort both datasets accordingly) then you may be able to use Proc Compare with the LISTCOMPOBS option.

pradeepalankar
Obsidian | Level 7

hi,

didn't understood your query completely in first place, based on what i understood is the below solution:

if you need changes/new records which are there in new table and not in old one,

/*1. compared based on some key columns.*/

if this so then try this:

data previous_table_temp;

set previous_table;

compare_value1=md5(cats(key_column1,.....,key_columnN));

run;

data current_table_temp;

set current_table;

compare_value2=md5(cats(key_column1,.....,key_columnN));

run;

proc sql;

create table incremantal_copy(drop=compare_value2) as

select * from current_table_temp where compare_value2 not in (select compare_value1 from previous_table_temp);

quit;

if there are no such thing like "Key_Columns" over which comparison should be done then simple EXCEPT will give you the result:

proc sql;

create table incremantal_copy as

select * from current_table

except

select * from previous_table;

quit;

check if this helps...and if you need something else please explain.

wkossack_nspirehealth_com
Calcite | Level 5

I like it

I've coded this and given it to someone to more completely check the output

pradeepalankar
Obsidian | Level 7

Ok, let us know if it helps Smiley Happy

wkossack_nspirehealth_com
Calcite | Level 5

I have not heard complaints from the person needing the code change.  My assumption is that this worked

thanks

Patrick
Opal | Level 21

You could create a digest value over all your columns for which you want to have some sort of change tracking.

Below code looks a bit complicated but I've created it this way to reduce data volumes transferred between the database and SAS. Even more efficient would be to maintain an XREF table for already loaded data on the SAS side. But this would require even more coding.

If there is a lot of change then a more efficient approach would be to load your whole source dataset into a temporary table in the database and then have all the processing there. The main idea of creating a digest value and store it with the data remains though.

For the code to perform you will need an index over your key for the table in the database.

libname dbref (work);
libname sasref (work);

data dbref.old_month_in_db;
  key=_n_;
  set sashelp.class;
  digest_value = put(MD5(catx('|',name,age,height)), $hex32.);
run;

data sasref.new_month_in_SAS;
  key=_n_;
  set sashelp.class end=last;
  if name in ('Henry','James') then age=16;
  if name='Alfred' then height=90;
  if name='Louise' then weight=60;
  output;
  if last then
    do;
      key+1;
      name='New Guy';
      age=12;
      output;
    end;
run;


data sasref.x_ref_new_month_in_SAS (keep=key digest_value);
  set sasref.new_month_in_SAS(keep=key name age height);
  digest_value = put(MD5(catx('|',name,age,height)), $hex32.);
run;


proc datasets lib=dbref nolist nowarn;
  delete x_ref_new_month_in_db_temp_table;
  run;
  append data=sasref.x_ref_new_month_in_SAS out=dbref.x_ref_new_month_in_db_temp_table;
  run;
run;


proc sql;
  create table sasref.change_or_new_rec_in_sas as
    select
      new.key,
      case when missing(old.key) then '1' else '0' end as _New_Flg length=1
    from dbref.x_ref_new_month_in_db_temp_table as new left join dbref.old_month_in_db as old
      on new.key=old.key
    having missing(old.key) or new.digest_value ne old.digest_value
    ;
quit;


data sasref.new_rec sasref.change_rec;
  set sasref.new_month_in_SAS;
  if _n_=1 then
    do;
      declare hash h1(dataset:'sasref.change_or_new_rec_in_sas');
      _rc=h1.defineKey('key');
      _rc=h1.defineData('_New_Flg');
      _rc=h1.defineDone();
      drop _rc _New_Flg;
    end;

  if h1.find()=0 then
    do;
      if _New_Flg='1' then output sasref.new_rec;
      else if _New_Flg='0' then output sasref.change_rec;
    end;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1988 views
  • 0 likes
  • 5 in conversation