I have a dataset that new data and changes are added to over time. Each month a copy is saved
I'm looking for a way to compare the current data to the last monthly copy and output an incremental file that is loaded into a database
For example, if a new row is added to the data, it needs to be output. Also if a value changes that was previously loaded the entire row needs to be output
key fields are subject and visit and test_number. There are 40 other data values both text and numeric
I've been working with proc compare but run into a problem. Several variables need to be left out of the compare but included in the incremental file but not used to compare. Short of rejoining the output file back to the current data how do I create an incremental file with the non-compare variables?
proc sql;
create table work.changes as
select
t1.*
from
have t1
left outer join prior_have t2
on t1.visit=t2.visit and t1.test_number=t2.test_number
where
t2.test_number is null /*newly added records*/
or ( /*+changed records*/
t1.col1 ne t2.col1
or t1.col2 ne t2.col2
.
.
.
);
quit;
I thought of something like that except there are a large number of columns to compare
If the order of the data isn't different, i.e. the changes are at the end, OR they can be matched by some subset of varaiables to uniquely identify each record (sort both datasets accordingly) then you may be able to use Proc Compare with the LISTCOMPOBS option.
hi,
didn't understood your query completely in first place, based on what i understood is the below solution:
if you need changes/new records which are there in new table and not in old one,
/*1. compared based on some key columns.*/
if this so then try this:
data previous_table_temp;
set previous_table;
compare_value1=md5(cats(key_column1,.....,key_columnN));
run;
data current_table_temp;
set current_table;
compare_value2=md5(cats(key_column1,.....,key_columnN));
run;
proc sql;
create table incremantal_copy(drop=compare_value2) as
select * from current_table_temp where compare_value2 not in (select compare_value1 from previous_table_temp);
quit;
if there are no such thing like "Key_Columns" over which comparison should be done then simple EXCEPT will give you the result:
proc sql;
create table incremantal_copy as
select * from current_table
except
select * from previous_table;
quit;
check if this helps...and if you need something else please explain.
I like it
I've coded this and given it to someone to more completely check the output
Ok, let us know if it helps
I have not heard complaints from the person needing the code change. My assumption is that this worked
thanks
You could create a digest value over all your columns for which you want to have some sort of change tracking.
Below code looks a bit complicated but I've created it this way to reduce data volumes transferred between the database and SAS. Even more efficient would be to maintain an XREF table for already loaded data on the SAS side. But this would require even more coding.
If there is a lot of change then a more efficient approach would be to load your whole source dataset into a temporary table in the database and then have all the processing there. The main idea of creating a digest value and store it with the data remains though.
For the code to perform you will need an index over your key for the table in the database.
libname dbref (work);
libname sasref (work);
data dbref.old_month_in_db;
key=_n_;
set sashelp.class;
digest_value = put(MD5(catx('|',name,age,height)), $hex32.);
run;
data sasref.new_month_in_SAS;
key=_n_;
set sashelp.class end=last;
if name in ('Henry','James') then age=16;
if name='Alfred' then height=90;
if name='Louise' then weight=60;
output;
if last then
do;
key+1;
name='New Guy';
age=12;
output;
end;
run;
data sasref.x_ref_new_month_in_SAS (keep=key digest_value);
set sasref.new_month_in_SAS(keep=key name age height);
digest_value = put(MD5(catx('|',name,age,height)), $hex32.);
run;
proc datasets lib=dbref nolist nowarn;
delete x_ref_new_month_in_db_temp_table;
run;
append data=sasref.x_ref_new_month_in_SAS out=dbref.x_ref_new_month_in_db_temp_table;
run;
run;
proc sql;
create table sasref.change_or_new_rec_in_sas as
select
new.key,
case when missing(old.key) then '1' else '0' end as _New_Flg length=1
from dbref.x_ref_new_month_in_db_temp_table as new left join dbref.old_month_in_db as old
on new.key=old.key
having missing(old.key) or new.digest_value ne old.digest_value
;
quit;
data sasref.new_rec sasref.change_rec;
set sasref.new_month_in_SAS;
if _n_=1 then
do;
declare hash h1(dataset:'sasref.change_or_new_rec_in_sas');
_rc=h1.defineKey('key');
_rc=h1.defineData('_New_Flg');
_rc=h1.defineDone();
drop _rc _New_Flg;
end;
if h1.find()=0 then
do;
if _New_Flg='1' then output sasref.new_rec;
else if _New_Flg='0' then output sasref.change_rec;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.