I am running a program that creates a table new_table everyday. The very next day I want to create the deltas, meaning I want to create the insert, delete and update data sets between the old and new table.
Here is a minimal reproducible example:
data old_table; infile datalines4 delimiter=","; input ID $ START END; format START END yymmddd10.-l; datalines4; ID1,20332,2936547 ID2,20361,22249 ID3,20240,22249 ID4,20240,22249 ;;;; data new_table; infile datalines4 delimiter=","; input ID $ START END; format START END yymmddd10.-l; datalines4; ID1,20332,22249 ID2,20361,22249 ID4,20240,22249 ID5,20240,20820 ;;;;
Basically I want to end up with the following as an output:
Delete table:
ID3,20240,22249
Update table:
ID5,20240,20820
Insert table:
ID1,20332,22249
In summary:
I would like to know what could be the most efficient way of doing this kind of comparison/updates. I was thinking maybe a hash table that would "explode" the table into the three desired output tables might be the go-to option here.
In a second time, I would like to know if it was possible to directly update this table (meaning deleting/inserting and updating the observations at the same time, without creating the intermediary tables like before).
I have seen @SuryaKiran's answer on this Hash Output to update the Initial Dataset but this seems to be missing the delete and insert actions (it only updated UD1).
data old_table;
if _n_=1 then do;
declare hash ud(dataset:'new_table');
ud.defineKey('ID');
ud.defineData(all:'y');
ud.defineDone();
end;
modify old_table;
rcUpdate = ud.find();
if rcUpdate=0 then replace;
run;
Nota Bene: I know that the easy solution is just to replace the old table with the new one. However, in the actual data sets 99% of the observations does not change at all. Hence I only want to "update" the table and not replacing it.
For this minimal example you may use next code:
data deleted added updated;
merge old_table(in=inold rename=(start=_start end=_end)))
new_table(in=innew);
by ID;
if inold and not innew then output deleted; else
if innew and not inold then output added; else
if start ne _start or end ne _end then output updated;
run;
In case you need it for various pair tables with different variables and number of variables
this same code should be rewritten to get dynamically the variable names (from sashelp.vcolumn or sql dictionary.columns) of a table and generate the program to run;
For this minimal example you may use next code:
data deleted added updated;
merge old_table(in=inold rename=(start=_start end=_end)))
new_table(in=innew);
by ID;
if inold and not innew then output deleted; else
if innew and not inold then output added; else
if start ne _start or end ne _end then output updated;
run;
In case you need it for various pair tables with different variables and number of variables
this same code should be rewritten to get dynamically the variable names (from sashelp.vcolumn or sql dictionary.columns) of a table and generate the program to run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.