- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am running a program that creates a table new_table everyday. The very next day I want to create the deltas, meaning I want to create the insert, delete and update data sets between the old and new table.
Here is a minimal reproducible example:
data old_table; infile datalines4 delimiter=","; input ID $ START END; format START END yymmddd10.-l; datalines4; ID1,20332,2936547 ID2,20361,22249 ID3,20240,22249 ID4,20240,22249 ;;;; data new_table; infile datalines4 delimiter=","; input ID $ START END; format START END yymmddd10.-l; datalines4; ID1,20332,22249 ID2,20361,22249 ID4,20240,22249 ID5,20240,20820 ;;;;
Basically I want to end up with the following as an output:
Delete table:
ID3,20240,22249
Update table:
ID5,20240,20820
Insert table:
ID1,20332,22249
In summary:
- Deletion of ID3 (because not in the new table).
- Update of ID1 (because END date is different in the new table).
- Insert of ID5 (because not in old table).
- Nothing change for ID2 and ID4 (they are both in new and oldtables and their values does not change).
I would like to know what could be the most efficient way of doing this kind of comparison/updates. I was thinking maybe a hash table that would "explode" the table into the three desired output tables might be the go-to option here.
In a second time, I would like to know if it was possible to directly update this table (meaning deleting/inserting and updating the observations at the same time, without creating the intermediary tables like before).
I have seen @SuryaKiran's answer on this Hash Output to update the Initial Dataset but this seems to be missing the delete and insert actions (it only updated UD1).
data old_table;
if _n_=1 then do;
declare hash ud(dataset:'new_table');
ud.defineKey('ID');
ud.defineData(all:'y');
ud.defineDone();
end;
modify old_table;
rcUpdate = ud.find();
if rcUpdate=0 then replace;
run;
Nota Bene: I know that the easy solution is just to replace the old table with the new one. However, in the actual data sets 99% of the observations does not change at all. Hence I only want to "update" the table and not replacing it.
- Tags:
- sas
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For this minimal example you may use next code:
data deleted added updated;
merge old_table(in=inold rename=(start=_start end=_end)))
new_table(in=innew);
by ID;
if inold and not innew then output deleted; else
if innew and not inold then output added; else
if start ne _start or end ne _end then output updated;
run;
In case you need it for various pair tables with different variables and number of variables
this same code should be rewritten to get dynamically the variable names (from sashelp.vcolumn or sql dictionary.columns) of a table and generate the program to run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For this minimal example you may use next code:
data deleted added updated;
merge old_table(in=inold rename=(start=_start end=_end)))
new_table(in=innew);
by ID;
if inold and not innew then output deleted; else
if innew and not inold then output added; else
if start ne _start or end ne _end then output updated;
run;
In case you need it for various pair tables with different variables and number of variables
this same code should be rewritten to get dynamically the variable names (from sashelp.vcolumn or sql dictionary.columns) of a table and generate the program to run;