In order to automate my report, I completed all ETL precess and created a final table. Now I have to setup an incremental load.
Can anyone help me to provide an efficient sample code that loads the table on a daily basis by remove the last 5 days of data and load it again.
Attached here the sample data. Have1 has data till 25th April 2020 and Have2 has data till 26th April i.e today and the data gets updated on a daily basis. There are chances that the last 5 days data can be modified so always while loading data I need to delete the last 5 days of data or overwrite. This how I need to set up incremental.
In data Have2 marked few rows as edited.
Please try to do a full merge as below
proc sort data=have1;
by ProductKey Order Date CustomerKey SalesTerritoryKey SalesOrderNumber SalesOrderLineNumber OrderQuantity ;
run;
proc sort data=have2;
by ProductKey Order Date CustomerKey SalesTerritoryKey SalesOrderNumber SalesOrderLineNumber OrderQuantity ;
run;
data want;
merge have1(in=a) have2(in=b);
by ProductKey Order Date CustomerKey SalesTerritoryKey SalesOrderNumber SalesOrderLineNumber OrderQuantity ;
run;
Hi Jag,
Is it an efficient way to use the full merge and load all the data again and again on a daily basis.
Thanks
@Rahul_SAS wrote:
Hi Jag,
Is it an efficient way to use the full merge and load all the data again and again on a daily basis.
Thanks
In no way. What you want is not a merge, but a "stack", which is best accomplished by the code @yabwon gave you (it reads and writes in one sequential pass).
If you are actually doing this in a library defined for a remote DBMS, then the DBMS will perform better if you use SQL code (DELETE FROM and INSERT INTO) in an explicit passthrough. But we don't know that, because we only got Excel spreadsheets to look at.
If you need help with SAS datasets, please post them in the proper way, in data steps with datalines, so we can recreate your datasets "as is" without any second-guessing, and without all the hassle of downloading and importing potentially dangerous files that are often blocked by the firewalls of corporate sites.
If you actually want to know how to do incremental updates in Excel spreadsheets, you should ask your question in a forum dealing mainly with Excel.
Hi,
did you try some simple approach like:
data new;
set
have1(where=(date_variable < today() - 5))
have2(where=(date_variable >= today() - 5))
;
run;
?
All the best
Bart
Hi Yabwon,
Thanks for the reply.
Here I don't want to load historical data as it will take a lot of time each time I load it. While loading it, I need to remove only the last 5 days of data and replace it with new data. The source gets updated daily basis. Like Have1 has data till 27th April and Have2 has data till 28 April i.e. today so i want to remove last 5 days of data from have1 and replace it with new data using have2(source) and load it.
Thanks
To physically remove old observations, you have to rewrite the dataset. Without a complete rewrite, the observations will only be marked as deleted, and the dataset will grow permanently.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.