Hi everyone,
Actually, I am trying to solve one problem,
I have one dataset which will update daily (automatically).. (Dataset "B").
and my ETL works on that dataset. Now my new requirement is to work on only delta record.
..
requirements:
1) In that dataset one column(i.e. DATEofComplaint) will updated (on which I want to work) and I want only those observations which were updated till yesterday from lasttime ETL Runs.
even if ETL was not run for few days .
2) even if ETL runs two or more times in a day, then also data will get until yesterday ..
and then
3) I am appending output data to permanant Dataset (i.e. "F") so in that duplicate observations are not allowed and I can't use NODUP or NODUPKEY for that permanent dataset because dataset is very large.
.....................................................................................................................
I am exporting system date to know on which Day ETL runs last time.
(Suppose this ETL last time RUNS on 28AUG2017 then "lastETLDate" will 27AUG2017 )
STEP 1:
Data sasdata.SystemDate;
format LastETLDate date9.;
LastETLDate=today()-1;
run;
..............
Step 2:
then importing that date
(27AUG2017)
..............
Step 3:
Suppose today's date is 01SEP2017
data kk;
A=today()-1;
run;
A=31AUG2017
..............
Step 4:
data QueryFoRB;
set sasdata.B;
where DateofComplaint between LastETLDate and A;
run;
..
Now, what should I do to overcome the problem of appending duplicate records when ETL runs two or more time in day....
because in last step of ETL I made code that will append data to a permanant dataset which comes from Step 4.
for reference, I have attached a rough diagram of SAS EG.
program or any suggestion will be appreciated.
Thank you very much.
You need to store date and time in lastETLDate column. Each would update run lastETLDatetime and lastETLDatetime should be same for all new records.
Next time whenever you are want to run the ETL then find maximum lastETLDatetime.
Records updated after maximum lastETLDatetime should be considered.
It means that I should create one column in final output with name LastETLRuns. and put a condition on that? which will give me only updated observations.
Am I right?
Yes.
If you intend to run only once in a day then you can live with date value and if you intend to run more than one time in a day then you should go for datetime value to differentiate exections carried out in one single day.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.