BookmarkSubscribeRSS Feed
india2016
Pyrite | Level 9

Hi everyone,

Actually, I am trying to solve one problem,
I have one dataset which will update daily (automatically).. (Dataset "B").
and my ETL works on that dataset. Now my new requirement is to work on only delta record.


..
requirements:

1) In that dataset one column(i.e. DATEofComplaint) will updated (on which I want to work) and I want only those observations which were updated till yesterday from lasttime ETL Runs.
even if ETL was not run for few days .

2) even if ETL runs two or more times in a day, then also data will get until yesterday ..
and then

3) I am appending output data to permanant Dataset (i.e. "F") so in that duplicate observations are not allowed and I can't use NODUP or NODUPKEY for that permanent dataset because dataset is very large.


.....................................................................................................................

 


I am exporting system date to know on which Day ETL runs last time.

(Suppose this ETL last time RUNS on 28AUG2017 then "lastETLDate" will 27AUG2017 )


STEP 1:


Data sasdata.SystemDate;
format LastETLDate date9.;
LastETLDate=today()-1;
run;

 

..............
Step 2:
then importing that date
(27AUG2017)
..............
Step 3:
Suppose today's date is 01SEP2017

data kk;
A=today()-1;
run;

A=31AUG2017
..............
Step 4:

data QueryFoRB;
set sasdata.B;
where DateofComplaint between LastETLDate and A;
run;

..

Now, what should I do to overcome the problem of appending duplicate records when ETL runs two or more time in day....
because in last step of ETL I made code that will append data to a permanant dataset which comes from Step 4.


for reference, I have attached a rough diagram of SAS EG.

 

program or any suggestion will be appreciated.

Thank you very much.

 

 

 

 

5 REPLIES 5
RahulG
Barite | Level 11

You need to store date and time in lastETLDate column.  Each would update run lastETLDatetime and  lastETLDatetime should be same for all new records.

 

Next time whenever you are want to run the ETL then find maximum lastETLDatetime.

 

Records updated after maximum lastETLDatetime should be considered.

 

 

india2016
Pyrite | Level 9

It means that I should create one column in final output with name LastETLRuns. and put a condition on that? which will give me only updated observations.

 

Am I right?

 

 

RahulG
Barite | Level 11

Yes.

 

If you intend to run only once in a day then you can live with date value and if you intend to run more than one time in a day then you should go for datetime value to differentiate exections carried out in one single day. 

LinusH
Tourmaline | Level 20
It's considered best practice to have a timestamp column in each table reflecting the last update operation. Not only when you get a requirement like this.
Data never sleeps
LinusH
Tourmaline | Level 20
If you only have daily inserts, consider to have a daily pile table separate from the total table. Accessing this would be considerably cheaper than querying the whole history.
Data never sleeps

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 998 views
  • 2 likes
  • 3 in conversation