08-23-2017 02:22 AM
I have one query, I am working on observations which are automatically updated in the Oracle database (obs will be updated in only one table and that is used in ETL) And then runs ETL on that data-set at the specific time.
I already made ETL which runs at night which takes two hours. now I want to change in ETL which runs only on newly updated Observations from last used data and give final output.
coz I have to save time by working only on those observations which are updated instead of processing the whole observations.
ETL runs at 2 AM
and ends at 4 AM
now if I am running ETL manually at 3 PM then this ETL only works on Updated observations(i.e. only on 2 AM to 3 PM data and gives output. )
..What should I do for this task.
08-23-2017 04:30 AM
Well, not sure how you would go about this in ETL as don't use it, however what I can say is this. Oracle normally has a system field for last date/time of change. You should be able to use this in the SQL you use to extract the data, so its just a matter of putting a where on your SQL to take records only where datetime of last is after your given point:
proc sql; connect to oracle...; select * from connection to oracle ( select... from... where LASTCHGTS >= "01FEB2017T12:00"dt); disconnect from oracle; quit;
08-23-2017 04:42 AM
How would Oracle or SAS knows which are the updated observations?
isn't It only depend on date variable?.
Actually, I don't know which observations will be updated.
08-23-2017 04:53 AM
As I said above, Oracle generates a date/time stamp on data when something changes in the data row. You query against that date/time to get rows which have changed in the timepoint your are interested. At least Oracle Clinical does anyways. I would ask your Oracle Administrator to provide you exact code to extract your data from your system with your requirements, as I have none of this information.
08-23-2017 08:06 AM - edited 08-23-2017 08:13 AM
I'm not sure if Oracle always creates such a datestamp as @RW9 wrote but it's certainly something which can be implemented in Oracle (i.e. using CDAC).
I guess you're not only interested in updated records but also in deleted and newly inserted ones.
Have a chat with your Oracle DBA on how to best implement for your requirement. May be everything is already set-up and ready for you, may be there are first some configuration changes on the Oracle side required.
If I would have designed your Oracle table then it would have a created date and an updated date column. Check your table if there is already something like that there. If it is then you need nothing else than implementing a control table where you maintain the last date you've run your ETL and pick-up all records with a more current date - and then update your control table with the most current date you've processed.