09-18-2012 09:29 AM
Still being rather new to DI Studio, I'd appreciate advice on a problem.
In an Extract transformation, I want to extract all rows where Last_Updated (DATETIME20.) is between the first and last date of the previous month. In other words, it needs to extract everything from August based on a datetime20 variable. How do I go about doing this?
09-18-2012 10:37 AM
There are two basic steps to take:
The DATEPART function extracts just the date portion of your datetime variable.
The INTNX function can retrieve the first day of the months based on a date value.
Although it's not necessary to cram everything into one statement, it can be done:
where ( intnx('MONTH', "&sysdate9"d, -1) <= datepart(Last_Updated) < intnx('MONTH', "&sysdate9"d, 0) );
Note that &sysdate9 does not change, but is measured once when your session begins.
09-18-2012 11:42 AM
As always, there is more than one right way do this...
For a month/year period extraction I kinda like to use formatted dates and perform an alpha match (which may work also for less/great than comparisons).
where put(date(),monyy7.) eq put(LAST_UPDATED,dtmonyy7.)
Now, what happens here:
date() returns the system date which is converted to text using the date monyy7. format = 'SEP2012'
LAST_UPDATE is converted to text using the dtmonyy7. date/time format = 'SEP2012'
And a alpha match is performed.
Or, you could use the datetime() function instead (and use date/time dtmonyy7 format), which returns the system date an time:
where put(datetime(),dtmonyy7.) eq put(LAST_UPDATED,dtmonyy7.)
Or, if you're not after an exact match, using the date yymm7. format = '2012M09'
where put(date(),yymm7.) gt put(datepart(LAST_UPDATED),yymm7.)
Notice the use of the datepart function above to convert LAST_UPDATED as the used format requires strictly a date value (not date/time).
More on SAS formats here:
More on PUT function here:
Note: the use of date()/datetime() functions or &SYSDATE9 really depends on performance and how "accurate" you want the date to be. &SYSDATE9 evaluates at the beginning of the session, date()/datetime() at the moment the call is made to this functions.
Performance wise, the macro &SYSDATE9 should be "nicer" as it is evaluated once and then is part of the code as a constant value. The function is called each time the where is evaluated. Of course this could be meaningless or meaningful depending on the data's volume.
Cheers from Portugal.
Daniel Santos @ www.cgd.pt
09-18-2012 11:48 AM
Just noticed you wanted the previous month, not the current one.
As demonstrated previously you just need to use the INTNX function.
intnx('MONTH', date(), -1)
Sorry for that.
More on the INTNX function:
09-24-2012 04:59 AM
Just want to underline the consequence of using function calls in a where clause.
As a general rule of thumb, they will prevent you from use index optimization, and will lead to a full table scan, which, of course, for large table will have a great impact on performance.
What I usually do, is to calculate a from and to datetime constants in a pre step (to the job or to the step), and store the as macro variable, in a datetime format. In the extract you will have:
last_updated between "&FromDatetime"dt and "&ToDatetime"dt
09-24-2012 05:35 AM
ANd to add to Linus very valuable statement: The code he used also allows the query to be pushed to the database.
I normally try to avoid Precode so I often use %syfunc() instead for the same purpose as Linus outlines.
09-24-2012 06:32 AM
Wow, I haven't considered that %sysfunc could be used for hiding "normal" function calls within regular SAS code.. Thanks for that tip!
09-18-2012 10:20 PM
intck('dtmonth', last_updated, datetime() ) = 1
or to not execute datetime() in every iteration
intck('dtmonth', last_updated, %sysfunc(datetime()) ) = 1
09-26-2012 05:29 AM
In extract transformation , where clause you can mention like this...
StartDate <= &last_updated < EndDate
and you can pass the last_updated date value in pre code(as this would be changing as per your requirement).