Hi Guys,
I am very new to SAS. Hope to get some advice for the following.
I currently have decades of data, with dates in YYYYMMDD format. On every date there are observations for several customers (e.g. A, B, C, D, E).
What I am trying to do is extract the observations for each customer (A, B, C, D, E) for the last 2 days of each month.
I am not sure how to go about doing this in an efficient manner.
Thanks guys!
If you only care about calendar date, here is an option:
data want;
  set have;
  /* keep rows dated on the last or second-to-last calendar day of the month */
  if Your_date = intnx('month',Your_date,0,'end') or
     Your_date = intnx('month',Your_date,0,'end')-1;
run;
Haikuo
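One thing to check first: the code above assumes Your_date is a real SAS date value. If the column actually stores YYYYMMDD as a plain number such as 20240131 (an assumption here; the original post does not say which), intnx will not give sensible results and the value needs converting first. A minimal sketch, where sas_date is a hypothetical new variable:

```sas
data want;
    set have;
    /* Assumption: Your_date is a plain number like 20240131, not a SAS date */
    sas_date = input(put(Your_date, 8.), yymmdd8.);
    format sas_date yymmddn8.;   /* display the converted date as YYYYMMDD */
    /* keep the last two calendar days of the month */
    if sas_date >= intnx('month', sas_date, 0, 'end') - 1;
run;
```

If Your_date is a character column instead, input(Your_date, yymmdd8.) should be enough and the put() step can be dropped.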
If it is a large dataset you might want to use a where statement with a single inequality condition for efficiency, as in:
data want ;
  set have ;
  where Your_date > intnx('month',Your_date,0,'end') - 2 ;
run ;
The where statement can also be coded as a (where= ()) dataset option:
set have
  (where = (Your_date > intnx('month',Your_date,0,'end') - 2))
;
hi ... not much difference in performance between IF and WHERE any more ...
"Performance in these examples is close enough that the choice of an IF statement versus a WHERE statement versus a WHERE option is arbitrary."
from "Efficiency Considerations Using the SAS System" by Rick Langston, SAS Institute
Point taken, Mike.
But Rick goes on to say:
"However, if the subsetting is to be performed using a LIBNAME engine against a database that is optimized for WHERE processing, then the WHERE choice is preferred."
This is usually the case for the data I work with. I would only use a subsetting IF when the data step has to perform intermediate calculations before entire rows can be accepted or rejected.
My other objection is aesthetic. The WHERE syntax conforms to SQL standards and is in my view more readable. An IF statement without an explicit THEN can be confusing. I would rather reverse the condition to make it more explicit:
IF (reject condition is met) THEN DELETE ;
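Applied to the problem in this thread, the two styles could look roughly like this. It is only a sketch reusing the have and Your_date names from the earlier posts; want_delete and want_sql are made-up output names, and Your_date is assumed to be a SAS date value.

```sas
/* Explicit reject condition: drop rows dated before the second-to-last day of the month */
data want_delete;
    set have;
    if Your_date < intnx('month', Your_date, 0, 'end') - 1 then delete;
run;

/* The same subset written as a WHERE clause in PROC SQL */
proc sql;
    create table want_sql as
    select *
    from have
    where Your_date > intnx('month', Your_date, 0, 'end') - 2;
quit;
```

Both steps should keep the same rows; the choice between them is the readability question discussed above.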
Not necessarily. Art and I ran some tests before, and in many cases 'if' is faster than 'where', and the benefit becomes more evident as the hit rate gets higher and the table gets larger. I believe this is due to data step enhancements for sequential processing. Random access such as 'where' may have an edge when the hit rate is less than 2-4%, and of course when there is an index.
Haikuo
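For anyone who wants to check this on their own system, a rough timing setup might look like the following. It is a sketch, not the tests mentioned above: the generated have table, the customer variable, and the want_if / want_where names are all illustrative assumptions.

```sas
options fullstimer;   /* write detailed timing and resource notes to the log */

/* Build an illustrative test table: one row per customer per calendar day */
data have;
    do Your_date = '01JAN2000'd to '31DEC2020'd;
        do customer = 'A', 'B', 'C', 'D', 'E';
            output;
        end;
    end;
    format Your_date yymmddn8.;
run;

/* Subsetting IF: every row is read into the data step, then tested */
data want_if;
    set have;
    if Your_date > intnx('month', Your_date, 0, 'end') - 2;
run;

/* WHERE: rows are filtered before they are brought into the data step */
data want_where;
    set have;
    where Your_date > intnx('month', Your_date, 0, 'end') - 2;
run;
```

Comparing the real time and CPU time notes for the two steps in the log gives a first impression. A table this small will probably show no meaningful difference, so the row count would need to be scaled up a lot before drawing conclusions.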