Hi,
I am trying to set a 3-year moving window starting at a given date by the following code,
data have; infile cards dsd dlm=","; input ID $ publn_date :YYMMDD10. earliest_publn_date :YYMMDD10. ; format publn_date YYMMDDD10. earliest_publn_date YYMMDDD10. ; cards; 1,1993-09-05,1990-09-04
2,1993-09-04,1990-09-04
3,1993-09-03,1990-09-04
4,1992-09-05,1990-09-04
5,1992-09-04,1990-09-04
6,1992-09-03,1990-09-04
7,1990-09-04,1990-09-04
8,1990-09-03,1990-09-04 ;;;; run; proc sql; create table want as select distinct * from have WHERE publn_date <= intnx('year', earliest_publn_date,4) /*a 3-year moving window starting at the earliest publication date */ AND publn_date >= earliest_publn_date ; quit;
it meanings, for each observation,
if 1) 'publn_date' >= the 'earliest_publn_date' and
2) 'publn_date' <= 'earliest_publn_date' + 3-year,
then it should be remain.
By using the earliest_publn_date '1990-09-04' as an example,
1) the earliest 'publn_date' should be as same as earliest_publn_date '1990-09-04',
2) the latest 'publn_date' should be '1993-09-04', it is the 3 years after '1990-09-04',
Namely, I expect to get table want
2,1993-09-04,1990-09-04 3,1993-09-03,1990-09-04 4,1992-09-05,1990-09-04 5,1992-09-04,1990-09-04 6,1992-09-03,1990-09-04 7,1990-09-04,1990-09-04
I expect to exclude obs 8 because publn_date 1990-09-03 is earlier than earliest_publn_date 1990-09-04.
I expect to exclude obs 1 because publn_date 1993-09-04 is the 3 years after '1990-09-04', and 1993-09-05 is later than it.
I failed by using my code, could you please give me some suggestions about this?
thanks in advance.
A query with GROUP BY and HAVING clauses will keep the earliest dates together and restrict by publication date.
proc sql; create table want as select * from have group by earliest_publn_date having publn_date between earliest_publn_date and intnx('year', min(earliest_publn_date), 3, 'S') order by id ; quit;
If the data has other ids with an earliest date whose 3-year window overlaps a different ids publication date (in other words an id can be in more than one earliest publication date 3-year window) you may end up needing a self join.
I have to say that I don't understand the description of the problem and how the desired output is generated. Can you go through this in a little more detail?
@Alexxxxxxx wrote:
I have rewritten my question, please have a look
I am still lost. I don't understand. Your output data set is the same as the input data set except that rows 1 and 8 are removed. It also seems as if the term "3-year moving window" means something different to you than it does to me, but I don't know what you mean.
A query with GROUP BY and HAVING clauses will keep the earliest dates together and restrict by publication date.
proc sql; create table want as select * from have group by earliest_publn_date having publn_date between earliest_publn_date and intnx('year', min(earliest_publn_date), 3, 'S') order by id ; quit;
If the data has other ids with an earliest date whose 3-year window overlaps a different ids publication date (in other words an id can be in more than one earliest publication date 3-year window) you may end up needing a self join.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.