- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear altruists,
I am a new SAS user.
I have the following dataset. I have also attached the SAS file for your kind consideration.
Company_ID | Announcement |
1001 | 2020-10-14 |
1001 | 2020-11-15 |
1001 | 2020-12-18 |
1001 | 2021-01-01 |
1001 | 2021-02-04 |
1001 | 2021-03-05 |
1001 | 2021-04-06 |
1001 | 2021-05-07 |
1001 | 2021-06-08 |
1001 | 2021-07-11 |
1001 | 2021-08-12 |
1001 | 2021-09-13 |
1001 | 2021-10-14 |
1001 | 2021-11-15 |
1001 | 2021-12-18 |
1002 | 2020-10-14 |
1002 | 2020-11-15 |
1002 | 2020-12-18 |
1002 | 2021-03-05 |
1002 | 2021-04-06 |
1002 | 2021-05-07 |
1002 | 2021-06-08 |
1002 | 2021-07-11 |
1002 | 2021-08-12 |
1002 | 2021-11-15 |
1002 | 2021-12-18 |
I want to count the number of observations by group in the preceding 12 calendar months. Please note that I am interested in counting not the observations in the previous 12 rows, but in the preceding 12 calendar months.
I am looking for the following output. As you will notice, the last row for Company_ID 1002 generates a value of 9.
Company_ID | Announcement | Prior_12_Month_Ann |
1001 | 2020-10-14 | 0 |
1001 | 2020-11-15 | 1 |
1001 | 2020-12-18 | 2 |
1001 | 2021-01-01 | 3 |
1001 | 2021-02-04 | 4 |
1001 | 2021-03-05 | 5 |
1001 | 2021-04-06 | 6 |
1001 | 2021-05-07 | 7 |
1001 | 2021-06-08 | 8 |
1001 | 2021-07-11 | 9 |
1001 | 2021-08-12 | 10 |
1001 | 2021-09-13 | 11 |
1001 | 2021-10-14 | 12 |
1001 | 2021-11-15 | 12 |
1001 | 2021-12-18 | 12 |
1002 | 2020-10-14 | 0 |
1002 | 2020-11-15 | 1 |
1002 | 2020-12-18 | 2 |
1002 | 2021-03-05 | 3 |
1002 | 2021-04-06 | 4 |
1002 | 2021-05-07 | 5 |
1002 | 2021-06-08 | 6 |
1002 | 2021-07-11 | 7 |
1002 | 2021-08-12 | 8 |
1002 | 2021-11-15 | 9 |
1002 | 2021-12-18 | 9 |
Thank you in advance for your kind support!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Company_ID Announcement : yymmdd10.;
format Announcement yymmdd10.;
cards;
1001 2020-10-14
1001 2020-11-15
1001 2020-12-18
1001 2021-01-01
1001 2021-02-04
1001 2021-03-05
1001 2021-04-06
1001 2021-05-07
1001 2021-06-08
1001 2021-07-11
1001 2021-08-12
1001 2021-09-13
1001 2021-10-14
1001 2021-11-15
1001 2021-12-18
1002 2020-10-14
1002 2020-11-15
1002 2020-12-18
1002 2021-03-05
1002 2021-04-06
1002 2021-05-07
1002 2021-06-08
1002 2021-07-11
1002 2021-08-12
1002 2021-11-15
1002 2021-12-18
;
proc sql;
create table want as
select *,(select count(*) from have where Company_ID=a.Company_ID and
Announcement between intnx('month',a.Announcement,-13)
and intnx('month',a.Announcement,-1,'e')) as count
from have as a;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Will there always be one and only one observation per month/company_id, or do you need to condense or fill that first?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No, there might be instances where there are more than one observation per month/company_id.
I should have clarified that previously in my sample dataset.
Thanks a lot for asking!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I got the solution, thanks to Ksharp.
Thank you so much for looking in to my problem!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A different take, using a data step and a hash object:
data want;
set have;
by company_id;
if _n_ = 1
then do;
length month number 8;
declare hash sum ();
sum.definekey("month");
sum.definedata("number");
sum.definedone();
end;
if first.company_id then rc = sum.clear();
month = intnx('month',announcement,0,"b");
rc = sum.find();
number = sum(number,1);
if rc
then rc = sum.add();
else rc = sum.replace();
month = intnx('month',announcement,-12,"b");
do until (month ge intnx('month',announcement,0,"b"));
if sum.find() = 0
then Prior_12_Month_Ann = sum(Prior_12_Month_Ann,number);
month = intnx("month",month,1,"b");
end;
drop rc number month;
run;
Since the code counts per calendar month, not per "month counted backwards from the current date", it creates slightly different numbers.
OTOH, it will be faster than SQL with larger datasets, as it avoids the sub-select.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Company_ID Announcement : yymmdd10.;
format Announcement yymmdd10.;
cards;
1001 2020-10-14
1001 2020-11-15
1001 2020-12-18
1001 2021-01-01
1001 2021-02-04
1001 2021-03-05
1001 2021-04-06
1001 2021-05-07
1001 2021-06-08
1001 2021-07-11
1001 2021-08-12
1001 2021-09-13
1001 2021-10-14
1001 2021-11-15
1001 2021-12-18
1002 2020-10-14
1002 2020-11-15
1002 2020-12-18
1002 2021-03-05
1002 2021-04-06
1002 2021-05-07
1002 2021-06-08
1002 2021-07-11
1002 2021-08-12
1002 2021-11-15
1002 2021-12-18
;
proc sql;
create table want as
select *,(select count(*) from have where Company_ID=a.Company_ID and
Announcement between intnx('month',a.Announcement,-13)
and intnx('month',a.Announcement,-1,'e')) as count
from have as a;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for your solution!
I just needed to make a small change in the following line and it perfectly worked!
Announcement between intnx('month',a.Announcement,-12)
Thanks again for your kind support!