DATA Step, Macro, Functions and more

Aggregating for 365 Days, and not for Full Calendar Year

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Aggregating for 365 Days, and not for Full Calendar Year

[ Edited ]

Hello,

 

I have the following problem, for which I have been unable to find a solution through searching online.

 

Lets suppose we have the following table ("Purchases"):

 

Date                 Units_Sold             Brand       Year
18/03/2010                5                   A         2010
12/04/2010                2                   A         2010
22/05/2010                1                   A         2010
25/05/2010                7                   A         2010
11/08/2011                5                   A         2011
12/07/2010                2                   B         2010
22/10/2010                1                   B         2010
05/05/2011                7                   B         2011

 

And the same logic continues until the end of 2014, for different brands.

What I want to do is calculate the number of Units_Sold for every Brand, in each year. However, I don't want to do it for the calendar year, but for the full year after a Brand appears (365 days).

 

So an example of what I don't want:

 

proc sql;
create table Dont_Want as
select Year, Brand, sum(Units_Sold) as Unit_per_Year
from Purchases
group by Year, Brand;
quit;

 

The above logic is ok if we know that e.g. Brand "A" exists throughout the whole 2010. But if Brand "A" appeared on 18/03/2010 for the first time, and exists until now, then a comparison of Years 2010 and 2011 would not be good enough since for 2010 we are "missing" 3 months.

 

So what I want to do is calculate the sum of Units_Sold of a full year of a Brand since its first appearance on the data-set. For instance:

 

for A: the sum from 18/03/2010 until 17/03/2011, then from 18/03/2011 until 17/03/2012, etc.

for B: the sum from 12/07/2010 until 11/07/2011, etc.

and so on for all Brands. 

 

Is there a smart way of doing this?

 

Thanks,

Fil


Accepted Solutions
Solution
‎01-05-2016 09:59 AM
Trusted Advisor
Posts: 1,117

Re: Aggregating for 365 Days, and not for Full Calendar Year

[ Edited ]

How about this?

data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;

proc sql;
create table startdates as
select brand, min(date) as startd
from have
group by brand;

create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, startdates b
where a.brand=b.brand
group by a.brand, YearNumber, Period;

quit;
    
proc print data=want;
run;

 

Edit: Of course you can replace table STARTDATES by an inline view:

 

proc sql;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, (select brand, min(date) as startd from have group by brand) b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;

 

View solution in original post


All Replies
Super User
Super User
Posts: 7,942

Re: Aggregating for 365 Days, and not for Full Calendar Year

So what your saying is anything min(Date) to min(Date) + 365.  What do you want to do for those dates which are after that window?

Here is an example that sets a flag 1 for within the group, 2 without.  You can then sum based on that.  

data have;
  informat date ddmmyy10.;
  input date units_sold brand $ year;
  format date date9.;
datalines;
18/03/2010                5                   A         2010
12/04/2010                2                   A         2010
22/05/2010                1                   A         2010
25/05/2010                7                   A         2010
11/08/2011                5                   A         2011
;
run;

proc sql;
  create table WANT as
  select  A.*,
          case when B.DATE <= A.DATE <= intnx('day',B.DATE,365) then 1
               else 2 end as GRP
  from    WORK.HAVE A
  left join (select distinct BRAND,min(DATE) as DATE from WORK.HAVE group by BRAND) B
  on      A.BRAND=B.BRAND;
quit;
Occasional Contributor
Posts: 5

Re: Aggregating for 365 Days, and not for Full Calendar Year

Many thanks RW9 for the quick and accurate reply. 

 

Your code seems to do most of the trick. However, in the scenario where we have e.g. 20 years of data, then the GRP variable does not work as it is limited to values of 1 and 2. In such a scenario I would be able to correctly sum only for the 1st year after a brand appears (GRP=1), and for all the rest the sum would not account for 365 days, but for all the rest 19 years (GRP=2).

 

Do you see what I mean?

Super User
Super User
Posts: 7,942

Re: Aggregating for 365 Days, and not for Full Calendar Year

Sorry, was in a meeting.  You can actually setup groups very easily for any number you want.  Dates are numbers of days since a specific date.  Hence if you subtract the min(DATE) off any date, and then floor the result divided by your number you should get groups, i.e. 

data have;
informat date ddmmyy10.;
input date units_sold brand $ year;
format date date9.;
datalines;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
19/07/2012 5 A 2011
17/01/2013 5 A 2011
;
run;
proc sql;
create table WANT as
select A.*,
floor((A.DATE - B.DATE) / 365) as GRP
from WORK.HAVE A
left join (select distinct BRAND,min(DATE) as DATE from WORK.HAVE group by BRAND) B
on A.BRAND=B.BRAND
order by A.BRAND,A.DATE;
quit;

 

Solution
‎01-05-2016 09:59 AM
Trusted Advisor
Posts: 1,117

Re: Aggregating for 365 Days, and not for Full Calendar Year

[ Edited ]

How about this?

data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;

proc sql;
create table startdates as
select brand, min(date) as startd
from have
group by brand;

create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, startdates b
where a.brand=b.brand
group by a.brand, YearNumber, Period;

quit;
    
proc print data=want;
run;

 

Edit: Of course you can replace table STARTDATES by an inline view:

 

proc sql;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, (select brand, min(date) as startd from have group by brand) b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;

 

Occasional Contributor
Posts: 5

Re: Aggregating for 365 Days, and not for Full Calendar Year

[ Edited ]
Posted in reply to FreelanceReinhard

Thanks  this solves my question. I consider this question resolved. I still have a lot to learn in terms of base SAS as some of the functions in your code are new to me (e.g. intck).

 

As an add-up is there a way to twist your code so that instead of a year, I can sum per 30, 60, or 90 days (Quarterly)?

Trusted Advisor
Posts: 1,117

Re: Aggregating for 365 Days, and not for Full Calendar Year

Yes, both the INTCK and the INTNX function are very flexible and allow for a variety of interval types other than 'year', including 'day', 'week', 'month' and 'qtr' (i.e. quarter) and arbitrary multiples thereof, e.g. 'day30' for 30 days. In addition, there are options to shift the start of the interval. Please see the online help for more details.

 

So, for 30-day periods you would replace 'year' by 'day30' (and modify the variable names referring to "Year" correpondingly) and would get an output like this (where "Month" means 30 days):

          Month                                 Units_
Brand    Number            Period             per_Month

  A         1      18/03/2010 - 16/04/2010        7
  A         3      17/05/2010 - 15/06/2010        8
  A        18      10/08/2011 - 08/09/2011        5
  B         1      12/07/2010 - 10/08/2010        2
  B         4      10/10/2010 - 08/11/2010        1
  B        10      08/04/2011 - 07/05/2011        7
Occasional Contributor
Posts: 5

Re: Aggregating for 365 Days, and not for Full Calendar Year

Posted in reply to FreelanceReinhard
Amazing, many thanks!
Super User
Posts: 10,018

Re: Aggregating for 365 Days, and not for Full Calendar Year

data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
run;

data temp;
 set have;
 by Brand;
 retain temp;
 if first.Brand then do;
  temp=Date;
 end;
 if Date gt intnx('year',temp,1,'s')-1 then temp=intnx('year',temp,1,'s');
 format temp ddmmyy10.;
run;
proc summary data=temp;
by Brand temp;
var Units_Sold ;
output out=want sum=sum_count;
run;
Super User
Posts: 10,018

Re: Aggregating for 365 Days, and not for Full Calendar Year

This could be more robust .

 

data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
run;

data temp;
 set have;
 by Brand;
 retain temp;
 if first.Brand then temp=Date;
 if Date gt intnx('year',temp,1,'s')-1 then temp=mdy(month(temp),day(temp),year(Date));
 format temp ddmmyy10.;
run;
proc summary data=temp;
by Brand temp;
var Units_Sold ;
output out=want sum=sum_count;
run;
Occasional Contributor
Posts: 5

Re: Aggregating for 365 Days, and not for Full Calendar Year

Apologies for the late reply Ksharp

 

Many thanks, this seems to also do the trick and it's nice to see different versions of the solution for the same problem.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 335 views
  • 4 likes
  • 4 in conversation