Hello,
I have the following problem, for which I have been unable to find a solution through searching online.
Lets suppose we have the following table ("Purchases"):
Date                 Units_Sold             Brand       Year
18/03/2010                5                   A         2010
12/04/2010                2                   A         2010
22/05/2010                1                   A         2010
25/05/2010                7                   A         2010
11/08/2011                5                   A         2011
12/07/2010                2                   B         2010
22/10/2010                1                   B         2010
05/05/2011                7                   B         2011
And the same logic continues until the end of 2014, for different brands.
What I want to do is calculate the number of Units_Sold for every Brand, in each year. However, I don't want to do it for the calendar year, but for the full year after a Brand appears (365 days).
So an example of what I don't want:
proc sql;
create table Dont_Want as
select Year, Brand, sum(Units_Sold) as Unit_per_Year
from Purchases
group by Year, Brand;
quit;
The above logic is ok if we know that e.g. Brand "A" exists throughout the whole 2010. But if Brand "A" appeared on 18/03/2010 for the first time, and exists until now, then a comparison of Years 2010 and 2011 would not be good enough since for 2010 we are "missing" 3 months.
So what I want to do is calculate the sum of Units_Sold of a full year of a Brand since its first appearance on the data-set. For instance:
for A: the sum from 18/03/2010 until 17/03/2011, then from 18/03/2011 until 17/03/2012, etc.
for B: the sum from 12/07/2010 until 11/07/2011, etc.
and so on for all Brands.
Is there a smart way of doing this?
Thanks,
Fil
How about this?
data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
proc sql;
create table startdates as
select brand, min(date) as startd
from have
group by brand;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, startdates b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;
    
proc print data=want;
run;
Edit: Of course you can replace table STARTDATES by an inline view:
proc sql;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, (select brand, min(date) as startd from have group by brand) b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;
So what your saying is anything min(Date) to min(Date) + 365. What do you want to do for those dates which are after that window?
Here is an example that sets a flag 1 for within the group, 2 without. You can then sum based on that.
data have;
  informat date ddmmyy10.;
  input date units_sold brand $ year;
  format date date9.;
datalines;
18/03/2010                5                   A         2010
12/04/2010                2                   A         2010
22/05/2010                1                   A         2010
25/05/2010                7                   A         2010
11/08/2011                5                   A         2011
;
run;
proc sql;
  create table WANT as
  select  A.*,
          case when B.DATE <= A.DATE <= intnx('day',B.DATE,365) then 1
               else 2 end as GRP
  from    WORK.HAVE A
  left join (select distinct BRAND,min(DATE) as DATE from WORK.HAVE group by BRAND) B
  on      A.BRAND=B.BRAND;
quit;
					
				
			
			
				
			
			
			
			
			
			
			
		Many thanks RW9 for the quick and accurate reply.
Your code seems to do most of the trick. However, in the scenario where we have e.g. 20 years of data, then the GRP variable does not work as it is limited to values of 1 and 2. In such a scenario I would be able to correctly sum only for the 1st year after a brand appears (GRP=1), and for all the rest the sum would not account for 365 days, but for all the rest 19 years (GRP=2).
Do you see what I mean?
Sorry, was in a meeting. You can actually setup groups very easily for any number you want. Dates are numbers of days since a specific date. Hence if you subtract the min(DATE) off any date, and then floor the result divided by your number you should get groups, i.e.
data have;
informat date ddmmyy10.;
input date units_sold brand $ year;
format date date9.;
datalines;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
19/07/2012 5 A 2011
17/01/2013 5 A 2011
;
run;
proc sql;
create table WANT as
select A.*,
floor((A.DATE - B.DATE) / 365) as GRP
from WORK.HAVE A
left join (select distinct BRAND,min(DATE) as DATE from WORK.HAVE group by BRAND) B
on A.BRAND=B.BRAND
order by A.BRAND,A.DATE;
quit;
How about this?
data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
proc sql;
create table startdates as
select brand, min(date) as startd
from have
group by brand;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, startdates b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;
    
proc print data=want;
run;
Edit: Of course you can replace table STARTDATES by an inline view:
proc sql;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, (select brand, min(date) as startd from have group by brand) b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;
Thanks FreelanceReinhard this solves my question. I consider this question resolved. I still have a lot to learn in terms of base SAS as some of the functions in your code are new to me (e.g. intck).
As an add-up is there a way to twist your code so that instead of a year, I can sum per 30, 60, or 90 days (Quarterly)?
Yes, both the INTCK and the INTNX function are very flexible and allow for a variety of interval types other than 'year', including 'day', 'week', 'month' and 'qtr' (i.e. quarter) and arbitrary multiples thereof, e.g. 'day30' for 30 days. In addition, there are options to shift the start of the interval. Please see the online help for more details.
So, for 30-day periods you would replace 'year' by 'day30' (and modify the variable names referring to "Year" correpondingly) and would get an output like this (where "Month" means 30 days):
Month Units_ Brand Number Period per_Month A 1 18/03/2010 - 16/04/2010 7 A 3 17/05/2010 - 15/06/2010 8 A 18 10/08/2011 - 08/09/2011 5 B 1 12/07/2010 - 10/08/2010 2 B 4 10/10/2010 - 08/11/2010 1 B 10 08/04/2011 - 07/05/2011 7
data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
run;
data temp;
 set have;
 by Brand;
 retain temp;
 if first.Brand then do;
  temp=Date;
 end;
 if Date gt intnx('year',temp,1,'s')-1 then temp=intnx('year',temp,1,'s');
 format temp ddmmyy10.;
run;
proc summary data=temp;
by Brand temp;
var Units_Sold ;
output out=want sum=sum_count;
run;This could be more robust .
data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
run;
data temp;
 set have;
 by Brand;
 retain temp;
 if first.Brand then temp=Date;
 if Date gt intnx('year',temp,1,'s')-1 then temp=mdy(month(temp),day(temp),year(Date));
 format temp ddmmyy10.;
run;
proc summary data=temp;
by Brand temp;
var Units_Sold ;
output out=want sum=sum_count;
run;Apologies for the late reply Ksharp
Many thanks, this seems to also do the trick and it's nice to see different versions of the solution for the same problem.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
