BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Filippos
Fluorite | Level 6

Hello,

 

I have the following problem, for which I have been unable to find a solution through searching online.

 

Lets suppose we have the following table ("Purchases"):

 

Date                 Units_Sold             Brand       Year
18/03/2010                5                   A         2010
12/04/2010                2                   A         2010
22/05/2010                1                   A         2010
25/05/2010                7                   A         2010
11/08/2011                5                   A         2011
12/07/2010                2                   B         2010
22/10/2010                1                   B         2010
05/05/2011                7                   B         2011

 

And the same logic continues until the end of 2014, for different brands.

What I want to do is calculate the number of Units_Sold for every Brand, in each year. However, I don't want to do it for the calendar year, but for the full year after a Brand appears (365 days).

 

So an example of what I don't want:

 

proc sql;
create table Dont_Want as
select Year, Brand, sum(Units_Sold) as Unit_per_Year
from Purchases
group by Year, Brand;
quit;

 

The above logic is ok if we know that e.g. Brand "A" exists throughout the whole 2010. But if Brand "A" appeared on 18/03/2010 for the first time, and exists until now, then a comparison of Years 2010 and 2011 would not be good enough since for 2010 we are "missing" 3 months.

 

So what I want to do is calculate the sum of Units_Sold of a full year of a Brand since its first appearance on the data-set. For instance:

 

for A: the sum from 18/03/2010 until 17/03/2011, then from 18/03/2011 until 17/03/2012, etc.

for B: the sum from 12/07/2010 until 11/07/2011, etc.

and so on for all Brands. 

 

Is there a smart way of doing this?

 

Thanks,

Fil

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

How about this?

data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;

proc sql;
create table startdates as
select brand, min(date) as startd
from have
group by brand;

create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, startdates b
where a.brand=b.brand
group by a.brand, YearNumber, Period;

quit;
    
proc print data=want;
run;

 

Edit: Of course you can replace table STARTDATES by an inline view:

 

proc sql;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, (select brand, min(date) as startd from have group by brand) b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;

 

View solution in original post

10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

So what your saying is anything min(Date) to min(Date) + 365.  What do you want to do for those dates which are after that window?

Here is an example that sets a flag 1 for within the group, 2 without.  You can then sum based on that.  

data have;
  informat date ddmmyy10.;
  input date units_sold brand $ year;
  format date date9.;
datalines;
18/03/2010                5                   A         2010
12/04/2010                2                   A         2010
22/05/2010                1                   A         2010
25/05/2010                7                   A         2010
11/08/2011                5                   A         2011
;
run;

proc sql;
  create table WANT as
  select  A.*,
          case when B.DATE <= A.DATE <= intnx('day',B.DATE,365) then 1
               else 2 end as GRP
  from    WORK.HAVE A
  left join (select distinct BRAND,min(DATE) as DATE from WORK.HAVE group by BRAND) B
  on      A.BRAND=B.BRAND;
quit;
Filippos
Fluorite | Level 6

Many thanks RW9 for the quick and accurate reply. 

 

Your code seems to do most of the trick. However, in the scenario where we have e.g. 20 years of data, then the GRP variable does not work as it is limited to values of 1 and 2. In such a scenario I would be able to correctly sum only for the 1st year after a brand appears (GRP=1), and for all the rest the sum would not account for 365 days, but for all the rest 19 years (GRP=2).

 

Do you see what I mean?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, was in a meeting.  You can actually setup groups very easily for any number you want.  Dates are numbers of days since a specific date.  Hence if you subtract the min(DATE) off any date, and then floor the result divided by your number you should get groups, i.e. 

data have;
informat date ddmmyy10.;
input date units_sold brand $ year;
format date date9.;
datalines;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
19/07/2012 5 A 2011
17/01/2013 5 A 2011
;
run;
proc sql;
create table WANT as
select A.*,
floor((A.DATE - B.DATE) / 365) as GRP
from WORK.HAVE A
left join (select distinct BRAND,min(DATE) as DATE from WORK.HAVE group by BRAND) B
on A.BRAND=B.BRAND
order by A.BRAND,A.DATE;
quit;

 

FreelanceReinh
Jade | Level 19

How about this?

data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;

proc sql;
create table startdates as
select brand, min(date) as startd
from have
group by brand;

create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, startdates b
where a.brand=b.brand
group by a.brand, YearNumber, Period;

quit;
    
proc print data=want;
run;

 

Edit: Of course you can replace table STARTDATES by an inline view:

 

proc sql;
create table want as
select a.brand, intck('year', b.startd, a.date, 'c')+1 as YearNumber,
       catx(' ', put(intnx('year', b.startd, calculated YearNumber-1, 's'), ddmmyy10.), '-',
                 put(intnx('year', b.startd, calculated YearNumber, 's')-1, ddmmyy10.)) as Period length=23,
       sum(units_sold) as Units_per_Year
from have a, (select brand, min(date) as startd from have group by brand) b
where a.brand=b.brand
group by a.brand, YearNumber, Period;
quit;

 

Filippos
Fluorite | Level 6

Thanks  this solves my question. I consider this question resolved. I still have a lot to learn in terms of base SAS as some of the functions in your code are new to me (e.g. intck).

 

As an add-up is there a way to twist your code so that instead of a year, I can sum per 30, 60, or 90 days (Quarterly)?

FreelanceReinh
Jade | Level 19

Yes, both the INTCK and the INTNX function are very flexible and allow for a variety of interval types other than 'year', including 'day', 'week', 'month' and 'qtr' (i.e. quarter) and arbitrary multiples thereof, e.g. 'day30' for 30 days. In addition, there are options to shift the start of the interval. Please see the online help for more details.

 

So, for 30-day periods you would replace 'year' by 'day30' (and modify the variable names referring to "Year" correpondingly) and would get an output like this (where "Month" means 30 days):

          Month                                 Units_
Brand    Number            Period             per_Month

  A         1      18/03/2010 - 16/04/2010        7
  A         3      17/05/2010 - 15/06/2010        8
  A        18      10/08/2011 - 08/09/2011        5
  B         1      12/07/2010 - 10/08/2010        2
  B         4      10/10/2010 - 08/11/2010        1
  B        10      08/04/2011 - 07/05/2011        7
Filippos
Fluorite | Level 6
Amazing, many thanks!
Ksharp
Super User
data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
run;

data temp;
 set have;
 by Brand;
 retain temp;
 if first.Brand then do;
  temp=Date;
 end;
 if Date gt intnx('year',temp,1,'s')-1 then temp=intnx('year',temp,1,'s');
 format temp ddmmyy10.;
run;
proc summary data=temp;
by Brand temp;
var Units_Sold ;
output out=want sum=sum_count;
run;
Ksharp
Super User

This could be more robust .

 

data have;
input Date :ddmmyy10. Units_Sold Brand $ Year;
format Date ddmmyy10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
run;

data temp;
 set have;
 by Brand;
 retain temp;
 if first.Brand then temp=Date;
 if Date gt intnx('year',temp,1,'s')-1 then temp=mdy(month(temp),day(temp),year(Date));
 format temp ddmmyy10.;
run;
proc summary data=temp;
by Brand temp;
var Units_Sold ;
output out=want sum=sum_count;
run;
Filippos
Fluorite | Level 6

Apologies for the late reply Ksharp

 

Many thanks, this seems to also do the trick and it's nice to see different versions of the solution for the same problem.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1824 views
  • 4 likes
  • 4 in conversation