Hello,
I have a subset of dates and values of sales. For each data and for each point of sale, I need to sum the sales of the previous six months. The column Sellin is what I expect to have.
Example for the PV 9097947:
(dates are in Italian format)
PV | data | sale $ | Sell-in |
9097947 | 18/01/2023 | 8.400 | \ |
9097947 | 08/02/2023 | 7.200 | \ |
9097947 | 24/02/2023 | 6.750 | \ |
9097947 | 10/03/2023 | 7.800 | \ |
9097947 | 29/03/2023 | 10.575 | \ |
9097947 | 19/04/2023 | 10.875 | \ |
9097947 | 08/05/2023 | 3.750 | \ |
9097947 | 22/05/2023 | 9.000 | \ |
9097947 | 08/06/2023 | 10.950 | \ |
9097947 | 28/06/2023 | 9.750 | 75300 |
9097947 | 20/07/2023 | 10950 | 76650 |
What program can fit better what I need? I think a do loop but I don't know how to do it.
Thanks
Use a date-indexed array:
%let start = %sysfunc(inputn(1900-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2099-12-31,yymmdd10.));
data want;
set have;
by pv;
array s {&start.:&end.} _temporary_;
if first.pv
then do i = &start. to &end.;
s{i} = 0;
end;
s{data} = s{data} + sales;
do i = intnx("month",data,-6,"s") to data;
sales_sum6 = sum(sales_sum6,s{i});
end;
drop i;
run;
Untested, posted from my tablet.
WOW, it works. I need two adjustments: the first one is that the sellin doesn't have to consider the sales of the day (so for example, for today it has to consider the sales from yesterday up to 6 months ago) - see the SALES_SUM6 of the last 3 rows of the table;
the other adjustment is that the variable SALES_SUM6 needS to be null if for a given date there isn't a history of at least 6 months (in our case all the dates up to 18AUG2023 need to have SALES_SUM6 null because 18AUG2023 - 180 days is 19FEB2024 and our data starts from 02MAR2023 and it is ok ) - see the first rows of the table.
See the table to understand the 2 adjustments:
PV | DATE | SELLIN | 6MONTHSAGO | SALES_SUM6 |
2011164 | 02-mar-23 | 2250 | 03SEP2022 | 0 |
2011164 | 22-mar-23 | 2325 | 23SEP2022 | 0 |
2011164 | 06-apr-23 | 2475 | 08OCT2022 | 0 |
2011164 | 21-apr-23 | 2850 | 23OCT2022 | 0 |
2011164 | 05MAY2023 | 2475 | 06-nov-22 | 0 |
2011164 | 18MAY2023 | 2400 | 19-nov-22 | 0 |
2011164 | 01JUN2023 | 2475 | 03DEC2022 | 0 |
2011164 | 15JUN2023 | 2175 | 17DEC2022 | 0 |
2011164 | 30JUN2023 | 2250 | 01JAN2023 | 0 |
2011164 | 14JUL2023 | 2475 | 15JAN2023 | 0 |
2011164 | 28JUL2023 | 2100 | 29JAN2023 | 0 |
2011164 | 18AUG2023 | 1125 | 19-feb-23 | 26250 |
2011164 | 04SEP2023 | 2025 | 08-mar-23 | 25125 |
2011164 | 20SEP2023 | 2550 | 24-mar-23 | 24825 |
I hope my explanation is clear and there is a solution ofr my requests. Thank you so much! I'm very grateful
data want;
set have;
by pv;
array s {&start.:&end.} _temporary_;
if first.pv
then do i = &start. to &end.;
s{i} = .;
end;
s{data} = sum(s{data},sales);
do i = intnx("month",data,-6,"s") to data - 1;
sales_sum6 = sum(sales_sum6,s{i});
end;
drop i;
run;
If all array elements are missing, the SUM function will always return a missing value.
And if you want to look back 180 days instead of 6 months, replace the INTNX function call with
data - 180
Keep in mind that SAS dates are counts of days.
Great! Just one last question. What type of change I need in the code below if I want to calculate for each pv and for each date the mean of the days? It seems to work but the results are wrong.
data freq;
set dataset;
by pv;
array s {&start.:&end.} _temporary_;
if first.pv
then do i = &start. to &end.;
s{i} = .;
end;
s{data} = mean(s{data},days);
do i = intnx("day",data,-180,"s") to data - 1;
freq = mean(days,s{i});
end;
format i date9.;
*drop i;
run;
Here the sample of data (here mean_days is not correct):
pv | date | days | mean_days | |
2011164 | 02-mar-23 | 03SEP2022 | . | . |
2011164 | 22-mar-23 | 23SEP2022 | 20 | 10 |
2011164 | 06-apr-23 | 08OCT2022 | 15 | 7.5 |
2011164 | 21-apr-23 | 23OCT2022 | 15 | 7.5 |
2011164 | 05MAY2023 | 06-nov-22 | 14 | 7 |
2011164 | 18MAY2023 | 19-nov-22 | 13 | 6.5 |
2011164 | 01JUN2023 | 03DEC2022 | 14 | 7 |
2011164 | 15JUN2023 | 17DEC2022 | 14 | 7 |
2011164 | 30JUN2023 | 01JAN2023 | 15 | 7.5 |
2011164 | 14JUL2023 | 15JAN2023 | 14 | 7 |
2011164 | 28JUL2023 | 29JAN2023 | 14 | 7 |
2011164 | 18AUG2023 | 19-feb-23 | 21 | 10.5 |
2011164 | 04SEP2023 | 08-mar-23 | 17 | 8.5 |
Thanks in advance.
Please define in plain language what you mean by "mean of days".
"Days" are the days between each date of sale and the previous one. I need to understand the average number of days that pass between one sale and another for the PV, still considering the last six months for each date of sale.
Here what I expect to have:
pv | date | date - 180 days | days | mean_days WRONG | mean days CORRECT |
2011164 | 02-mar-23 | 03-set-22 | 0 | . | 0 |
2011164 | 22-mar-23 | 23-set-22 | 20 | 10 | 0 |
2011164 | 06-apr-23 | 08-ott-22 | 15 | 7.5 | 10 |
2011164 | 21-apr-23 | 23-ott-22 | 15 | 7.5 | 11,66666667 |
2011164 | 05-mag-23 | 06-nov-22 | 14 | 7 | 12,5 |
2011164 | 18-mag-23 | 19-nov-22 | 13 | 6.5 | 12,8 |
2011164 | 01-giu-23 | 03-dic-22 | 14 | 7 | 12,83333333 |
2011164 | 15-giu-23 | 17-dic-22 | 14 | 7 | 13 |
2011164 | 30-giu-23 | 01-gen-23 | 15 | 7.5 | 13,125 |
2011164 | 14-lug-23 | 15-gen-23 | 14 | 7 | 13,33333333 |
2011164 | 28-lug-23 | 29-gen-23 | 14 | 7 | 13,4 |
2011164 | 18-ago-23 | 19-feb-23 | 21 | 10.5 | 13,45454545 |
2011164 | 04-set-23 | 08-mar-23 | 17 | 8.5 | 15,36363636 |
2011164 | 20-set-23 | 24-mar-23 | 16 | 8 | 15,09090909 |
I don't get your formula. In observation 3, you would have 2 intervals of 20 and 15 days, giving a mean of (20 + 15) / 2 = 17.5.
You need to create a second array for the intervals:
data want;
set have;
by pv;
array s {&start.:&end.} _temporary_;
array i {&start.:&end.} _temporary_;
_l = dif(data);
if first.pv
then do _i = &start. to &end.;
s{_i} = .;
i{_i} = .;
end;
else i{data} = _l;
s{data} = sum(s{data},sales);
_c = 0;
_i_sum = 0;
do _i = data - 180 to data - 1;
sales_sum6 = sum(sales_sum6,s{_i});
if i{_i} ne .
then do;
_c = _c + 1;
_i_sum = _i_sum + i{_i};
mean_days = _i_sum / _c;
end;
drop _i _l _c _i_sum;
run;
And if you want us to test code against your data, post it in a working DATA step with DATALINES which creates the dataset.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.