Dear SAS Community:
I have a list of people making predictions about the value of a product and I would like to compute the average and median prediction at the end of each month for all the predictions issued for the last 90 days. Also if there are multiple predictions from 1 person, I just want to keep the last one.
data have;
infile datalines dsd truncover;
input ProductID:$8. Date:YYMMDDN8. PersonID:Z6. Prediction:32.;
format Date YYMMDDN8. PersonID Z6.;
datalines4;
87482X10,20140309,119962,16
87482X10,20140310,079092,16
87482X10,20140310,080474,16
87482X10,20140310,071182,16
87482X10,20140407,079092,16.5
87482X10,20140807,119962,17
87482X10,20140808,079092,17
87482X10,20141105,080474,17
87482X10,20141203,072481,16
87482X10,20150407,079092,17.5
87482X10,20150430,072481,17
87482X10,20150430,119962,18
87482X10,20150430,080474,18
87482X10,20150430,071182,18
87482X10,20150701,079092,20
87482X10,20150730,072481,18
87482X10,20150730,080474,20
87482X10,20150730,071182,19
87482X10,20150826,080474,19
87482X10,20150930,119962,17.5
87482X10,20151002,072481,17.5
87482X10,20151006,079092,19
87482X10,20151029,119962,18.5
87482X10,20151029,080474,20
87482X10,20151030,072481,18
87482X10,20151030,079092,19.5
87482X10,20151204,080474,21
87482X10,20151228,119962,19.5
87482X10,20151229,072481,19
87482X10,20151229,071182,21
87482X10,20160106,079092,21
87482X10,20160126,071182,16
87482X10,20160127,119962,20
87482X10,20160302,080474,21.5
87482X10,20160428,119962,22
87482X10,20160428,080474,22.5
87482X10,20160727,119962,23
87482X10,20160728,080474,24
;;;;
I know I can use procedure means but with this procedure but am not sure how to compute the forecast for the last 90 days at the end of the month.
proc sort data=have;
by productid date value estimid prediction personID;
run;
data have1; set have;
by productid date value estimid prediction personID;
if last.personID;
run;
proc means data=have2 noprint;
by productid date;
var /*prediction */ new_prediction;
output out= average (drop=_type_ _freq_)
mean=average n=numpredictions;
run;
Thanks!
If I understand you correctly, you want to calculate statistics on prediction data 90 days back for each "end of month" date that is present in your data? Am I correct?
Yes that is correct. I would like to have monthly predictions which would mean predictions at the end of the month.
I know I can use procedure means but with this procedure but am not sure how to compute the forecast for the last 90 days at the end of the month.
Use a WHERE statement and run PROC MEANS twice, once to get the means of each month and then once for the 90 days.
Or create your own custom format for 90 day periods and use the multi-label format option available in PROC MEANS. If it was a fixed quarter you could use the Quarter format for PROC MEANS.
Thanks could you please let me know the code with the where statement.
Here is the idea. For each product productid I would like to compute the average prediction for the different personID. So the average is not across time but across personIDs. Each person may have different forecasts over the last 90 days but I take the most recent forecast. Then using calendar month end dates I calculate the average over personIDs.
productid | YYMMDDN8 | personID | prediction |
1 | 1/1/18 | 01 | 10 |
1 | 1/14/18 | 02 | 9 |
1 | 1/15/18 | 01 | 11 |
1 | 1/20/18 | 02 | 8 |
1 | 1/22/18 | 03 | 5 |
Average | 8=average(11,8,5) |
@Agent1592 wrote:
Thanks could you please let me know the code with the where statement.
Here is the idea. For each product productid I would like to compute the average prediction for the different personID. So the average is not across time but across personIDs. Each person may have different forecasts over the last 90 days but I take the most recent forecast. Then using calendar month end dates I calculate the average over personIDs.
productid YYMMDDN8 personID prediction 1 1/1/18 01 10 1 1/14/18 02 9 1 1/15/18 01 11 1 1/20/18 02 8 1 1/22/18 03 5 Average 8=average(11,8,5)
How are you defining 90 days? From the last day in the data, from a specific time start? From the last three months?
I want the average prediction as of lets say 1/31/2018 so I would take the last forecast if it is not older than 90 days before 1/31/2018 (i.e. before the end of the month).
@Agent1592 Show one proper complete example of data and your computed value for one product defining how you computed the 90 days period. So far it's not clear.
productid | date | personID | prediction | End of Month | Average |
87482X10 | 20140309 | 119962 | 16 | ||
87482X10 | 20140310 | 79092 | 16 | ||
87482X10 | 20140310 | 80474 | 16 | ||
87482X10 | 20140310 | 71182 | 16 | 3/31/2014 | 16 |
87482X10 | 20140407 | 79092 | 16.5 | 4/30/2014 | 16.13 |
5/31/2014 | 16.13 | ||||
6/30/2014 | 16.13 |
The idea is to compute the 90 day average prediction before the end of the month for each product id. S0 if we want to calculate the average for 04/30/2014 we take all the most recent predictions for each analyst as from 01/30/2014 to 04/30/2014. The average is 16.13. You see personID=79092 gave 2 predictions within this time but we take his most recent prediction of 16.5.
Hi @Agent1592 I'm afraid if you could get your focus into fine attention to details in your requirement, this problem is not that challenging subject to you giving us "well detailed requirement with neat with good representative samples of HAVE and WANT like @MarkWik suggested.
Anyway, for me SAS coding is very therapeutic, and here is my solution. However, we may have to work interactively back and forth meaning if you can be responsive enough to help me understand your req to test effectively, I'm willing to help you with the changes that my current approach may require.
data have;
input productid : $15. date :yymmdd10. personID : $15. prediction;
format date yymmdd10.;
datalines;
87482X10 20140309 119962 16
87482X10 20140310 79092 16
87482X10 20140310 80474 16
87482X10 20140310 71182 16
87482X10 20140407 79092 16.5
;
/*Before the following step, you MUST filter with last.personid after sorting by ascending date to make sure you only take the last prediciton of personid */
data temp;
set have;
by productid;
k=lag(date);
if first.productid then _grp=1;
else if intck('month',k,date)>=1 then _grp+1;
drop k;
run;
data want;
if _n_=1 then do;
if 0 then set temp(rename=(date=_date prediction=_prediction));
dcl hash h(dataset: 'temp(rename=(date=_date prediction=_prediction))', ordered: 'a',multidata:'y');
h.definekey('productid');
h.definedata('_date','_prediction');
h.definedone();
end;
array _temp(1000);
call missing(of _temp(*));
do until(last.productid);
set temp;
by productid _grp;
if last._grp then
do;
_n=0;
do while(h.do_over(key:productid) eq 0);
if intnx('week',date,-90)<=_date<=date then
do;
_n+1;
_temp(_n)=_prediction;
end;
end;
_sum=sum(of _temp(*));
average=divide(_sum,_n);
end;
output;
end;
drop _:;
run;
@novinosrin Thanks for your reply. Basically I am trying to create a monthly moving average at the end of the month for all the predictions made in the last 90 days before the end of the month.
data want1; set want;
format date date9.;
MEND = intnx('month',date,1)-1;
format MEND date9.;
MEND_90=intnx('day',MEND,-90);
format MEND_90 date9.;
run;
So the moving average will include all predictions but if there are multiple predictions I just need to take the last one (last.personID)
0<date<=MEND_90).
Is it possible to do this easier with a proc expand in SAS?
I have never really used SAS ETS module and its procs. Maybe you should post the question SAS forecasting and econometric board rather than programming board. An ETS expert might just help you out
It is quite easy to program however rather hard to maintain unless the person who asks help here is also at the same level to comprehend the programs.
PROC EXPAND can't handle the multiple observations, you can first clean that up with TIMESERIES or a data step and then use PROC EXPAND.
This is 12 months and STD but you can modify it for 3 months or 90 days (they are different) and the Mean instead.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.