BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Filipvdr
Pyrite | Level 9

Hello,

My dataset looks likes this:

DATE           MONTH      VALUE

01032011      MAR2011    8.50

02032011      MAR2011    8.50

02032011      MAR2011    5.50

02032011      MAR2011    8.50

14032011      MAR2011    7.50

24032011      MAR2011    9.00

24042011      APR2011     8.50

24042011      APR2011     8.50

26042011      APR2011     2.50

24042011      APR2011     3.50

16052011      MAY2011     2.50

24052011      MAY2011     13.50

25052011      MAY2011     4.50

26052011      MAY2011     3.50

27052011      MAY2011     2.50

25062011      JUN2011     10.50

26062011      JUN2011     13.50

27062011      JUN2011     12.50

28062011      JUN2011     13.50

According to 2 macro variables i have to calculate floating averages/median/percentile.

%let calc = average;

%let bin = 3;

The number of bin is in months. So for this example if i want to calculate the number for JUN2011 the number has to be calculated on the values of APR , MAY and JUN. If the variable bin is 2 i have to calculate JUN on JUN and MAY , MAY on APR and MAY and so on..

Could anyone set me on my way? I have no idea how to start on this one.

Regards,

Filip

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Actually, I have another way. If you don't mind.

data x(drop=month);
input DATE  : ddmmyy12.     MONTH  : monyy5.    VALUE ;
format date date9. month monyy5.;
cards;
01032011      MAR2011    8.50
02032011      MAR2011    8.50
02032011      MAR2011    5.50
02032011      MAR2011    8.50
14032011      MAR2011    7.50
24032011      MAR2011    9.00
24042011      APR2011     8.50
24042011      APR2011     8.50
26042011      APR2011     2.50
24042011      APR2011     3.50
16052011      MAY2011     2.50
24052011      MAY2011     13.50
25052011      MAY2011     4.50
26052011      MAY2011     3.50
27052011      MAY2011     2.50
25062011      JUN2011     10.50
26062011      JUN2011     13.50
27062011      JUN2011     12.50
28062011      JUN2011     13.50
;
run;
data x;
 set x;
 monyy=mdy(month(date),1,year(date));
 format monyy monyy5.;
run;




%macro cal(monyy=);
data xx;
 set x(where=(monyy between intnx('month',&monyy,-&bin+1,'b') and &monyy));
 month=put(&monyy,monyy5.);
run;
proc summary data=xx nway;
 class month;
 var value;
 output out=result(drop=_:)  mean= median= p75= /autoname;
run;
%mend cal;


%let bin = 3;

proc sort data=x(keep=monyy) out=temp nodupkey;by monyy;run;
data _null_;
 set temp;
 call execute('%cal(monyy='||strip(monyy)||')');
 if _n_ eq 1 then call execute('data final;set result;stop;run;');
 call execute('proc append base=final data=result force;run;');
run;

 


Ksharp

View solution in original post

18 REPLIES 18
Ksharp
Super User

What output do you want?

DeepakNMehta
Calcite | Level 5

Do you want average for all the previous months (as value in bin macro variable) for every row in a datamart/datalines?

Filipvdr
Pyrite | Level 9

Output would be month + average (could be median or percentile as well)

Also it could be a dataset with more then this 4 months but 4 years for example.

so in this case (numers are not correct).

Month        Value

MAR2011   5                will have an average only of March because no data available for JAN and FEB

APR2011   10                will have an average of March and Apr together

MAY2011   11                will have an average of March and Apr and May together

JUN2011    12                will have an average of Apr and May and Juntogether

Ksharp
Super User

OK. Try this.

data x;
input DATE  : ddmmyy12.     MONTH  : monyy5.    VALUE ;
format date date9. month monyy5.;
cards;
01032011      MAR2011    8.50
02032011      MAR2011    8.50
02032011      MAR2011    5.50
02032011      MAR2011    8.50
14032011      MAR2011    7.50
24032011      MAR2011    9.00
24042011      APR2011     8.50
24042011      APR2011     8.50
26042011      APR2011     2.50
24042011      APR2011     3.50
16052011      MAY2011     2.50
24052011      MAY2011     13.50
25052011      MAY2011     4.50
26052011      MAY2011     3.50
27052011      MAY2011     2.50
25062011      JUN2011     10.50
26062011      JUN2011     13.50
27062011      JUN2011     12.50
28062011      JUN2011     13.50
;
run;
%let bin=3;
proc sql ;
create table want(drop=n) as
 select month,(select avg(value)  from x where month between intnx('month',a.month,-&bin,'s') and a.month) as mean,count(*) as n
  from x as a
   group by month;
quit;

Ksharp

Filipvdr
Pyrite | Level 9

Ksharp, thanks a lot. This will help me a lot already.

A question that pops into my mind: how to do percentile 70 for example in sql? Median is just an other function but does their exist a percentile function as well? 

Filipvdr
Pyrite | Level 9

Unfortunate.. something is going wrong when I test it on my data:

This is the table want (with N kept). I get for APR2011 106 rows=n? With a mean which is always different.. any idea?

APR2011 12.382727273 106

APR2011 12.300052083 106

APR2011 12.17015873 106

APR2011 12.363972603 106

APR2011 12.268622754 106

APR2011 11.871 106

APR2011 12.1575 106

APR2011 12.1575 106

APR2011 12.241363636 106

APR2011 11.871 106

APR2011 12.426490385 106

APR2011 11.968776978 106

APR2011 12.219783784 106

APR2011 12.27355 106

APR2011 12.23695122 106

APR2011 11.886891892 106

EDIT: I think the problem is that, in comparison with your datalines the real numberic value of month is different here..

I tried to replace it with month_char but it does not work

Ksharp
Super User

It is obvious that your real data is different from me.

Are your data(APR2011) the real date value ?

I use monyy. input APR2011 that means 01APR2011, so in my example all the APR2011 has the same value 01APR2011.

and I guess you should need to create a new variable like mine.

month=mdy(month(date),1,year(date));

Ksharp

Ksharp
Super User

Opps. for your situation.

You should change %let bin=3 into %let bin=2 , because you only need two month back.

Ksharp
Super User

Sorry. You can try this.

data x;
input DATE  : ddmmyy12.     MONTH  : monyy5.    VALUE ;
format date date9. month monyy5.;
cards;
01032011      MAR2011    8.50
02032011      MAR2011    8.50
02032011      MAR2011    5.50
02032011      MAR2011    8.50
14032011      MAR2011    7.50
24032011      MAR2011    9.00
24042011      APR2011     8.50
24042011      APR2011     8.50
26042011      APR2011     2.50
24042011      APR2011     3.50
16052011      MAY2011     2.50
24052011      MAY2011     13.50
25052011      MAY2011     4.50
26052011      MAY2011     3.50
27052011      MAY2011     2.50
25062011      JUN2011     10.50
26062011      JUN2011     13.50
27062011      JUN2011     12.50
28062011      JUN2011     13.50
;
run;
%let bin=2;
proc sql ;
create table want as
 select distinct month,(select mean(value) as mean  from x where month between intnx('month',a.month,-&bin,'b') and a.month) as mean
  from x as a ;
quit;

Ksharp

Keith
Obsidian | Level 7

I had the idea of creating multilabel formats from the dates in the data, this will enable you to use certain SAS procedures (SUMMARY, MEANS, TABULATE, REPORT) to calculate the measures of interest.  This means you have access to the percentiles 1,5,10,25,50,75,90,95,99.  You would normally use PROC UNIVARIATE to calculate other percentiles, but unfortunately this procedure doesn't support multilabel formats.

Here is my code, which uses the intitial dataset created by KSharp (although I don't need the MONTH column).

%let bin=3;

proc sql noprint;

select min(date), intck('month',min(date),max(date)) into :min_dt, :num_mths

from x;

quit;

 

data mth_fmt;

format start end monyy5.;

retain fmtname 'roll_mth' type 'N' hlo 'M';

do i=0 to &num_mths.;

start=intnx('month',&min_dt.,i-&bin.+1);

end=intnx('month',start,&bin.-1,'e');

label=vvalue(end);

output;

end;

drop i;

run;

 

proc format cntlin=mth_fmt;

run;

 

proc summary data=x nway;

class date / mlf order=data;

format date roll_mth.;

var value;

output out=want (drop=_:) mean= median= p75= n= / autoname;

run;

Ksharp
Super User

Keith,

Congratulations! That is a wonderful solution. Educated.

Ksharp

Filipvdr
Pyrite | Level 9

Indeed Keith, your solution works perfectly. Thanks both for your help.

Ksharp
Super User

Actually, I have another way. If you don't mind.

data x(drop=month);
input DATE  : ddmmyy12.     MONTH  : monyy5.    VALUE ;
format date date9. month monyy5.;
cards;
01032011      MAR2011    8.50
02032011      MAR2011    8.50
02032011      MAR2011    5.50
02032011      MAR2011    8.50
14032011      MAR2011    7.50
24032011      MAR2011    9.00
24042011      APR2011     8.50
24042011      APR2011     8.50
26042011      APR2011     2.50
24042011      APR2011     3.50
16052011      MAY2011     2.50
24052011      MAY2011     13.50
25052011      MAY2011     4.50
26052011      MAY2011     3.50
27052011      MAY2011     2.50
25062011      JUN2011     10.50
26062011      JUN2011     13.50
27062011      JUN2011     12.50
28062011      JUN2011     13.50
;
run;
data x;
 set x;
 monyy=mdy(month(date),1,year(date));
 format monyy monyy5.;
run;




%macro cal(monyy=);
data xx;
 set x(where=(monyy between intnx('month',&monyy,-&bin+1,'b') and &monyy));
 month=put(&monyy,monyy5.);
run;
proc summary data=xx nway;
 class month;
 var value;
 output out=result(drop=_:)  mean= median= p75= /autoname;
run;
%mend cal;


%let bin = 3;

proc sort data=x(keep=monyy) out=temp nodupkey;by monyy;run;
data _null_;
 set temp;
 call execute('%cal(monyy='||strip(monyy)||')');
 if _n_ eq 1 then call execute('data final;set result;stop;run;');
 call execute('proc append base=final data=result force;run;');
run;

 


Ksharp

Filipvdr
Pyrite | Level 9

thanks for both answers, it helped me a lot in my coding

though i experience an other difficulty at the moment:

%let default_size_definition=70;


proc summary data=finished_requests nway;
class date / mlf order=data;
format date roll_mth_calc.;
var quota;
output out=TQS (drop=_:) p&default_size_definition.=TQS;
run;

this will not work as proc summary uses only 75 and not 70

so i found i can use proc univariate but the format does not work here? any help?

proc univariate data = finished_requests noprint;
class date / mlf order=data;
format date roll_mth_calc.;
var quota;
output out = caprate
pctlpts = 10 to 100 by 10
pctlpre = pct_;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 2028 views
  • 8 likes
  • 4 in conversation