Re: cumulative stats such as median

csetzkorn · Posted 07-23-2018 11:08 AM

I have a dataset, which contains:

Date, GadgetId, SomeMeasurement

I would like to calculate the median of SomeMeasurement for every month, whilst also including the data from all previous months. Example:

Date	GadgetId	SomeMeasurement
31-Jan-15	A1	5
26-Jan-15	A1	3
26-Jan-15	A1	3
26-Jan-15	A1	3
03-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	4
02-Feb-15	A1	5
02-Feb-15	A1	5
03-Feb-15	A1	5
02-Feb-15	A1	5
07-Feb-15	A1	4
03-Feb-15	A1	5

In Jan 2015 one would consider only that month's values to calculate the median. In Feb 2015 one would consider the values from Jan 2015 and Feb 2015; in Dec 2017 one would consider the data for Dec 2017 and all previous months, etc.

Please note that each dataset contains several GadgetIds, so a BY GadgetId would be required, I suppose. Also, each GadgetId has a different number of samples/dates (some may have only one year's worth of data, whereas others may have several years' worth).

Reeza · Posted 07-23-2018 11:12 AM

PROC EXPAND.

csetzkorn · Posted 07-23-2018 11:19 AM

Thanks. I think this is SAS/ETS, which we do not have )-:

RW9 · Posted 07-23-2018 11:16 AM

How many "and so on"s are we talking? I mean, you could keep all the values in an array, for instance, and then take the median at each row.

data want;
  set have;
  array vals{100} 8;
  retain vals:;
  retain num;
  num=ifn(_n_=1,1,num+1);
  vals{num}=somemeasurement;
  result=median(of vals{*});
run;

That assumes at most 100 observations.
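
A minimal sketch of how the same array idea might be reset per gadget. It assumes HAVE is sorted by GadgetId and that no gadget has more than 100 rows; neither assumption comes from the post. It gives a running median at every row, not one value per month.

```sas
/* Sketch only: running median per gadget, reset at each new GadgetId. */
/* Assumes HAVE is sorted by GadgetId and no gadget exceeds 100 rows.  */
data want;
  set have;
  by gadgetid;
  array vals{100} _temporary_;      /* temporary arrays are retained automatically */
  if first.gadgetid then do;
    call missing(of vals{*});       /* forget the previous gadget's values */
    num = 0;
  end;
  num + 1;                          /* sum statement: num is retained */
  vals{num} = somemeasurement;
  result = median(of vals{*});      /* MEDIAN ignores the still-missing elements */
run;
```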

I am not sure I quite see the logic here though: why do a rolling median? Would a monthly or yearly one not be appropriate?

csetzkorn · Posted 07-23-2018 11:21 AM

Thanks. It could be 3-4 years' worth of data, so in month 12 of year 4 I have to use the data from all 4 years to get the median. Please also note that I have to use a BY for the different gadgets. Each gadget can have 3-4 years of data, but the amount is dynamic, i.e. it depends on the gadget.

novinosrin · Posted 07-23-2018 11:26 AM

Can you please provide a more complete sample data with gadgets and the rest?

csetzkorn · Posted 07-23-2018 11:40 AM

Done - sorry if it was not clear enough ...

Reeza · Posted 07-23-2018 11:43 AM

You only included one gadget, he asked for a few.

The second solution I posted deals with BY groups - see the BY and IF FIRST statement that resets things.

@csetzkorn wrote:
Done - sorry if it was not clear enough ...

novinosrin · Posted 07-23-2018 12:28 PM

Helps when you provide complete and comprehensive samples and details

data have;
  infile datalines expandtabs; /* the sample rows are tab-separated */
  input Date : date9. gadgetid $ SomeMeasurement;
  format date date9.;
datalines;
31-Jan-15	A1	5
26-Jan-15	A1	3
26-Jan-15	A1	3
26-Jan-15	A1	3
03-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	4
02-Feb-15	A1	5
02-Feb-15	A1	5
03-Feb-15	A1	5
02-Feb-15	A1	5
07-Feb-15	A1	4
03-Feb-15	A1	5
31-Jan-15	B1	5
26-Jan-15	B1	3
26-Jan-15	B1	3
26-Jan-15	B1	3
03-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	4
02-Feb-15	B1	5
02-Feb-15	B1	5
03-Feb-15	B1	5
02-Feb-15	B1	5
07-Feb-15	B1	4
03-Feb-15	B1	5
;
run;
data temp;
set have;
by gadgetid;
if first.gadgetid then grp=0;
formatted_date=date;
if  month(date) ne lag(month(date)) then grp+1;
format formatted_date monyy7.;
run;
data want;
_k=_n_;
_c=0;
array t(20) _temporary_ ;/*array subscript arbitrary,should assign a big one to hold*/
call missing(median,of t(*));
do  until(last.gadgetid);
do  until(last.grp);
set temp;
by gadgetid grp;
_c+1;
t(_c)=SomeMeasurement;
if last.grp then do; median=median(median,of t(*));output;end;
end;
end;
drop _:;
run;

novinosrin · Posted 07-23-2018 01:28 PM

slight correction to the data want step:

data want;
  _k=_n_;
  _c=0; /* counter restarts for each gadget */
  array t(20) _temporary_; /* array subscript arbitrary, should assign a big one to hold */
  call missing(median, of t(*)); /* clear the previous gadget's values */
  do until(last.gadgetid);
    do until(last.grp);
      set temp;
      by gadgetid grp;
      _c+1;
      t(_c)=SomeMeasurement;
      /* at the end of each month, take the median of everything seen so far */
      if last.grp then do;
        median=median(of t(*));
        output;
      end;
    end;
  end;
  drop _: grp;
run;

csetzkorn · Posted 07-24-2018 03:51 AM

Thanks. Does "should assign a big one" mean that I can assign one which is bigger than what is needed, just in case?

novinosrin · Posted 07-24-2018 07:43 AM

@csetzkorn Yes, the bigger subscript makes sure the values (elements) don't go out of range.

For example, if you believe there could be 10000 records per gadgetid
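
A sketch of how that sizing might look. The fixed bound uses the 10000 figure from above. The second option sizes the array from the data instead; it is an alternative not suggested in the thread, and the macro variable name maxn is made up for illustration.

```sas
/* Option 1: a generous fixed bound (10000 is the example figure above). */
/*   array t(10000) _temporary_;                                         */

/* Option 2 (assumption, not from the thread): size the array from the data. */
proc sql noprint;
  select max(n) into :maxn
  from (select count(*) as n from temp group by gadgetid);
quit;
%let maxn = &maxn; /* strip leading blanks from the macro variable */

data want;
  _c = 0;
  array t(&maxn) _temporary_;      /* exactly as large as the biggest gadget */
  call missing(median, of t(*));
  do until (last.gadgetid);
    do until (last.grp);
      set temp;
      by gadgetid grp;
      _c + 1;
      t(_c) = SomeMeasurement;
      if last.grp then do;
        median = median(of t(*));
        output;
      end;
    end;
  end;
  drop _: grp;
run;
```

Over-sizing costs only memory, since MEDIAN ignores the unused (missing) elements.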

Reeza · Posted 07-23-2018 11:16 AM

With the temporary array method, size your array at 31 to hold a full month of data. If you have repeated measurements for a month, are they considered the same? I noticed you had two observations for month=1 and one for month=3. If you have a variable number per month, you may want to standardize or aggregate that somehow first.

https://gist.github.com/statgeek/27e23c015eae7953eff2

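data want;
  set sashelp.stocks;
  by stock notsorted;

  /* circular buffer holding the most recent 31 values */
  array p{0:30} _temporary_;

  /* clear the buffer at the start of each BY group */
  if first.stock then call missing(of p{*});
  p{mod(_n_,31)} = open;
  moving_median = median(of p{*});
  highest = max(of p{*});
run;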

csetzkorn · Posted 07-23-2018 02:36 PM

yes there could be several values per day as indicated in the example.

Ksharp · Posted 07-24-2018 08:41 AM

If you have SAS 9.4:

data have;
  infile datalines expandtabs; /* the sample rows are tab-separated */
  input Date : date9. gadgetid $ SomeMeasurement;
  new_date=intnx('month',date,0); /* first day of the month */
  format date new_date date9.;
datalines;
31-Jan-15	A1	5
26-Jan-15	A1	3
26-Jan-15	A1	3
26-Jan-15	A1	3
03-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	4
02-Feb-15	A1	5
02-Feb-15	A1	5
03-Feb-15	A1	5
02-Feb-15	A1	5
07-Feb-15	A1	4
03-Feb-15	A1	5
31-Jan-15	B1	5
26-Jan-15	B1	3
26-Jan-15	B1	3
26-Jan-15	B1	3
03-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	4
02-Feb-15	B1	5
02-Feb-15	B1	5
03-Feb-15	B1	5
02-Feb-15	B1	5
07-Feb-15	B1	4
03-Feb-15	B1	5
;
run;
proc sql;
create table want as
 select *,(select median(SomeMeasurement) from have 
where gadgetid=a.gadgetid and new_date<=a.new_date) as median
  from have as a;
quit;
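
In case MEDIAN is not available as an aggregate in your PROC SQL, here is a rough sketch of one possible workaround: stack each gadget-month with all rows up to that month, then let PROC MEANS compute the median. The table names stacked and want_monthly are made up for illustration, and the result has one row per gadget and month, not one per observation.

```sas
/* Sketch only: pair every gadget-month with all rows up to and including it. */
proc sql;
  create table stacked as
  select a.gadgetid, a.new_date, b.SomeMeasurement
  from (select distinct gadgetid, new_date from have) as a
       inner join have as b
       on a.gadgetid = b.gadgetid and b.new_date <= a.new_date
  order by a.gadgetid, a.new_date;
quit;

/* The median of each stacked group is the cumulative median for that month. */
proc means data=stacked noprint;
  by gadgetid new_date;
  var SomeMeasurement;
  output out=want_monthly(drop=_type_ _freq_) median=median;
run;
```

The join duplicates earlier rows once per later month, so the intermediate table can get large with several years of data.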
