csetzkorn
Lapis Lazuli | Level 10

I have a dataset that contains the following columns:

Date, GadgetId, SomeMeasurement


I would like to calculate the median of SomeMeasurement for every month, taking into account that month's data and all data from previous months. Example:

 

Date       GadgetId  SomeMeasurement
31-Jan-15  A1        5
26-Jan-15  A1        3
26-Jan-15  A1        3
26-Jan-15  A1        3
03-Feb-15  A1        5
07-Feb-15  A1        5
07-Feb-15  A1        5
07-Feb-15  A1        4
02-Feb-15  A1        5
02-Feb-15  A1        5
03-Feb-15  A1        5
02-Feb-15  A1        5
07-Feb-15  A1        4
03-Feb-15  A1        5

 

For Jan 2015, only that month's values are used to calculate the median. For Feb 2015, the values from Jan 2015 and Feb 2015 are used. For Dec 2017, the data for Dec 2017 and all previous months are used, and so on.

 

Please note that each dataset contains several GadgetIds, so a BY GadgetId would be required, I suppose. Also, each GadgetId has a different number of samples/dates (some may have only one year's worth of data, whereas others may have several years' worth).
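
For the sample rows above, the expected results can be checked by hand with the MEDIAN function. This is only a sketch to make the target numbers concrete: the values are typed in directly from the example rather than read from the dataset, and the variable names jan and jan_feb are made up for illustration.

```sas
data _null_;
  /* Jan 2015 only: the four January values */
  jan = median(5, 3, 3, 3);
  /* Jan + Feb 2015: the four January values followed by the ten February values */
  jan_feb = median(5, 3, 3, 3,  5, 5, 5, 4, 5, 5, 5, 5, 4, 5);
  put jan= jan_feb=;   /* writes jan=3 jan_feb=5 to the log */
run;
```

So for gadget A1 the result I am after is 3 for Jan 2015 and 5 for Feb 2015.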

14 REPLIES
Reeza
Super User

PROC EXPAND. 

 

csetzkorn
Lapis Lazuli | Level 10

Thanks. I think this is part of SAS/ETS, which we do not have )-:

RW9
Diamond | Level 26

How many "and so on"s are we talking about? You could, for instance, keep all the values in an array and then take the median on each row.

data want;
  set have;
  array vals{100} 8;
  retain vals:;
  retain num;
  num=ifn(_n_=1,1,num+1);
  vals{num}=somemeasurement;
  result=median(of vals{*});
run;

That assumes at most 100 observations.

 

I am not sure I quite see the logic here, though. Why do a rolling median? Would a monthly or yearly median not be appropriate?

csetzkorn
Lapis Lazuli | Level 10

Thanks. It could be 3-4 years' worth of data, so in month 12 of year 4 I have to use the data of all 4 years to get the median. Please also note that I have to use a BY for different gadgets. Each gadget can have 3-4 years of data, but the amount of data is dynamic, i.e. it depends on the gadget.

novinosrin
Tourmaline | Level 20

Can you please provide a more complete sample data with gadgets and the rest?

csetzkorn
Lapis Lazuli | Level 10
Done - sorry if it was not clear enough ...
Reeza
Super User

You only included one gadget, he asked for a few. 

The second solution I posted deals with BY groups - see the BY and IF FIRST statement that resets things.

 


@csetzkorn wrote:
Done - sorry if it was not clear enough ...

 

novinosrin
Tourmaline | Level 20

Helps when you provide complete and comprehensive samples and details 

 

data have;
  input Date : date9. gadgetid $ SomeMeasurement;
  format date date9.;
datalines;
31-Jan-15	A1	5
26-Jan-15	A1	3
26-Jan-15	A1	3
26-Jan-15	A1	3
03-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	4
02-Feb-15	A1	5
02-Feb-15	A1	5
03-Feb-15	A1	5
02-Feb-15	A1	5
07-Feb-15	A1	4
03-Feb-15	A1	5
31-Jan-15	B1	5
26-Jan-15	B1	3
26-Jan-15	B1	3
26-Jan-15	B1	3
03-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	4
02-Feb-15	B1	5
02-Feb-15	B1	5
03-Feb-15	B1	5
02-Feb-15	B1	5
07-Feb-15	B1	4
03-Feb-15	B1	5
;
run;
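data temp;
  set have;
  by gadgetid;
  /* restart the month counter for each gadget */
  if first.gadgetid then grp=0;
  formatted_date=date;
  /* start a new group whenever the month differs from the previous row;
     this relies on each month's rows being contiguous within a gadget */
  if month(date) ne lag(month(date)) then grp+1;
  format formatted_date monyy7.;
run;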
data want;
_k=_n_;
_c=0;
array t(20) _temporary_ ;/*array subscript arbitrary,should assign a big one to hold*/
call missing(median,of t(*));
do  until(last.gadgetid);
do  until(last.grp);
set temp;
by gadgetid grp;
_c+1;
t(_c)=SomeMeasurement;
if last.grp then do; median=median(median,of t(*));output;end;
end;
end;
drop _:;
run;
novinosrin
Tourmaline | Level 20

Slight correction to the data want step (the median call should be median(of t(*)), without the previous median as an argument):

 

data want;
  _k=_n_;
  _c=0;
  /* array size is arbitrary; it must be at least the number of rows per gadgetid */
  array t(20) _temporary_;
  /* clear the buffer at the start of each gadgetid */
  call missing(median,of t(*));
  do until(last.gadgetid);
    do until(last.grp);
      set temp;
      by gadgetid grp;
      _c+1;
      t(_c)=SomeMeasurement;
      /* at the end of each month, take the median of everything seen so far */
      if last.grp then do;
        median=median(of t(*));
        output;
      end;
    end;
  end;
  drop _: grp;
run;
csetzkorn
Lapis Lazuli | Level 10
Thanks. Does "should assign a big one" mean that I can assign one which is bigger than what is needed, just in case?
novinosrin
Tourmaline | Level 20

@csetzkorn Yes, a bigger subscript makes sure the values (elements) don't go out of range.

For example, if you believe there could be 10000 records per gadgetid, size the array to at least 10000.
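
If you would rather not guess the size, one option is to count the rows per gadgetid first and use that count as the array dimension. The following is a rough sketch, not tested against your real data: it assumes the TEMP dataset (with the grp variable) from my earlier post, and the macro variable name maxn is just something made up for this example.

```sas
/* largest number of rows found for any single gadgetid */
proc sql noprint;
  select max(n) into :maxn trimmed
  from (select count(*) as n from temp group by gadgetid) as counts;
quit;

data want;
  /* sized from the data instead of an arbitrary constant */
  array t(&maxn) _temporary_;
  /* reset the buffer and the counter for each gadgetid */
  call missing(of t(*));
  _c = 0;
  do until (last.gadgetid);
    set temp;
    by gadgetid grp;
    _c + 1;
    t(_c) = SomeMeasurement;
    /* one output row per gadgetid and month, using all values seen so far */
    if last.grp then do;
      median = median(of t(*));
      output;
    end;
  end;
  drop _: grp;
run;
```

Unused array elements stay missing, and the MEDIAN function ignores missing values, so oversizing the array does not change the result.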

Reeza
Super User

With the temporary array method, size your array to 31 to hold a full month of data. If you have repeated measurements within a month, are they considered the same? I noticed you had two observations for month=1 and one for month=3. If you have a variable number per month, you may want to standardize or aggregate that somehow first.

 

https://gist.github.com/statgeek/27e23c015eae7953eff2

 

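data want;
  set sashelp.stocks;
  by stock notsorted;

  /* 31-slot circular buffer holding the most recent observations */
  array p{0:30} _temporary_;

  /* clear the buffer at the start of each stock */
  if first.stock then call missing(of p{*});

  /* overwrite the oldest slot with the current value */
  p{mod(_n_,31)} = open;

  /* statistics over the values currently in the buffer */
  moving_median = median(of p{*});
  moving_max    = max(of p{*});
run;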
csetzkorn
Lapis Lazuli | Level 10

Yes, there could be several values per day, as indicated in the example.

Ksharp
Super User

If you have SAS 9.4 (where MEDIAN works as an aggregate function in PROC SQL):

 

data have;
  input Date : date9. gadgetid $ SomeMeasurement;
  new_date=intnx('month',date,0);
  format date new_date date9.;
datalines;
31-Jan-15	A1	5
26-Jan-15	A1	3
26-Jan-15	A1	3
26-Jan-15	A1	3
03-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	5
07-Feb-15	A1	4
02-Feb-15	A1	5
02-Feb-15	A1	5
03-Feb-15	A1	5
02-Feb-15	A1	5
07-Feb-15	A1	4
03-Feb-15	A1	5
31-Jan-15	B1	5
26-Jan-15	B1	3
26-Jan-15	B1	3
26-Jan-15	B1	3
03-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	5
07-Feb-15	B1	4
02-Feb-15	B1	5
02-Feb-15	B1	5
03-Feb-15	B1	5
02-Feb-15	B1	5
07-Feb-15	B1	4
03-Feb-15	B1	5
;
run;
proc sql;
create table want as
 select *,(select median(SomeMeasurement) from have 
where gadgetid=a.gadgetid and new_date<=a.new_date) as median
  from have as a;
quit;
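
The query above repeats the cumulative median on every input row. If one row per gadget and month is enough, a variation along these lines might work. It is only a sketch: it assumes the HAVE dataset with the new_date variable built above, and want_monthly is just an illustrative table name.

```sas
proc sql;
  create table want_monthly as
  select a.gadgetid,
         a.new_date,
         /* median of all values for this gadget up to and including this month */
         (select median(b.SomeMeasurement)
            from have as b
           where b.gadgetid = a.gadgetid
             and b.new_date <= a.new_date) as median
  from (select distinct gadgetid, new_date from have) as a
  order by a.gadgetid, a.new_date;
quit;
```

For the sample data this should give 3 for January and 5 for February for both gadgets.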
