BookmarkSubscribeRSS Feed
yeaforme
Calcite | Level 5

Curious how best to go about simple time weighting.  Essentially, I have quasi-daily data that I need to weight based upon when in the year it occurred relative to all other data points for that year and ID where the first observation (temporally) receives a weight of 1/n and the last observation receives a weight of n/n or 1.

 

So for example, if a given ID-year has the following data points (and only these data points):

 

DataDate Var1

 

Apr101980 50

Apr111980 55

Apr171980 55

Apr191980 40

Apr241980 30

 

Then the weights need to be as follows:

 

DataDate Var1 Weight

 

Apr101980 50 1/5

Apr111980 55 2/5

Apr171980 55 3/5

Apr191980 40 4/5

Apr241980 30 5/5

 

Such that the weighted sum for the given ID for the year 1980 would be = 50*1/5 + 55*2/5 + 55*3/5 + 40*4/5 + 30*5/5 = 127.

 

The actual data set is quite large, includes thousands of IDs, 40+ years, and probably 200+ daily observations (not always the same per year or even per ID in the same year).  Any help on where to start would be appreciated.

9 REPLIES 9
novinosrin
Tourmaline | Level 20
data have;
input DataDate :$10. Var1;
n+1;/*creating counter, you could also use n=_n_*/
cards;
Apr101980 50
Apr111980 55
Apr171980 55
Apr191980 40
Apr241980 30
;
proc sql;
create table want(drop=n) as
select *,var1*n/count(*) as weight
from have;
quit;
yeaforme
Calcite | Level 5

Wow, is it really that simple?  Thanks novinosrin.  I'll try it out and see if it works for my larger dataset.

novinosrin
Tourmaline | Level 20

Please show a sample output of what you want for your input so I could modify. @PaigeMiller has pointed out my mistake . 

 

Also, please provide me better representative sample with ID var etc

yeaforme
Calcite | Level 5
For an extended sample such as the below:
ID DataDate Var1

1 Apr101980 50
1 Apr111980 55
1 Apr171980 55
1 Apr191980 40
1 Apr241980 30

1 Feb81981 40
1 Feb101981 40
1 Feb181981 40
1 Feb271981 40

2 March181981 30
2 June271981 27
2 Sept81981 20

I would want the following output

ID DataDate Var1 Weight

1 Apr101980 50 1/5
1 Apr111980 55 2/5
1 Apr171980 55 3/5
1 Apr191980 40 4/5
1 Apr241980 30 5/5

1 Feb81981 40 1/4
1 Feb101981 40 2/4
1 Feb181981 40 3/4
1 Feb271981 40 4/4

2 March181981 30 1/3
2 June271981 27 2/3
2 Sept81981 20 3/3


Of course, I'm not picky on the weight variable. It could be two variables, one which contains the temporal order and a second which contains the total n. I can manipulate to get what I want afterward. Eventually, the weights will be applied to the variable to get a sum for the year.
novinosrin
Tourmaline | Level 20
data have;
input ID DataDate : $15. Var1;
cards;
1 Apr101980 50
1 Apr111980 55
1 Apr171980 55
1 Apr191980 40
1 Apr241980 30
1 Feb81981 40
1 Feb101981 40
1 Feb181981 40
1 Feb271981 40
2 March181981 30
2 June271981 27
2 Sept81981 20
;
data temp;
set have;
year=substr(DataDate,length(DataDate)-3);
/*in your real, date should be numeric, so you can do year=year(date)*/
run;


data want;
do n=1 by 1 until(last.year);
set temp;
by id year;
end;
do _n_=1 by 1 until (last.year);
set temp;
by id year;
want=Var1*_n_/n;
output;
end;
drop n;
run;
PaigeMiller
Diamond | Level 26

@novinosrin wrote:
data have;
input DataDate :$10. Var1;
n+1;/*creating counter, you could also use n=_n_*/
cards;
Apr101980 50
Apr111980 55
Apr171980 55
Apr191980 40
Apr241980 30
;
proc sql;
create table want(drop=n) as
select *,var1*n/count(*) as weight
from have;
quit;

This fails if the data set contains multiple years.

--
Paige Miller
PaigeMiller
Diamond | Level 26

assign a sequential number by year, then weighted sum, then divide by N

 

Assumes data is properly sorted and each record has the value for Year (not date as in the data you provided)

 

data have2;
    set have;
    by year;
    if first.year then seq=0;
    seq+1;
run;

proc summary nway data=have2;
    class year;
    var var1/weight=seq;
    output out=have3 sum=sum_var1;
run;

data want;
    set have3;
    final_answer=sum_var1/_freq_;
run;
--
Paige Miller
yeaforme
Calcite | Level 5
If I'm understanding the above code correctly, the proc summary provides the weights, whereas the last data function creates an average, correct?
PaigeMiller
Diamond | Level 26

PROC SUMMARY provides a weighted sum

 

The data step following divided by the N to turn it into an average

--
Paige Miller
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2604 views
  • 0 likes
  • 3 in conversation