BookmarkSubscribeRSS Feed
yeaforme
Calcite | Level 5

Curious how best to go about simple time weighting.  Essentially, I have quasi-daily data that I need to weight based upon when in the year it occurred relative to all other data points for that year and ID where the first observation (temporally) receives a weight of 1/n and the last observation receives a weight of n/n or 1.

 

So for example, if a given ID-year has the following data points (and only these data points):

 

DataDate Var1

 

Apr101980 50

Apr111980 55

Apr171980 55

Apr191980 40

Apr241980 30

 

Then the weights need to be as follows:

 

DataDate Var1 Weight

 

Apr101980 50 1/5

Apr111980 55 2/5

Apr171980 55 3/5

Apr191980 40 4/5

Apr241980 30 5/5

 

Such that the weighted sum for the given ID for the year 1980 would be = 50*1/5 + 55*2/5 + 55*3/5 + 40*4/5 + 30*5/5 = 127.

 

The actual data set is quite large, includes thousands of IDs, 40+ years, and probably 200+ daily observations (not always the same per year or even per ID in the same year).  Any help on where to start would be appreciated.

9 REPLIES 9
novinosrin
Tourmaline | Level 20
data have;
input DataDate :$10. Var1;
n+1;/*creating counter, you could also use n=_n_*/
cards;
Apr101980 50
Apr111980 55
Apr171980 55
Apr191980 40
Apr241980 30
;
proc sql;
create table want(drop=n) as
select *,var1*n/count(*) as weight
from have;
quit;
yeaforme
Calcite | Level 5

Wow, is it really that simple?  Thanks novinosrin.  I'll try it out and see if it works for my larger dataset.

novinosrin
Tourmaline | Level 20

Please show a sample output of what you want for your input so I could modify. @PaigeMiller has pointed out my mistake . 

 

Also, please provide me better representative sample with ID var etc

yeaforme
Calcite | Level 5
For an extended sample such as the below:
ID DataDate Var1

1 Apr101980 50
1 Apr111980 55
1 Apr171980 55
1 Apr191980 40
1 Apr241980 30

1 Feb81981 40
1 Feb101981 40
1 Feb181981 40
1 Feb271981 40

2 March181981 30
2 June271981 27
2 Sept81981 20

I would want the following output

ID DataDate Var1 Weight

1 Apr101980 50 1/5
1 Apr111980 55 2/5
1 Apr171980 55 3/5
1 Apr191980 40 4/5
1 Apr241980 30 5/5

1 Feb81981 40 1/4
1 Feb101981 40 2/4
1 Feb181981 40 3/4
1 Feb271981 40 4/4

2 March181981 30 1/3
2 June271981 27 2/3
2 Sept81981 20 3/3


Of course, I'm not picky on the weight variable. It could be two variables, one which contains the temporal order and a second which contains the total n. I can manipulate to get what I want afterward. Eventually, the weights will be applied to the variable to get a sum for the year.
novinosrin
Tourmaline | Level 20
data have;
input ID DataDate : $15. Var1;
cards;
1 Apr101980 50
1 Apr111980 55
1 Apr171980 55
1 Apr191980 40
1 Apr241980 30
1 Feb81981 40
1 Feb101981 40
1 Feb181981 40
1 Feb271981 40
2 March181981 30
2 June271981 27
2 Sept81981 20
;
data temp;
set have;
year=substr(DataDate,length(DataDate)-3);
/*in your real, date should be numeric, so you can do year=year(date)*/
run;


data want;
do n=1 by 1 until(last.year);
set temp;
by id year;
end;
do _n_=1 by 1 until (last.year);
set temp;
by id year;
want=Var1*_n_/n;
output;
end;
drop n;
run;
PaigeMiller
Diamond | Level 26

@novinosrin wrote:
data have;
input DataDate :$10. Var1;
n+1;/*creating counter, you could also use n=_n_*/
cards;
Apr101980 50
Apr111980 55
Apr171980 55
Apr191980 40
Apr241980 30
;
proc sql;
create table want(drop=n) as
select *,var1*n/count(*) as weight
from have;
quit;

This fails if the data set contains multiple years.

--
Paige Miller
PaigeMiller
Diamond | Level 26

assign a sequential number by year, then weighted sum, then divide by N

 

Assumes data is properly sorted and each record has the value for Year (not date as in the data you provided)

 

data have2;
    set have;
    by year;
    if first.year then seq=0;
    seq+1;
run;

proc summary nway data=have2;
    class year;
    var var1/weight=seq;
    output out=have3 sum=sum_var1;
run;

data want;
    set have3;
    final_answer=sum_var1/_freq_;
run;
--
Paige Miller
yeaforme
Calcite | Level 5
If I'm understanding the above code correctly, the proc summary provides the weights, whereas the last data function creates an average, correct?
PaigeMiller
Diamond | Level 26

PROC SUMMARY provides a weighted sum

 

The data step following divided by the N to turn it into an average

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1153 views
  • 0 likes
  • 3 in conversation