Hi
I am trying to create variables that are the sum of weighted observations from variables one, two, three, four, and five over a flexible rolling time frame.
Here is an example of what I am trying to do:
create variable named 2004 = ([(.1)*(one + two + three + four + five)]where year=2005) + ([(.1)*(one + two + three + four + five)]where year=2006) + .. + ([(.1)*(one + two + three + four + five)]where year=2010)
create variable named 2005 = ([(.1)*(one + two + three + four + five)]where year=2006) + ([(.1)*(one + two + three + four + five)]where year=2007) + .. + ([(.1)*(one + two + three + four + five)]where year=2011)
The idea is to create a variable that is the sum of weighted observations from the ten subsequent years. I understand how to explicitly write this out in the data step per variable created. I would like to understand how to more efficiently code the flexible year specification part. Below is a simple example of what the data looks like.
year | one | two | three | four | five |
2005 | 36,565,000 | 38,964,000 | 43,091,000 | 41,709,000 | 64,508,000 |
2006 | 36,761,000 | 39,090,000 | 42,869,000 | 42,619,000 | 66,213,000 |
2007 | 37,282,000 | 39,514,000 | 42,582,000 | 43,291,000 | 67,981,000 |
2008 | 37,342,000 | 39,707,000 | 41,966,000 | 43,723,000 | 69,878,000 |
I am assuming you will want one row data set with ypur named variables.
data x;
informat one two three four five comma12.;
input year one two three four five ;
datalines;
2005 36,565,000 38,964,000 43,091,000 41,709,000 64,508,000
2006 36,761,000 39,090,000 42,869,000 42,619,000 66,213,000
2007 37,282,000 39,514,000 42,582,000 43,291,000 67,981,000
2008 37,342,000 39,707,000 41,966,000 43,723,000 69,878,000
2009 37,422,000 39,717,000 41,866,000 43,733,000 69,888,000
2010 37,501,000 39,727,000 41,766,000 43,743,000 69,898,000
2011 37,468,000 39,737,000 41,666,000 43,753,000 69,978,000
;
run;
data want;
array y [2004:2010] y2004-y2010;
do until(last);
set x end=last;
upperlimit = MIN(2010,Year);
do yr = 2004 to UpperLimit;
y[yr]+((.1)*(one + two + three + four + five)*(year > yr and year <= (yr+6)));
end;
end;
put (y2004-y2010)(=);
keep y2004-y2010;
run;
y2004=138091600 y2005=138868100 y2006=116112900 y2007=93047900 y2008=69786300 y2009=46523700 y2010=23260200
Hi Alpay
Thanks for your code. I am trying to figure out what the following line of code is doing:
y[yr]+((.1)*(one + two + three + four + five)*(year > yr and year <= (yr+6)));
I understand the middle section but what do the pieces of code: y[yr] and ..*(year>yr and year <=(yr+6)) do?
y is an array of variables y2004 thru y2010. Its values initialized to missing at the beginning of the data step.
yr is a variable to denote the index value for array y.
y[yr]+ part will increment the value of the variable referenced (e.g., yr=2005 means y2005 variable).
*(year>yr and year <=(yr+6)) part is an inline indicator whether year is greater than yr and year is less than or equal to yr + 6.
If the criteria holds, it will be computed as 1 otherwise 0.
For your case, starting year and ending year values need to be changed to 1938 and 2002, respectively.
Also, the expression *(year>yr and year <=(yr+6)) needs to be replaced with *(year>yr and year <=(yr+10)).
I want know whether there will be multiple obs for the same year.
data x; informat one two three four five comma12.; input year one two three four five ; datalines; 2005 36,565,000 38,964,000 43,091,000 41,709,000 64,508,000 2006 36,761,000 39,090,000 42,869,000 42,619,000 66,213,000 2007 37,282,000 39,514,000 42,582,000 43,291,000 67,981,000 2008 37,342,000 39,707,000 41,966,000 43,723,000 69,878,000 2009 37,422,000 39,717,000 41,866,000 43,733,000 69,888,000 2010 37,501,000 39,727,000 41,766,000 43,743,000 69,898,000 2011 37,468,000 39,737,000 41,666,000 43,753,000 69,978,000 ; run; proc sql; create table temp as select a.year, 0.1*( select sum(sum(one,two,three,four,five)) from x where x.year between a.year+1 and a.year+6 ) as weight_sum from (select distinct year from x) as a; quit; proc transpose data=temp out=want prefix=year ; var weight_sum; id year; run;
Ksharp
I am trying to yield one observation per year. The variables I am trying to create represent a single year value where each variable, say y2004 has a corresponding data point that is equal to a weighted sum of 10 years of the variables one, two, three, four and five from the years 2005:2015. I pasted in an example data set for exposition, the data I am using goes from 1948 to current so the output would correspond to a single vector of data for the variables y1938: y2002.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.