BookmarkSubscribeRSS Feed
jdub
Calcite | Level 5

Hi

I am trying to create variables that are the sum of weighted observations from variables one, two, three, four, and five over a flexible rolling time frame.

Here is an example of what I am trying to do:

create variable named 2004 = ([(.1)*(one + two + three + four + five)]where year=2005) + ([(.1)*(one + two + three + four + five)]where year=2006) + .. + ([(.1)*(one + two + three + four + five)]where year=2010)

create variable named 2005 = ([(.1)*(one + two + three + four + five)]where year=2006) + ([(.1)*(one + two + three + four + five)]where year=2007) + .. + ([(.1)*(one + two + three + four + five)]where year=2011)

The idea is to create a variable that is the sum of weighted observations from the ten subsequent years.  I understand how to explicitly write this out in the data step per variable created.  I would like to understand how to more efficiently code the flexible year specification part.  Below is a simple example of what the data looks like.

yearonetwothreefourfive
200536,565,00038,964,00043,091,00041,709,00064,508,000
200636,761,00039,090,00042,869,00042,619,00066,213,000
200737,282,00039,514,00042,582,00043,291,00067,981,000
200837,342,00039,707,00041,966,00043,723,00069,878,000
5 REPLIES 5
Alpay
Fluorite | Level 6

I am assuming you will want one row data set with ypur named variables.

data x;

informat one    two    three    four    five comma12.;

input year    one    two    three    four    five ;

datalines;

2005 36,565,000 38,964,000 43,091,000 41,709,000 64,508,000

2006 36,761,000 39,090,000 42,869,000 42,619,000 66,213,000

2007 37,282,000 39,514,000 42,582,000 43,291,000 67,981,000

2008 37,342,000 39,707,000 41,966,000 43,723,000 69,878,000

2009 37,422,000 39,717,000 41,866,000 43,733,000 69,888,000

2010 37,501,000 39,727,000 41,766,000 43,743,000 69,898,000

2011 37,468,000 39,737,000 41,666,000 43,753,000 69,978,000

;

run;

data want;

array y [2004:2010] y2004-y2010;

do until(last);

    set x end=last;

    upperlimit = MIN(2010,Year);

    do yr = 2004 to UpperLimit;

        y[yr]+((.1)*(one + two + three + four + five)*(year > yr and year <= (yr+6)));

    end;

end;

put (y2004-y2010)(=);

keep y2004-y2010;

run;

y2004=138091600 y2005=138868100 y2006=116112900 y2007=93047900 y2008=69786300 y2009=46523700 y2010=23260200

jdub
Calcite | Level 5

Hi Alpay

Thanks for your code.  I am trying to figure out what the following line of code is doing:

y[yr]+((.1)*(one + two + three + four + five)*(year > yr and year <= (yr+6)));

I understand the middle section but what do the pieces of code: y[yr] and ..*(year>yr and year <=(yr+6)) do?

Alpay
Fluorite | Level 6

y is an array of variables y2004 thru y2010. Its values initialized to missing at the beginning of the data step.

yr is a variable to denote the index value for array y.

y[yr]+ part will increment the value of the variable referenced (e.g., yr=2005 means y2005 variable).

*(year>yr and year <=(yr+6)) part is an inline indicator whether year is greater than yr and year is less than or equal to yr + 6.

If the criteria holds, it will be computed as 1 otherwise 0.

For your case, starting year and ending year values need to be changed to 1938 and 2002, respectively.

Also, the expression *(year>yr and year <=(yr+6)) needs to be replaced with *(year>yr and year <=(yr+10)).

Ksharp
Super User

I want know whether there will be multiple obs for the same year.

data x;
informat one    two    three    four    five comma12.;
input year    one    two    three    four    five ;
datalines;
2005 36,565,000 38,964,000 43,091,000 41,709,000 64,508,000
2006 36,761,000 39,090,000 42,869,000 42,619,000 66,213,000
2007 37,282,000 39,514,000 42,582,000 43,291,000 67,981,000
2008 37,342,000 39,707,000 41,966,000 43,723,000 69,878,000
2009 37,422,000 39,717,000 41,866,000 43,733,000 69,888,000
2010 37,501,000 39,727,000 41,766,000 43,743,000 69,898,000
2011 37,468,000 39,737,000 41,666,000 43,753,000 69,978,000
;
run;
proc sql;
create table temp as 
 select a.year, 0.1*( select sum(sum(one,two,three,four,five)) from x where x.year between  a.year+1 and a.year+6 ) as weight_sum
  from (select distinct year from x) as a;
quit;
proc transpose data=temp out=want prefix=year ;
 var weight_sum;
 id year;
run;


Ksharp

jdub
Calcite | Level 5

I am trying to yield one observation per year.  The variables I am trying to create represent a single year value where each variable, say y2004 has a corresponding data point that is equal to a weighted sum of 10 years of the variables one, two, three, four and five from the years 2005:2015.  I pasted in an example data set for exposition, the data I am using goes from 1948 to current so the output would correspond to a single vector of data for the variables y1938: y2002.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 916 views
  • 3 likes
  • 3 in conversation