I have a data like
Year Age x1 x2
1 1 v11 v12
2 1 v21 v22
...
21 1 v211 v212
22 1 v221 v222
....
The last year is 133 and then the age 2 starts (goes through the same years) until the highest age 106.
I need to calculate a new variable of the variables x1 and x2. That is the sum of the years 1,2,3,4,5 and 2,3,4,5,6 and so on in x1 (in each age separately) divided By the sum of the same calculation in x2.
Year Age x1 x2 x3
1 1 v11 v12 (v11+v21+v31+v41+v51)/(v12+v22+v32+v42+v52)
2 1 v21 v22 (v21+v31+v41+v51+v61)/(v22+v32+v42+v52+v62)
...
21 1 v211 v212 (v211+v221+v231+v241+v251)/(v212+v222+v232+v242+v252)
22 1 v221 v222 (v221+v231+v241+v251+v261)/(v222+v232+v242+v252+v262)
After that I need another calculation of for example the value in x3 in year 1 divided By the value in x3 in year 21, the natural logarith of those divided By the amount of years between them (20).
1/20*ln(x3(year1)/x3(year21)) and so fort that in each case there is the 20 year difference. That calculation can be x4 and it should be placed according the latter year.
I hope you could understand what i meant!
Requirement is quite straightforward by using SQL;
select age, sum(x1) / sum(x2) as mean
from have
group by year
;
If you really want this information on each detail row, you just add Year to the SQL and it will automatically remerge your calculation on the detail rows.
On the LOG issue, please show some sample desired output to help us understand better.
I think that approach doesn't meet the stated objectives of 5 year increments.
proc sql;
create table work.want as
select
t1.year,
t1.age,
t1.x1,
t1.x2,
sum(t2.x1)/sum(t2.x2) as x3
from
have t1
left join have t2
on t1.age=t2.age
and t1.year ge t2.year
and t1.year+5 lt t2.year
group by
t1.year,
t1.age,
t1.x1,
t1.x2;
create table want2 as
select
t1.*,
1/20*ln(t2.x3/t1.x3) as x4
from
want t1
left join want t2
on t1.age=t2.age
and t1.year-20=t2.year;
quit;
I don't get it, the x3 and x4 have just missing values in the produced table.
try this one to get the first table:
proc sql;
create table work.want as
select
t1.year,
t1.age,
t1.x1,
t1.x2,
sum(t2.x1) as t2x1,
sum(t2.x2) as t2x2,
sum(t2.x1)/sum(t2.x2) as x3
from
have t1
left join have t2
on t1.age=t2.age
and t2.year between t1.year and (t1.year+4)
group by
t1.year,
t1.age,
t1.x1,
t1.x2;
quit;
Thank you so much! It's working now. Can you tell me one more thing. In case I wanted to have the x4's placed in the first year, what should be changed?
I think all you would have to do is change the join:
proc sql;
create table want2 as
select
t1.*,
1/20*ln(t2.x3/t1.x3) as x4
from
want t1
left join want t2
on t1.age=t2.age
and t1.year=t2.year-20;
quit;
Thanks! Hey, one more thing came up. I need a new variable that is x3 but lagged two years. That's to say, the value of x3 in year 1 would be in year 3. I can't use plain lag function because it interrupts the order of later ages after the 1st age. So basically the first two years in each age should be missing. I'd rather have a new variable and not mess the original x3 since I will need both.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.