Dear who can help this out;
I have state level socioeconomic data set from census, income, race ratio, marriage ratio, education level etc. It's in each 10 year period (1980, 1990, 2000). I want to linearly interpolate each variable values for the years in between (values for 1981~1989, 1991~1999).
Data structure now looks like this
State year income race ratio marriage ratio education level
Alabama 1980 10000 0.40 0.52 0.34
Alabama 1990 12000 0.44 0.50 0.32
Alabama 2000 13000 0.48 0.58 0.39
California 1980 15000 0.20 0.63 0.42
California 1990 16000 0.25 0.66 0.43
California 2000 16500 0.30 0.69 0.47
Any idea on this particular procedure with this set of data would be appreciated!
Thank you,
MB
You could use this code which is a bit more general:
data have2;
length state $16;
input State year income race_ratio marriage_ratio education_level;
datalines;
Alabama 1980 10000 0.40 0.52 0.34
Alabama 1990 12000 0.44 0.50 0.32
Alabama 2000 13000 0.48 0.58 0.39
California 1980 15000 0.20 0.63 0.42
California 1990 16000 0.25 0.66 0.43
California 2000 16500 0.30 0.69 0.47
;
data want2;
set have2; by state notsorted;
array prev{5} _temporary_; array actual{5} _temporary_;
array val{5} year income race_ratio marriage_ratio education_level;
do i = 1 to dim(val);
actual{i} = val{i};
prev{i} = lag(val{i});
end;
if first.state then output;
else do;
do year = prev{1}+1 to actual{1};
do i = 2 to dim(val);
val{i} = ((actual{1}-year)*prev{i} + (year-prev{1})*actual{i}) / (actual{1}-prev{1});
end;
output;
end;
end;
drop i;
run;
proc print data=want2 noobs; run;
PG
Proc transpose data = want out = step1
prefix = 'yr_'
;
By State ;
id Year ;
Var _Numeric_ ;
run ;
/* you may get a note that year is both an id and analysis variable */
Data step2 ;
Array annual {*} yr_1980 - yr_2000 ;
set step1 ;
Do k = 1 to 9 ;
annual (k + 1) = (k * yr_1980 + (10 - k) * yr_1990) / 9 ;
annual (k + 11) = (k * yr_1990 + (10 - k) * yr_2000) / 9 ;
end ;
Drop k ;
Run ;
Proc Transpose data = step2
out = want (drop = _name_)
;
By State ;
Run ;
Alert! untested code. You might have to have another datastep to strip year out of _Name_ in want (in which case do not drop it in the transpose!
Richard in NZ again
Message was edited by: Richard Carson Changed divisor to 9 from 10.
Thank you Richard for your detailed explanation.
My question is that it seems like the order of k and (10-k) in the following equation matter:
annual (k + 1) = (k * yr_1980 + (10 - k) * yr_1990) / 9 ;
annual (k + 11) = (k * yr_1990 + (10 - k) * yr_2000) / 9 ;
For instance, if the variables have increased over the decades, the above formula gives me a decreasing patter in the values for the variable.
If age is 30 at 1980 and 34 in 1990, then it's rational to see values for 1981-1985 to be smaller than values for 1986-1989. But the above code gives me the opposite. Any thoughts would be helpful!
Thank you!
Markov
You could use this code which is a bit more general:
data have2;
length state $16;
input State year income race_ratio marriage_ratio education_level;
datalines;
Alabama 1980 10000 0.40 0.52 0.34
Alabama 1990 12000 0.44 0.50 0.32
Alabama 2000 13000 0.48 0.58 0.39
California 1980 15000 0.20 0.63 0.42
California 1990 16000 0.25 0.66 0.43
California 2000 16500 0.30 0.69 0.47
;
data want2;
set have2; by state notsorted;
array prev{5} _temporary_; array actual{5} _temporary_;
array val{5} year income race_ratio marriage_ratio education_level;
do i = 1 to dim(val);
actual{i} = val{i};
prev{i} = lag(val{i});
end;
if first.state then output;
else do;
do year = prev{1}+1 to actual{1};
do i = 2 to dim(val);
val{i} = ((actual{1}-year)*prev{i} + (year-prev{1})*actual{i}) / (actual{1}-prev{1});
end;
output;
end;
end;
drop i;
run;
proc print data=want2 noobs; run;
PG
Thank you very much PG!
@Markov
Obviously if you have access to Proc Expand that is the best toool to use.
OK, I did warn you that mycode was untested! Only 3 errors: Prefix did not do what I wanted, multipliers in the wrong order and the divisor should have been 10 as I had entered it at first. Here is the tested (v9.3) code:
options validvarname=v7 ;
Proc transpose data = have out = step1
;
By State ;
id Year ;
Var _Numeric_ ;
run ;
Data step2 ;
Array annual {*} _1980 - _2000 ;
set step1 ;
Do k = 1 to 9 ;
annual (k + 1) = ((10 - k) * _1980 + k * _1990) / 10 ;
annual (k + 11) = ((10 - k) * _1990 + k * _2000) / 10 ;
end ;
Drop k ;
Run ;
Proc Transpose data = step2
out = want (drop = _name_)
;
By State ;
Run ;
Richard in NZ
Thank you very much Richard! Your code is very helpful in showing me the logic behind the procedure.
Best,
Markov
If you have PROC EXPAND it will expand your decade data into years and allow you to compute various types of interpolated values. I would try to make example but I don't have SAS/ETS.
You can implement DN's suggestion as follows:
data have;
length state $16;
input State year income race_ratio marriage_ratio education_level;
yearD = mdy(1, 1, year);
drop year;
datalines;
Alabama 1980 10000 0.40 0.52 0.34
Alabama 1990 12000 0.44 0.50 0.32
Alabama 2000 13000 0.48 0.58 0.39
California 1980 15000 0.20 0.63 0.42
California 1990 16000 0.25 0.66 0.43
California 2000 16500 0.30 0.69 0.47
;
proc expand data=have out=want from=year10 to=year method=join;
by state notsorted;
id yearD;
run;
proc print data=want noobs; run;
PG
Thank you PG for your explanation.
My few questions on your code would be:
1) is there a particular reason you assigned a new year variable (yearD) and drop year variable?
2)When you specify "from" and "to", what is year10 for? is it a variable that I was supposed to create prior to run proc expand? Also, since we dropped "year" variable in the previous step, is it "yearD" that we need to use for "to"? I am trying to understand your code and it would be great if you can elaborate on this.
Thank you!
Markov
I think it would be good for you to look at the documentation for PROC EXPAND.
Thanks DN, I am looking into the documents for proc expand for linear interpolation
from=year10 means from 10 year intervals, to=year means convert to 1 year intervals. yearD are proper SAS dates, which is what is expected by proc Expand. You can see the effect of using SAS dates in the interpolation which appears slightly irregular. These small irregularities are the consequence of leap years.
PG
Since the values are most likely rounded, I would be very hesitant about keeping additional digits implying more precision than the data might actually allow.
Can you explain why you're doing this?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.