Solved: Re: Need a help on Linear Interpolation

Markov · Posted 01-06-2014 01:09 AM

Dear who can help this out;

I have state level socioeconomic data set from census, income, race ratio, marriage ratio, education level etc. It's in each 10 year period (1980, 1990, 2000). I want to linearly interpolate each variable values for the years in between (values for 1981~1989, 1991~1999).

Data structure now looks like this

State year income race ratio marriage ratio education level

Alabama 1980 10000 0.40 0.52 0.34

Alabama 1990 12000 0.44 0.50 0.32

Alabama 2000 13000 0.48 0.58 0.39

California 1980 15000 0.20 0.63 0.42

California 1990 16000 0.25 0.66 0.43

California 2000 16500 0.30 0.69 0.47

Any idea on this particular procedure with this set of data would be appreciated!

Thank you,

MB

PGStats · Posted 01-06-2014 05:12 PM

You could use this code which is a bit more general:

data have2;
length state $16;
input State     year   income race_ratio marriage_ratio education_level;
datalines;
Alabama 1980 10000     0.40          0.52              0.34
Alabama 1990 12000     0.44          0.50              0.32
Alabama 2000 13000     0.48          0.58              0.39
California 1980 15000     0.20          0.63              0.42
California 1990 16000     0.25          0.66              0.43
California 2000 16500     0.30          0.69              0.47
;

data want2;
set have2; by state notsorted;
array prev{5} _temporary_; array actual{5} _temporary_;
array val{5} year income race_ratio marriage_ratio education_level;
do i = 1 to dim(val);
     actual{i} = val{i};
     prev{i} = lag(val{i});
     end;
if first.state then output;
else do;
     do year = prev{1}+1 to actual{1};
          do i = 2 to dim(val);
               val{i} = ((actual{1}-year)*prev{i} + (year-prev{1})*actual{i}) / (actual{1}-prev{1});
               end;
          output;
          end;
     end;
drop i;
run;

proc print data=want2 noobs; run;

PG

View solution in original post

RichardinOz · Posted 01-06-2014 04:07 AM

Proc transpose data = want out = step1

prefix = 'yr_'

;

By State ;

id Year ;

Var _Numeric_ ;

run ;

/* you may get a note that year is both an id and analysis variable */

Data step2 ;

Array annual {*} yr_1980 - yr_2000 ;

set step1 ;

Do k = 1 to 9 ;

annual (k + 1) = (k * yr_1980 + (10 - k) * yr_1990) / 9 ;

annual (k + 11) = (k * yr_1990 + (10 - k) * yr_2000) / 9 ;

end ;

Drop k ;

Run ;

Proc Transpose data = step2

out = want (drop = _name_)

;

By State ;

Run ;

Alert! untested code. You might have to have another datastep to strip year out of _Name_ in want (in which case do not drop it in the transpose!

Richard in NZ again

Message was edited by: Richard Carson Changed divisor to 9 from 10.

Markov · Posted 01-06-2014 04:31 PM

Thank you Richard for your detailed explanation.

My question is that it seems like the order of k and (10-k) in the following equation matter:

annual (k + 1) = (k * yr_1980 + (10 - k) * yr_1990) / 9 ;

annual (k + 11) = (k * yr_1990 + (10 - k) * yr_2000) / 9 ;

For instance, if the variables have increased over the decades, the above formula gives me a decreasing patter in the values for the variable.

If age is 30 at 1980 and 34 in 1990, then it's rational to see values for 1981-1985 to be smaller than values for 1986-1989. But the above code gives me the opposite. Any thoughts would be helpful!

Thank you!

Markov

PGStats · Posted 01-06-2014 05:12 PM

You could use this code which is a bit more general:

data have2;
length state $16;
input State     year   income race_ratio marriage_ratio education_level;
datalines;
Alabama 1980 10000     0.40          0.52              0.34
Alabama 1990 12000     0.44          0.50              0.32
Alabama 2000 13000     0.48          0.58              0.39
California 1980 15000     0.20          0.63              0.42
California 1990 16000     0.25          0.66              0.43
California 2000 16500     0.30          0.69              0.47
;

data want2;
set have2; by state notsorted;
array prev{5} _temporary_; array actual{5} _temporary_;
array val{5} year income race_ratio marriage_ratio education_level;
do i = 1 to dim(val);
     actual{i} = val{i};
     prev{i} = lag(val{i});
     end;
if first.state then output;
else do;
     do year = prev{1}+1 to actual{1};
          do i = 2 to dim(val);
               val{i} = ((actual{1}-year)*prev{i} + (year-prev{1})*actual{i}) / (actual{1}-prev{1});
               end;
          output;
          end;
     end;
drop i;
run;

proc print data=want2 noobs; run;

PG

Markov · Posted 01-06-2014 06:20 PM

Thank you very much PG!

RichardinOz · Posted 01-06-2014 07:41 PM

@Markov

Obviously if you have access to Proc Expand that is the best toool to use.

OK, I did warn you that mycode was untested! Only 3 errors: Prefix did not do what I wanted, multipliers in the wrong order and the divisor should have been 10 as I had entered it at first. Here is the tested (v9.3) code:

options validvarname=v7 ;
Proc transpose data = have out = step1
          ;
     By     State ;
     id     Year ;
     Var _Numeric_ ;
run ;

Data step2 ;
     Array annual {*} _1980 - _2000 ;
     set step1 ;
     Do k = 1 to 9 ;
          annual (k + 1) = ((10 - k) * _1980 + k * _1990) / 10 ;
          annual (k + 11) = ((10 - k) * _1990 + k * _2000) / 10 ;
     end ;
     Drop k ;
Run ;

Proc Transpose data = step2
               out = want (drop = _name_)
               ;
     By      State ;
Run ;

Richard in NZ

Markov · Posted 01-06-2014 09:29 PM

Thank you very much Richard! Your code is very helpful in showing me the logic behind the procedure.

Best,

Markov

data_null__ · Posted 01-06-2014 07:23 AM

If you have PROC EXPAND it will expand your decade data into years and allow you to compute various types of interpolated values. I would try to make example but I don't have SAS/ETS.

PGStats · Posted 01-06-2014 10:28 AM

You can implement DN's suggestion as follows:

data have;
length state $16;
input State     year   income race_ratio marriage_ratio education_level;
yearD = mdy(1, 1, year);
drop year;
datalines;
Alabama 1980 10000     0.40          0.52              0.34
Alabama 1990 12000     0.44          0.50              0.32
Alabama 2000 13000     0.48          0.58              0.39
California 1980 15000     0.20          0.63              0.42
California 1990 16000     0.25          0.66              0.43
California 2000 16500     0.30          0.69              0.47
;

proc expand data=have out=want from=year10 to=year method=join;
by state notsorted;
id yearD;
run;

proc print data=want noobs; run;

PG

Markov · Posted 01-06-2014 04:35 PM

Thank you PG for your explanation.

My few questions on your code would be:

1) is there a particular reason you assigned a new year variable (yearD) and drop year variable?

2)When you specify "from" and "to", what is year10 for? is it a variable that I was supposed to create prior to run proc expand? Also, since we dropped "year" variable in the previous step, is it "yearD" that we need to use for "to"? I am trying to understand your code and it would be great if you can elaborate on this.

Thank you!

Markov

data_null__ · Posted 01-06-2014 04:41 PM

I think it would be good for you to look at the documentation for PROC EXPAND.

Markov · Posted 01-06-2014 05:01 PM

Thanks DN, I am looking into the documents for proc expand for linear interpolation

PGStats · Posted 01-06-2014 06:24 PM

from=year10 means from 10 year intervals, to=year means convert to 1 year intervals. yearD are proper SAS dates, which is what is expected by proc Expand. You can see the effect of using SAS dates in the interpolation which appears slightly irregular. These small irregularities are the consequence of leap years.

PG

ballardw · Posted 01-06-2014 01:28 PM

Since the values are most likely rounded, I would be very hesitant about keeping additional digits implying more precision than the data might actually allow.

Reeza · Posted 01-06-2014 04:54 PM

Can you explain why you're doing this?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away