BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Markov
Calcite | Level 5

Dear who can help this out;

I have state level socioeconomic data set from census, income, race ratio, marriage ratio, education level etc. It's in each 10 year period (1980, 1990, 2000). I want to linearly interpolate each variable values for the years in between (values for 1981~1989, 1991~1999).

Data structure now looks like this

State     year   income race ratio marriage ratio education level

Alabama  1980 10000     0.40          0.52              0.34

Alabama  1990 12000     0.44          0.50              0.32

Alabama  2000 13000     0.48          0.58              0.39

California 1980 15000     0.20          0.63              0.42

California 1990 16000     0.25          0.66              0.43

California 2000 16500     0.30          0.69              0.47

Any idea on this particular procedure with this set of data would be appreciated!

Thank you,

MB

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could use this code which is a bit more general:


data have2;
length state $16;
input State     year   income race_ratio marriage_ratio education_level;
datalines;
Alabama  1980 10000     0.40          0.52              0.34
Alabama  1990 12000     0.44          0.50              0.32
Alabama  2000 13000     0.48          0.58              0.39
California 1980 15000     0.20          0.63              0.42
California 1990 16000     0.25          0.66              0.43
California 2000 16500     0.30          0.69              0.47
;

data want2;
set have2; by state notsorted;
array prev{5} _temporary_; array actual{5} _temporary_;
array val{5} year income race_ratio marriage_ratio education_level;
do i = 1 to dim(val);
     actual{i} = val{i};
     prev{i} = lag(val{i});
     end;
if first.state then output;
else do;
     do year = prev{1}+1 to actual{1};
          do i = 2 to dim(val);
               val{i} = ((actual{1}-year)*prev{i} + (year-prev{1})*actual{i}) / (actual{1}-prev{1});
               end;
          output;
          end;
     end;
drop i;
run;

proc print data=want2 noobs; run;

PG

PG

View solution in original post

18 REPLIES 18
RichardinOz
Quartz | Level 8

Proc transpose data = want out = step1

          prefix = 'yr_'

          ;

     By     State ;

     id     Year ;

     Var _Numeric_ ;

run ;


/*     you may get a note that year is both an id and analysis variable */


Data step2 ;

     Array annual {*}  yr_1980 - yr_2000 ;

     set step1 ;

     Do k = 1 to 9 ;

          annual (k + 1) = (k * yr_1980 + (10 - k) * yr_1990) / 9 ;

          annual (k + 11) = (k * yr_1990 + (10 - k) * yr_2000) / 9 ;

     end ;

     Drop k ;

Run ;


Proc Transpose data = step2

               out = want (drop = _name_)

               ;

     By      State ;

Run ;

Alert! untested code.  You might have to have another datastep to strip year out of _Name_ in want (in which case do not drop it in the transpose!

Richard in NZ again

Message was edited by: Richard Carson Changed divisor to 9 from 10. 

Markov
Calcite | Level 5

Thank you Richard for your detailed explanation.

My question is that it seems like the order of k and (10-k) in the following equation matter:

          annual (k + 1) = (k * yr_1980 + (10 - k) * yr_1990) / 9 ;

          annual (k + 11) = (k * yr_1990 + (10 - k) * yr_2000) / 9 ;

For instance, if the variables have increased over the decades, the above formula gives me a decreasing patter in the values for the variable.

If age is 30 at 1980 and 34 in 1990, then it's rational to see values for 1981-1985 to be smaller than values for 1986-1989. But the above code gives me the opposite. Any thoughts would be helpful!

Thank you!

Markov


PGStats
Opal | Level 21

You could use this code which is a bit more general:


data have2;
length state $16;
input State     year   income race_ratio marriage_ratio education_level;
datalines;
Alabama  1980 10000     0.40          0.52              0.34
Alabama  1990 12000     0.44          0.50              0.32
Alabama  2000 13000     0.48          0.58              0.39
California 1980 15000     0.20          0.63              0.42
California 1990 16000     0.25          0.66              0.43
California 2000 16500     0.30          0.69              0.47
;

data want2;
set have2; by state notsorted;
array prev{5} _temporary_; array actual{5} _temporary_;
array val{5} year income race_ratio marriage_ratio education_level;
do i = 1 to dim(val);
     actual{i} = val{i};
     prev{i} = lag(val{i});
     end;
if first.state then output;
else do;
     do year = prev{1}+1 to actual{1};
          do i = 2 to dim(val);
               val{i} = ((actual{1}-year)*prev{i} + (year-prev{1})*actual{i}) / (actual{1}-prev{1});
               end;
          output;
          end;
     end;
drop i;
run;

proc print data=want2 noobs; run;

PG

PG
Markov
Calcite | Level 5

Thank you very much PG!

RichardinOz
Quartz | Level 8

@Markov

Obviously if you have access to Proc Expand that is the best toool to use.

OK, I did warn you that mycode was untested!  Only 3 errors:  Prefix did not do what I wanted, multipliers in the wrong order and the divisor should have been 10 as I had entered it at first.  Here is the tested (v9.3) code:

options validvarname=v7 ;
Proc transpose data = have out = step1
          ;
     By     State ;
     id     Year ;
     Var _Numeric_ ;
run ;

Data step2 ;
     Array annual {*}  _1980 - _2000 ;
     set step1 ;
     Do k = 1 to 9 ;
          annual (k + 1) = ((10 - k) * _1980 + k * _1990) / 10 ;
          annual (k + 11) = ((10 - k) * _1990 + k * _2000) / 10 ;
     end ;
     Drop k ;
Run ;

Proc Transpose data = step2
               out = want (drop = _name_)
               ;
     By      State ;
Run ;

Richard in NZ

Markov
Calcite | Level 5

Thank you very much Richard! Your code is very helpful in showing me the logic behind the procedure.

Best,

Markov

data_null__
Jade | Level 19

If you have PROC EXPAND it will expand your decade data into years and allow you to compute various types of interpolated values.  I would try to make example but I don't have SAS/ETS.

PGStats
Opal | Level 21

You can implement DN's suggestion as follows:

data have;
length state $16;
input State     year   income race_ratio marriage_ratio education_level;
yearD = mdy(1, 1, year);
drop year;
datalines;
Alabama  1980 10000     0.40          0.52              0.34
Alabama  1990 12000     0.44          0.50              0.32
Alabama  2000 13000     0.48          0.58              0.39
California 1980 15000     0.20          0.63              0.42
California 1990 16000     0.25          0.66              0.43
California 2000 16500     0.30          0.69              0.47
;

proc expand data=have out=want from=year10 to=year method=join;
by state notsorted;
id yearD;
run;

proc print data=want noobs; run;

PG

PG
Markov
Calcite | Level 5

Thank you PG for your explanation.

My few questions on your code would be:

1) is there a particular reason you assigned a new year variable (yearD) and drop year variable?

2)When you specify "from" and "to", what is year10 for? is it a variable that I was supposed to create prior to run proc expand? Also, since we dropped "year" variable in the previous step, is it "yearD" that we need to use for "to"? I am trying to understand your code and it would be great if you can elaborate on this.

Thank you!

Markov

data_null__
Jade | Level 19

I think it would be good for you to look at the documentation for PROC EXPAND.

Markov
Calcite | Level 5

Thanks DN, I am looking into the documents for proc expand for linear interpolation Smiley Happy

PGStats
Opal | Level 21

from=year10 means from 10 year intervals,  to=year means convert to 1 year intervals. yearD are proper SAS dates, which is what is expected by proc Expand. You can see the effect of using SAS dates in the interpolation which appears slightly irregular. These small irregularities are the consequence of leap years.

PG

PG
ballardw
Super User

Since the values are most likely rounded, I would be very hesitant about keeping additional digits implying more precision than the data might actually allow.

Reeza
Super User

Can you explain why you're doing this?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 3004 views
  • 7 likes
  • 7 in conversation