mspak
Quartz | Level 8

Dear all,

 

I would like to perform interpolation on the attached data.

Identification code = FIPS

Time period = Year

7 Variables of interest: assn nccs pvote respn sc_score  stdsc_score

Available data for year 1990, 1997, 2005 and 2009.

Missing data: 1991 - 1996; 1998 - 2004; 2006 - 2008

 

I would like to linearly interpolate the missing data.

 

Can anyone help?

 

Thanks.

 

Regards,

MSPAK

1 ACCEPTED SOLUTION

mkeintz
PROC Star

If you have SAS/ETS, then PROC EXPAND is the way to go (METHOD=JOIN added in a subsequent edit):

 

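data have2;
  set have;
  date=mdy(12,31,year);   /* year-end SAS date built from YEAR */
  format date date9.;
run;

/* FROM=YEAR: the input series is annual.                          */
/* METHOD=JOIN: connect nonmissing values with straight lines.     */
proc expand data=have2 out=new from=year method=join;
  by fips;
  id date;
  convert assn nccs pvote respn sc_score std_scscore;
run;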

 

 

If you don't, then a judicious use of lags and arrays will work. The program below assumes that all six variables (I did not see seven) share exactly the same missing pattern: if ASSN is missing, so are the other five, and vice versa.

 

The technique here is to always have available the current data (in array CURD) and the preceding data (in array LAGD). The program also assumes that you are only interpolating and never extrapolating (i.e. the first year and last year are never missing).

 

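data want (drop=cury lagy v coef);
  set have ;
  where assn^=.;                 /* read only the years with observed data */
  by fips;

  array d{*}    assn nccs pvote respn sc_score std_scscore ;
  array curd{6} _temporary_;     /* current observed values */
  array lagd{6} _temporary_;     /* meant to hold the preceding observed values */

  cury=year;                     /* current observed year */
  lagy=lag(year);                /* preceding observed year */

  do v=1 to dim(d);
    curd{v}=d{v};
    lagd{v}=lag(curd{v});
  end;

  if first.fips then output;     /* first observed year: write as is */
  else do year=lagy+1 to cury;   /* each year after LAGY, up to and including CURY */
    coef=(year-lagy)/(cury-lagy);   /* fraction of the way from LAGY to CURY */
    do v=1 to dim(d);
      d{v}=lagd{v} + coef*(curd{v}-lagd{v});
    end;
    output;
  end;
run;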
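
For reference, the arithmetic in the inner loop is ordinary linear interpolation between the two observed years that bracket a gap. The symbols below are only shorthand for the program's variables (y for YEAR, y_0 for LAGY, y_1 for CURY, x_0 and x_1 for the corresponding LAGD and CURD values), and the numbers in the worked line are made-up values for illustration, not taken from the attached data.

```latex
x(y) = x_0 + \frac{y - y_0}{y_1 - y_0}\,(x_1 - x_0), \qquad y_0 < y \le y_1

% hypothetical values: y_0 = 1990, y_1 = 1997, x_0 = 10, x_1 = 24
x(1993) = 10 + \frac{1993 - 1990}{1997 - 1990}\,(24 - 10)
        = 10 + \frac{3}{7}\cdot 14 = 16
```

At y = y_1 the coefficient equals 1, so the formula returns x_1, which is why the loop can run through CURY and still write the observed value for that year.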

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


3 REPLIES
mspak
Quartz | Level 8
Thank you for the solution. Both programs work well. However, the interpolated figures from the two programs are slightly different.
mkeintz
PROC Star

I forgot that the default interpolation method for PROC EXPAND is cubic spline.  I think using METHOD=JOIN will do linear interpolation. According to the proc expand documentation:

 

The JOIN Method

The JOIN method fits a continuous curve to the data by connecting successive straight line segments. For point-in-time data, the JOIN method connects successive nonmissing input values with straight lines. For interval total or average data, interval midpoints are used as the break points, and ordinates are chosen so that the integrals of the piecewise linear curve agree with the input totals.
