BookmarkSubscribeRSS Feed
WendyT
Pyrite | Level 9

Hello SAS folks-

 

I need your help again. I’m working on some daily water level data, and, of course, it has gaps. I’ve been asked to use a simple linear interpolation to fill in the gaps. No problem – we have SAS/ETS, and that’s a simple PROC EXPAND with method=join.

 

proc expand data=have out=want  from=daily  ;
        by station ;
          convert daily_stage=dly_interp / method=join ;
        id date ;
run ;

 

Then it got more complex… I’ve now been asked to interpolate, but only when the gap is less than a month, so I picked 30 days for a starting point.

 

I think the solution may be in transformout=  but so far I can’t see it, and I’ve had no luck searching.

 

The only idea I have had is to do the interpolation first, then use missing() on the original data (found that on the boards here somewhere) both forwards and backwards to find the gaps, then keep the interpolation based on the gap counts.

 

Here’s some example code. It generates some data, then makes missing values for varying periods (have). %gapper sorts the values and does a running count of missing values in a gap. It’s run twice – once forward and again backward in time. The sql puts the forward and backward data back together, and uses a case statement to  'pick' the interpolated value based on the gap counts.

 

The example is written backwards, as column "original" would actually be the interpolated values, and "original_&gapsize" would be the data values, but it's the same process of finding gaps. 

 

Could someone please help me with a better solution? 

 

We are running SAS 9.4 TS1M3 and SAS/ETS 14.1 via remote login to a Windows virtual server.

 

Thanks so much for any help you can give me!

 

WendyT

 

%let gapsize=30 ;
data have (drop=i);
do i= 1 to 2 ;
 if i=1 then station='aaa' ;
   else station='bbb' ;
  date='01JAN2017'd ; 
    original=1 ;
      do day=1 TO 200 ;
        date= date + 1 ;
        original= original + 0.1 ;
        original_&gapsize = original ;
          if day ge  30 and day <  40 then original_&gapsize = . ;
          if day ge  60 and day <  80 then original_&gapsize = . ;
          if day ge  90 and day < 118 then original_&gapsize = . ;
          if day ge 130 and day < 165 then original_&gapsize = . ;
        output ;
      end ;
  end ;
  format date yymmdd10. ;
run ;

%macro gapper(direction,order) ;
proc sort data=have ; by  station &order date ; run ;
  data gapsize_&direction.  ;
    set have ;
      by station original_&gapsize notsorted ;
        if missing(original_&gapsize) then do ;
          if first.original_&gapsize then n_&direction. =0 ;
          n_&direction. +1 ;
        end ;
        else do ; 
           n_&direction. =0 ; 
        end ;
  run ;
%mend ;

%gapper(forward, ) ;
%gapper(backward,descending) ;

proc sql noprint ;
create table want as select
station
,date format=yymmdd10.
,day
,original
,original_&gapsize
,case
   when sum(n_forward,n_backward) gt &gapsize then .
   else original
   end as interp_&gapsize
,n_backward
,n_forward
  from gapsize_forward natural join gapsize_backward 
   order by station, date  ;
quit ;
3 REPLIES 3
Ksharp
Super User
%let gapsize=30 ;
data have (drop=i);
do i= 1 to 2 ;
 if i=1 then station='aaa' ;
   else station='bbb' ;
  date='01JAN2017'd ; 
    original=1 ;
      do day=1 TO 200 ;
        date= date + 1 ;
        original= original + 0.1 ;
        original_&gapsize = original ;
          if day ge  30 and day <  40 then original_&gapsize = . ;
          if day ge  60 and day <  80 then original_&gapsize = . ;
          if day ge  90 and day < 118 then original_&gapsize = . ;
          if day ge 130 and day < 165 then original_&gapsize = . ;
        output ;
      end ;
  end ;
  format date yymmdd10. ;
run ;
data want;

do n=1 by 1 until(last.original_30);
 set have;
 by station original_30 notsorted;
end;

do until(last.original_30);
 set have;
 by station original_30 notsorted;
 output;
end;

run;

proc print;run;

Once you got WANT table , you can use proc expand next.

 

proc expand data=want ...........

 

and use the following to set the value missing.

 

if n gt 30 and original_30=. then  dly_interp=. ; 

WendyT
Pyrite | Level 9

Hi Ksharp!

 

Thanks so much for your code, which works beautifully on the test data I provided.  I've been working with my data and puzzling over values for n where data was present. I finally figured it out:  when the value repeats, n starts counting. 

 

I added another line to the 'if' series to make the data repeat so you can see what I mean.

 

The only thing I could think of is to add another condition in the middle so that the if the data is not missing, then n = 1, but I don't think that's a very good solution.

 

Thanks so much for your help... sorry to ask again.

 

WendyT

 

%let gapsize=30 ;
data have (drop=i);
do i= 1 to 2 ;
 if i=1 then station='aaa' ;
   else station='bbb' ;
  date='01JAN2017'd ; 
    original=1 ;
      do day=1 TO 200 ;
        date= date + 1 ;
        original= original + 0.1 ;
        original_&gapsize = original ;
          if day ge  30 and day <  40 then original_&gapsize = . ;
          if day ge  60 and day <  80 then original_&gapsize = . ;
          if day ge  90 and day < 118 then original_&gapsize = . ;
          if day ge 130 and day < 165 then original_&gapsize = . ;
		  if original_&gapsize ge 1.5 and original_&gapsize le 2.5 then original_&gapsize = 2 ;
        output ;
      end ;
  end ;
  format date yymmdd10. ;
run ;
data want;

do n=1 by 1 until(last.original_30);
 set have;
 by station original_30 notsorted;
end;

if original_30 ne . then n = 1 ;

do until(last.original_30) ;
 set have;
 by station original_30 notsorted;
 output;
end;

run;

proc print;run;

 

 

 

 

 

Ksharp
Super User
There is no need to add one more IF statement.
You can get what you want after getting WANT table.


data want;
 set want;
if n gt 30 and original_30=. then  dly_interp=. ; 
run;



SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1131 views
  • 0 likes
  • 2 in conversation