how to impute missing records

Accepted Solution Solved
Reply
Contributor
Posts: 56
Accepted Solution

how to impute missing records

Hi,

Would like to know how to impute missing values for missing records using SAS datestep. Suppose if I have a dataset:

Id                   Date                   Value

N11               Oct 1,2012            300

N11               Oct 2.2012            200

N11               Oct 4,2012            250

N11                Oct 6,2012           300

/* the record for id N11 for the date Oct 5th,2012 is missing as oct 6th follows oct 4th. I want to impute value of Oct 4,2012 for Oct5th,2012.and get record as N11    Oct 5,2012    250.

Any ideas please?



Thanks,

Andy


Accepted Solutions
Solution
‎10-07-2012 12:49 PM
Super Contributor
Posts: 1,636

Re: how to impute missing records

Hi Art,

I modified your code in case there are more then one ID. Thanks!

data have;

  input Id $  :date: mmddyy10.   Value;

  format date mmddyy10.;

  cards;

N11               10/01/2012          300

N11               10/03/2012          200

N11               10/06/2012          250

N11               10/07/2012          300

N12               11/01/2012          300

N12               11/02/2012          200

N12               11/04/2012          250

N12               11/05/2012          300

;

data want (keep=id date value);

  set have;

  by id;

  retain last_date last_value;

  hold_date=date;

  hold_value=value;

  if not first.id then do;

    if date ne last_date+1 then do date=last_date+1 to hold_date-1;

      value=last_value;

      output;

    end;

  end;

  value=hold_value;

  last_date=date;

  last_value=value;

  output;

run;

proc print;run;

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: how to impute missing records

What about Oct 3rd?  Do you just want the last non-missing value in all cases?

Contributor
Posts: 56

Re: how to impute missing records

OOPS, Gosh, Glad you noticed oct 3rd is missing too. Right, generally thats the catch. How to identify first the ones that are missing in the continuous dates and then using above logic to impute them. Well I guess a lag function ordered by date should do in my opinion , but my hands seem to be tied on how to get it done.Smiley Sad

Another possibility:

lets assume oct 3, oct 4 oct 5 are missing. Here i wanna impute oct 2 value for those 3. Basically the previous value needs to be imputed to all following continuous missing ones.

Frequent Contributor
Posts: 88

Re: how to impute missing records

Hi Andy,

i guess you wanted to insert recore between date oct 4,2012 and oct 6,2012. if this is what you want then see below.

using Proc sql insert query you can do this since you don't have any Primary key assign to variable nor you have any unique variable so i choose below way,

1) Use proc sql to update last query set value=0 where date = oct 6,2012 so the dataset will be like below

id                   Date                   Value

N11               Oct 1,2012            300

N11               Oct 2.2012            200

N11               Oct 4,2012            250

N11                Oct 6,2012           0

2) Now update again use Proc sql update statement set date=oct 5,2012 where value =0 so the dataset now look like below

id                   Date                   Value

N11               Oct 1,2012            300

N11               Oct 2.2012            200

N11               Oct 4,2012            250

N11                Oct 5,2012           0

3)Now again use proc sql update statement set value=250 where date= Oct 5,2012 so the dataset now look like below

d                   Date                   Value

N11               Oct 1,2012            300

N11               Oct 2.2012            200

N11               Oct 4,2012            250

N11                Oct 5,2012           250

4) Now simply use the Proc sql Insert Query with insert into datasetname values (N11 'Oct6,2012' 300); so this is what you will get you wanted

d                   Date                   Value

N11               Oct 1,2012            300

N11               Oct 2.2012            200

N11               Oct 4,2012            250

N11                Oct 5,2012           250

N11                Oct 6,2012           300

I believe this is very not feasible way to do that if you have 10000 observation , but this is what i got since you don't have any unique variable. one more thing i know that obs is created automatically when you create dataset and its unique but i don't know wheather you can use that obs value in SAS code or in proc sql.

Thanks and Regards,

Tushar J.

PROC Star
Posts: 7,363

Re: how to impute missing records

Conversely, you could do everything using the retain statement.  e.g.:

data want (keep=id date value);

  set have;

  retain last_date last_value;

  hold_date=date;

  hold_value=value;

  if _n_ gt 1 then do;

    if date ne last_date+1 then do date=last_date+1 to hold_date-1;

      value=last_value;

      output;

    end;

  end;

  value=hold_value;

  last_date=date;

  last_value=value;

  output;

run;

Solution
‎10-07-2012 12:49 PM
Super Contributor
Posts: 1,636

Re: how to impute missing records

Hi Art,

I modified your code in case there are more then one ID. Thanks!

data have;

  input Id $  :date: mmddyy10.   Value;

  format date mmddyy10.;

  cards;

N11               10/01/2012          300

N11               10/03/2012          200

N11               10/06/2012          250

N11               10/07/2012          300

N12               11/01/2012          300

N12               11/02/2012          200

N12               11/04/2012          250

N12               11/05/2012          300

;

data want (keep=id date value);

  set have;

  by id;

  retain last_date last_value;

  hold_date=date;

  hold_value=value;

  if not first.id then do;

    if date ne last_date+1 then do date=last_date+1 to hold_date-1;

      value=last_value;

      output;

    end;

  end;

  value=hold_value;

  last_date=date;

  last_value=value;

  output;

run;

proc print;run;

Frequent Contributor
Posts: 88

Re: how to impute missing records

Hi Art,

Thanks for this retain logic.

Respected Advisor
Posts: 3,124

Re: how to impute missing records

Hi,

I vaguely remember that the similar discussion has been made before. Lag() maybe useful if you only have fixed day gaps, so you know the 'n' value of lagn() ahead of the time  that you can apply them accordingly. On top of that, there is danger when you use lag() conditionally, so please search it for details.

In general, if you have SAS/ETS, then proc timeseries is built for this kind of task; if you however don't have it, you can always rely on data step:

(if your raw is just like the one you provided, then you would need _infile_ to read it in)

data have;

input;

id=scan(_infile_,1);

date=input(cats(scan(_infile_,3),scan(_infile_,2),scan(_infile_,4)),date9.);

value=input(scan(_infile_,5),best.);

format date date9.;

cards;

N11               Oct 1,2012            300

N11               Oct 2.2012            200

N11               Oct 4,2012            250

N11                Oct 6,2012           300

;

/*ETS*/

proc timeseries data=have out=want;

id date interval=day setmissing=previous;

var value;

by id;

run;

/*Data step*/

data want;

set have;

by id;

set have (firstobs=2 keep=date rename=date=_date)

    have (obs=1 drop=_all_);

if last.id then call missing(_date);

output;

do while (intck('day',date,_date)>1) ;

     date=intnx('day',date,1);

     output;

end;

drop _date;

run;

Haikuo

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 744 views
  • 9 likes
  • 5 in conversation