fill in missing, conditional

Reply
Occasional Contributor
Posts: 19

fill in missing, conditional

Hello,

I have a (another) question on filling in missing values.

I have a dataset that looks like this:

data have;

infile cards missover;

input scrssn :$11. year;

cards;

111-22-3333 2008

111-22-3333 2009

111-22-3333 .

111-22-3333 .

222-33-4444 .

222-33-4444 2009

222-33-4444 .

222-33-4444 2011

333-44-5555 .

333-44-5555 .

333-44-5555 2010

333-44-5555 .

;

run;

I want to fill in the year variable where missing (by scrssn), but only if the missing is in-between two non-missing.  So in the above example, the only year variable that would be filled in would be "2010" for 222-33-4444.  I know how to use the RETAIN statement to carry the previous value but I don't know how to put this type of restriction/condition ("only do when in-between non-missing") on the logic...

Any ideas would be so helpful--thank you!

Kelly    

Super Contributor
Posts: 275

Re: fill in missing, conditional

data want;

   merge have have(firstobs=2 rename=(scrssn=_scrssn year=_year));

   retain temp;

   if missing(_year) then temp=year;

   if scrssn=_scrssn and missing(year) then year=(temp+_year)/2;

   drop temp _:;

run;

Respected Advisor
Posts: 3,775

Re: fill in missing, conditional

You can use FIRST and LAST to tell you when missing YEAR is surrounded.

data fill;
   set have;
   by scrssn year notsorted;
  
if first.scrssn then locf = .;
  
if not(first.scrssn or last.scrssn) and missing(year) and first.year and last.year then year=locf;
   locf = year;
  
retain locf;
   drop locf;
   run;

Capture.PNG
Respected Advisor
Posts: 4,998

Re: fill in missing, conditional

A trickier problem than it appears at first ... here's one way.  Assuming that you have a SAS data set already, and it is properly sorted:

data want;

  n=0;

  do until (last.scrssn);

     set have;

     by scrssn;

     n + 1;

     if year > . then do;

        if first_nonmissing = . then first_nonmissing = n;

        last_nonmissing = n;

     end;

  end;

  n=0;

  do until (last.scrssn);

     set have;

     by scrssn;

     n + 1;

     if year > . then last_year_found = year;

     else if (first_nonmissing < n < last_nonmissing) then do;

        year = last_year_found + 1;

        last_year_found = year;

     end;

     output;

  end;

  drop first_nonmissing last_nonmissing last_year_found;

run;

It's untested code, so you may need to debug it a little.  You might want to omit the DROP statement at first, to get a feel for what it's trying to do.

Good luck.

Respected Advisor
Posts: 3,124

Re: fill in missing, conditional

At first, it seems to be a simple data manipulation until all things considered. The basic idea is straightforward while the implementation is not.  The following code is trying to exhaust most of the scenarios which your real data may encounter, for that purpose, your raw inputs have been modified.

Key notes about the code:

  1. Hash table will be populated only if ‘year’ is missing, and ‘year’ is increased starting from a ‘_beg’ value, which will be the last non-missing ‘year’.
  2. When ‘year’ is not missing, conditions are set to determine whether the hash table needs to be download and output. If it is a download, another condition needs to check to determine if ‘year’ is populated before output.
  3. The Hash table will be reset (h.clear()) after each download.

data have;

     infile cards missover;

     input scrssn :$11. year;

     cards;

111-22-3333 2008

111-22-3333 2009

111-22-3333 .

111-22-3333 .

222-33-4444 .

222-33-4444 2009

222-33-4444 .

222-33-4444 .

222-33-4444 2012

222-33-4444 .

222-33-4444 .

222-33-4444 2015

222-33-4444 .

222-33-4444 .

333-44-5555 .

333-44-5555 .

333-44-5555 2010

333-44-5555 .

;

run;

data want;

     if _n_=1 then

           do;

                declare hash h(ordered:'a');

                h.definekey('year');

                h.definedata('year');

                h.definedone();

                declare hiter hi('h');

           end;

     do until (last.year);

           set have;

           by scrssn year notsorted;

           if missing (year) then

                do;

                     _beg+1;

                     year=_beg;

                     _rc=h.replace();

                     if last.scrssn then

                           do;

                                do _rc=hi.first() by 0 while (_rc=0);

                                     call missing (year);

                                     output;

                                     _rc=hi.next();

                                end;

                                _rc=h.clear();

                          end;

                end;

           else

                do;

                     if h.num_items>0 then

                           do;

                                if year=_beg+1 then

                                     do;

                                           _beg=year;

                                           do _rc=hi.first() by 0 while (_rc=0);

                                                output;

                                                _rc=hi.next();

                                           end;

                                     end;

                                else

                                     do;

                                           _beg=year;

                                           do _rc=hi.first() by 0 while (_rc=0);

                                                call missing (year);

                                                output;

                                                _rc=hi.next();

                                           end;

                                     end;

                                _rc=h.clear();

                                year=_beg;

                                output;

                           end;

                     else

                           do;

                                output;

                                _beg=year;

                           end;

                end;

     end;

     drop _:;

run;

Occasional Contributor
Posts: 19

Re: fill in missing, conditional

Thank you all for your suggestions!  I agree, I thought it would be an easy coding problem until I actually sat down and tried to code it.  I will play with your codes--I found a long winded way of doing it which involved creating an indicator variable =1 if year was not missing and then using that in a retain statement (first sorting by 'scrssn year', then 'scrssn descending year') to create two additional flags.  then I used a logic statement to flag the records that should be kept/dropped.  Similar to this: http://www.ats.ucla.edu/stat/sas/code/fillin_missing.htm

Anyway, it works, but it's not very elegant.  Thanks again for your help and I am looking forward to playing with the alternate/suggested codes above.  

Grand Advisor
Posts: 9,594

Re: fill in missing, conditional

Every interestng Question.

 
data have;
infile cards missover;
input scrssn :$11. year;
cards;
111-22-3333 2008
111-22-3333 2009
111-22-3333 .
111-22-3333 .
222-33-4444 .
222-33-4444 2009
222-33-4444 .
222-33-4444 2011
333-44-5555 .
333-44-5555 .
333-44-5555 2010
333-44-5555 .
;
run;
data want;
length _scrssn  $ 40;
retain _scrssn  _y ;
 do until(not missing(year) or last.scrssn);
  set have ;
  by scrssn ;
 end;
 _year=year; n=0;
 do until(not missing(year) or last.scrssn);
  set have ;
  by scrssn ;
  if _scrssn=scrssn and not missing(_year) then do;
    n+1; y=_y+n; 
  end;
   else y=year;
  output;
 end;
_scrssn=scrssn; _y=year;
drop _: year n; 
run;
 



Xia Keshan

消息编辑者为:xia keshan

Ask a Question
Discussion stats
  • 6 replies
  • 389 views
  • 6 likes
  • 6 in conversation