DATA Step, Macro, Functions and more

backfill some observations

Reply
Frequent Contributor
Posts: 122

backfill some observations

There are missing values in my data set, and I want to backfill some missing values. See the example below

 

year     firm     value

2000     A          1

2001     A          .

2002     A          .

2003     A          9

2000     B          2

2001     B          .

2002     B          3

2003     B          1

 

I only want to backfill ONE year, after manipulation, I would like to get the results below.

 

 

year     firm     value

2000     A          1

2001     A          .

2002     A          9

2003     A          9

2000     B          2

2001     B          3

2002     B          3

2003     B          1

 

I actually know how to backfill values, but it will backfill all missing values. I want to know how to apply the criterion that only one year's data will be backfilled. Can any one help me with this? Thanks.

Super User
Posts: 17,750

Re: backfill some observations

In a roundabout way:

 

data have;
    input year firm  $   value;
    cards;
2000     A          1
2001     A          .
2002     A          .
2003     A          9
2000     B          2
2001     B          .
2002     B          3
2003     B          1
;
run;

proc sort data=have;
    by firm descending year;
run;

data want;
    set have;
    by firm;
    prev_value=lag(value);

    if first.firm then
        prev_value=.;

    if value=. then
        new_value=prev_value;
    else
        new_value=value;
run;

proc sort data=want;
    by firm year;
run;
Frequent Contributor
Posts: 122

Re: backfill some observations

Thank you. But how can this way apply the year screening criterion? I only want to backfill three years of miss data.

Super User
Posts: 17,750

Re: backfill some observations


SeanZ wrote:

Thank you. But how can this way apply the year screening criterion? I only want to backfill three years of miss data.


Then why did you ask for one year?

 

Please post data that reflects your new question.

Frequent Contributor
Posts: 122

Re: backfill some observations

Thank you for your reply. I paste a new data set to reflect my question. I would like to backfill N years of missing data. In this case, I would like to set N=2. Eventually, I hope to control and set different N.

 

year     firm     value

2000     A          1

2001     A          .

2002     A          .

2003     A          .

2004     A          9

2000     B          2

2001     B          .

2002     B          3

2003     B          .

2004     B          .

2005     B          2

 

The data I want

 

year     firm     value

2000     A          1

2001     A          .

2002     A          9

2003     A          9

2004     A          9

2000     B          2

2001     B          3

2002     B          3

2003     B          2

2004     B          2

2005     B          2

 

Thanks.

Super User
Posts: 17,750

Re: backfill some observations

Slight variation, change the 2 to whatever number you want. Idea is basic, add a counter to determine how many records you've backfilled and then use that in your condition.

 

data have;
    input year firm  $   value;
    cards;
2000     A          1
2001     A          .
2002     A          .
2003     A          .
2004     A          9
2000     B          2
2001     B          .
2002     B          3
2003     B          .
2004     B          .
2005     B          2
;
run;

proc sort data=have;
    by firm descending year;
run;



data want;
    set have;
    by firm;
    retain new_value backfill;

    if first.firm then do;
       backfill=0;
        new_value=.;
    end;

    if value=. then do;
        backfill+1;
        if backfill > 2 then new_value=.;
    end;
    else do; 
        backfill=0;
        new_value=value;
    end;
run;

proc sort data=want;
    by firm year;
run;
Respected Advisor
Posts: 4,641

Re: backfill some observations

Here is my take on it:

 

data have;
input year firm $ value;
datalines;
2000     A          1
2001     A          .
2002     A          .
2003     A          .
2004     A          9
2000     B          2
2001     B          .
2002     B          3
2003     B          .
2004     B          .
2005     B          2
;

proc sort data=have; by firm descending year; run;

%let n=2; /* The number of years to fill in */

data want(sortedby=firm descending year);
y = 99999;
do until(last.firm);
    set have; by firm;
    if missing(value) then do;
        if year + &n. >= y then value = v; 
        end;
    else do;
        y = year;
        v = value;
        end;
    output;
    end;
drop v y;
run;

proc sort data=want; by firm year; run;

proc print; run;

Note that by using the year in the replacement condition, omitted or duplicated years should be handled properly. The sortedby= dataset option is an attempt to improve the efficiency of the last sort (I don't know if it makes a real difference).

PG
Respected Advisor
Posts: 4,641

Re: backfill some observations

Or use SQL:

 


proc sql;
create table want as
select a.year, a.firm, coalesce(a.value, b.value) as value
from have as a left join
    have as b on a.firm=b.firm and a.year+1 = b.year
order by firm, year;
quit;
PG
Frequent Contributor
Posts: 122

Re: backfill some observations

Thanks for the solution.  I like this solution very much. But how to extend this to the case that I want to backfill 3 years instead if one?

Regards,

Sean
Super User
Posts: 10,466

Re: backfill some observations


SeanZ wrote:
Thanks for the solution.  I like this solution very much. But how to extend this to the case that I want to backfill 3 years instead if one?

Regards,

Sean

Is your requirement specific numbers of years or "all but the first missing"?

Frequent Contributor
Posts: 122

Re: backfill some observations

I will define the number of years to fill for missing values.

Respected Advisor
Posts: 4,641

Re: backfill some observations

[ Edited ]

Although the SQL method can be extended to fill more than one year, it will not scale well to an arbitrary number of years. The same can be said for most methods presented so far, except for @Reeza's reverse sort approach. 

PG
Super User
Posts: 5,072

Re: backfill some observations

More specifically (but untested), it looks like you could take Reeza's approach and make minor changes.  First, indicate how many years you are willing to backfill:

 

%let n=2;

 

Then change one line in the middle.  Currently, the code contains:

 

if backfill > 2 then new_value = .;

 

Change that to read:

 

if backfill <= &N then value = new_value;

 

I'm not sure if Reeza intended that the proper result would be in VALUE or in NEW_VALUE.  In this modification, the proper value is VALUE, and you can drop NEW_VALUE.

Super User
Posts: 5,072

Re: backfill some observations

Assuming you don't want to backfill across firms, here's one way:

 

data want;

set have;

by firm;

set have (keep=value rename=(value=next_value) firstobs=2) have (drop=_all_);

if last.firm=0 and value=. then value = next_value;

drop next_value;

run;

 

It does replace all missing values.  However, if there are two missing values in a row, the replacement value (for the first one) will also be a missing value.

Ask a Question
Discussion stats
  • 13 replies
  • 296 views
  • 1 like
  • 5 in conversation