DATA Step, Macro, Functions and more

Replace NULL values with most recent value

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Replace NULL values with most recent value

Hi,

 

I have data in the following format

 

policy_id,lst_renew_date,small_area

101,20180507,00011

102,20160508,00022

102,20170508,NULL

102,20180509,00022

103,20160510,00033

103,20170511,00033

103,20180511,NULL

 

I want to clean the data so that every policy_id has a small_area code where possible.

 

How could I go about creating replacing the NULL values for small_area with the most recent small_area code for that that policy_id?

 

Regards

Finbar


Accepted Solutions
Solution
2 weeks ago
Super Contributor
Posts: 348

Re: Replace NULL values with most recent value

Posted in reply to finbar_gillen

Hi,

 

To assist, I have created a data step that anyone can run to create the test data, this includes some extra rows (policies 105-107) some with contiguous NULL values and one policy with no NULL values at all. The second data step process the data in the manner described so far, I believe.

 

data have;
   infile datalines dsd;

   input policy_id      : $3.
         lst_renew_date : yymmdd10.
         small_area     : $5.
   ;

   datalines;
101,20180507,00011
102,20160508,00022
102,20170508,NULL
102,20180509,00022
103,20160510,00033
103,20170511,00033
103,20180511,NULL
104,20170511,00044
104,20170512,NULL
104,20170513,NULL
104,20170514,NULL
104,20170515,00044
105,20170511,NULL
105,20170512,NULL
105,20170513,00055
105,20170514,NULL
106,20170511,NULL
106,20170512,NULL
107,20170511,00077
107,20170512,00077
;


data want;
   set have;
   by policy_id;

   retain saved_small_area '     ';

   if first.policy_id then
      /* at start of a policy group reset saved value */
      saved_small_area = small_area;
   else
      /* otherwise save a non-NULL value if possible */
      if     saved_small_area eq 'NULL'
         and small_area       ne 'NULL'
      then
         saved_small_area = small_area;

   /* replace a NULL small_area if possible */
   if     small_area       eq 'NULL'
      and saved_small_area ne 'NULL'
   then
      small_area = saved_small_area;
run;

 

 

If there are more scenarios of data that need to be catered for then try to represent them all in a data step we can all run (like the first step above), otherwise we might just be making incorrect assumptions.

 

HTH.

 

 

Regards,

Amir.

View solution in original post


All Replies
Respected Advisor
Posts: 3,294

Re: Replace NULL values with most recent value

Posted in reply to finbar_gillen

You could use the LAG function to determine the previous value and insert it in place on NULL. Assumes the data is properly sorted by POLICY_ID. 

 

UNTESTED CODE

 

data want;
    set have;
    by policy_id;
    prev_small_area=lag(small_area);
    if not first.policy_id and small_area='NULL' then small_area=prev_small_area;
run;
--
Paige Miller
New Contributor
Posts: 4

Re: Replace NULL values with most recent value

Posted in reply to PaigeMiller
Would I need to sort this by policy_id and lst_renew_date? As I want to select the most recent small_area.
Respected Advisor
Posts: 3,294

Re: Replace NULL values with most recent value

Posted in reply to finbar_gillen

@finbar_gillen wrote:
Would I need to sort this by policy_id and lst_renew_date? As I want to select the most recent small_area.

Sounds correct to me.

--
Paige Miller
Super Contributor
Posts: 348

Re: Replace NULL values with most recent value

Posted in reply to finbar_gillen

Hi,

 

Although your sample data does not show it, consider if you need to handle 2 or more contiguous rows that have NULL in small_area.

 

Regards,

Amir.

 

New Contributor
Posts: 4

Re: Replace NULL values with most recent value

[ Edited ]

Hi Amir,

 

Yeah, I do have cases lke that. What I would need is for the code to look for the first instance where there is a non Null value. 

 

Could an if statement be added to the above code to ignore where the value is a null?

 

Regards

Finbar

Solution
2 weeks ago
Super Contributor
Posts: 348

Re: Replace NULL values with most recent value

Posted in reply to finbar_gillen

Hi,

 

To assist, I have created a data step that anyone can run to create the test data, this includes some extra rows (policies 105-107) some with contiguous NULL values and one policy with no NULL values at all. The second data step process the data in the manner described so far, I believe.

 

data have;
   infile datalines dsd;

   input policy_id      : $3.
         lst_renew_date : yymmdd10.
         small_area     : $5.
   ;

   datalines;
101,20180507,00011
102,20160508,00022
102,20170508,NULL
102,20180509,00022
103,20160510,00033
103,20170511,00033
103,20180511,NULL
104,20170511,00044
104,20170512,NULL
104,20170513,NULL
104,20170514,NULL
104,20170515,00044
105,20170511,NULL
105,20170512,NULL
105,20170513,00055
105,20170514,NULL
106,20170511,NULL
106,20170512,NULL
107,20170511,00077
107,20170512,00077
;


data want;
   set have;
   by policy_id;

   retain saved_small_area '     ';

   if first.policy_id then
      /* at start of a policy group reset saved value */
      saved_small_area = small_area;
   else
      /* otherwise save a non-NULL value if possible */
      if     saved_small_area eq 'NULL'
         and small_area       ne 'NULL'
      then
         saved_small_area = small_area;

   /* replace a NULL small_area if possible */
   if     small_area       eq 'NULL'
      and saved_small_area ne 'NULL'
   then
      small_area = saved_small_area;
run;

 

 

If there are more scenarios of data that need to be catered for then try to represent them all in a data step we can all run (like the first step above), otherwise we might just be making incorrect assumptions.

 

HTH.

 

 

Regards,

Amir.

New Contributor
Posts: 4

Re: Replace NULL values with most recent value

Hi Amir,

 

Thanks for that, I also found this link which seems to give me a good output:

 

https://stackoverflow.com/questions/42961591/filling-in-missing-values-with-forward-backward-method-...

 

Thanks,

Finbar

Super Contributor
Posts: 348

Re: Replace NULL values with most recent value

Posted in reply to finbar_gillen

Hi,

 

Well done on continuing your search, the best solution is not necessarily the first or second one.

 

Yes the solution looks workable (I haven't tested it myself) and if you don't know it, it introduces the DOW loop technique which can be very useful. I found the following useful in this regard:

 

http://support.sas.com/resources/papers/proceedings09/038-2009.pdf

 

I would advise you compare the performance of any solutions you come across, as, for example, in the solution you provided a link to, the data is being read twice (there are two set statements).

 

 

Regards,

Amir.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 106 views
  • 0 likes
  • 3 in conversation