BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
finbar_gillen
Calcite | Level 5

Hi,

 

I have data in the following format

 

policy_id,lst_renew_date,small_area

101,20180507,00011

102,20160508,00022

102,20170508,NULL

102,20180509,00022

103,20160510,00033

103,20170511,00033

103,20180511,NULL

 

I want to clean the data so that every policy_id has a small_area code where possible.

 

How could I go about creating replacing the NULL values for small_area with the most recent small_area code for that that policy_id?

 

Regards

Finbar

1 ACCEPTED SOLUTION

Accepted Solutions
Amir
PROC Star

Hi,

 

To assist, I have created a data step that anyone can run to create the test data, this includes some extra rows (policies 105-107) some with contiguous NULL values and one policy with no NULL values at all. The second data step process the data in the manner described so far, I believe.

 

data have;
   infile datalines dsd;

   input policy_id      : $3.
         lst_renew_date : yymmdd10.
         small_area     : $5.
   ;

   datalines;
101,20180507,00011
102,20160508,00022
102,20170508,NULL
102,20180509,00022
103,20160510,00033
103,20170511,00033
103,20180511,NULL
104,20170511,00044
104,20170512,NULL
104,20170513,NULL
104,20170514,NULL
104,20170515,00044
105,20170511,NULL
105,20170512,NULL
105,20170513,00055
105,20170514,NULL
106,20170511,NULL
106,20170512,NULL
107,20170511,00077
107,20170512,00077
;


data want;
   set have;
   by policy_id;

   retain saved_small_area '     ';

   if first.policy_id then
      /* at start of a policy group reset saved value */
      saved_small_area = small_area;
   else
      /* otherwise save a non-NULL value if possible */
      if     saved_small_area eq 'NULL'
         and small_area       ne 'NULL'
      then
         saved_small_area = small_area;

   /* replace a NULL small_area if possible */
   if     small_area       eq 'NULL'
      and saved_small_area ne 'NULL'
   then
      small_area = saved_small_area;
run;

 

 

If there are more scenarios of data that need to be catered for then try to represent them all in a data step we can all run (like the first step above), otherwise we might just be making incorrect assumptions.

 

HTH.

 

 

Regards,

Amir.

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

You could use the LAG function to determine the previous value and insert it in place on NULL. Assumes the data is properly sorted by POLICY_ID. 

 

UNTESTED CODE

 

data want;
    set have;
    by policy_id;
    prev_small_area=lag(small_area);
    if not first.policy_id and small_area='NULL' then small_area=prev_small_area;
run;
--
Paige Miller
finbar_gillen
Calcite | Level 5
Would I need to sort this by policy_id and lst_renew_date? As I want to select the most recent small_area.
PaigeMiller
Diamond | Level 26

@finbar_gillen wrote:
Would I need to sort this by policy_id and lst_renew_date? As I want to select the most recent small_area.

Sounds correct to me.

--
Paige Miller
Amir
PROC Star

Hi,

 

Although your sample data does not show it, consider if you need to handle 2 or more contiguous rows that have NULL in small_area.

 

Regards,

Amir.

 

finbar_gillen
Calcite | Level 5

Hi Amir,

 

Yeah, I do have cases lke that. What I would need is for the code to look for the first instance where there is a non Null value. 

 

Could an if statement be added to the above code to ignore where the value is a null?

 

Regards

Finbar

Amir
PROC Star

Hi,

 

To assist, I have created a data step that anyone can run to create the test data, this includes some extra rows (policies 105-107) some with contiguous NULL values and one policy with no NULL values at all. The second data step process the data in the manner described so far, I believe.

 

data have;
   infile datalines dsd;

   input policy_id      : $3.
         lst_renew_date : yymmdd10.
         small_area     : $5.
   ;

   datalines;
101,20180507,00011
102,20160508,00022
102,20170508,NULL
102,20180509,00022
103,20160510,00033
103,20170511,00033
103,20180511,NULL
104,20170511,00044
104,20170512,NULL
104,20170513,NULL
104,20170514,NULL
104,20170515,00044
105,20170511,NULL
105,20170512,NULL
105,20170513,00055
105,20170514,NULL
106,20170511,NULL
106,20170512,NULL
107,20170511,00077
107,20170512,00077
;


data want;
   set have;
   by policy_id;

   retain saved_small_area '     ';

   if first.policy_id then
      /* at start of a policy group reset saved value */
      saved_small_area = small_area;
   else
      /* otherwise save a non-NULL value if possible */
      if     saved_small_area eq 'NULL'
         and small_area       ne 'NULL'
      then
         saved_small_area = small_area;

   /* replace a NULL small_area if possible */
   if     small_area       eq 'NULL'
      and saved_small_area ne 'NULL'
   then
      small_area = saved_small_area;
run;

 

 

If there are more scenarios of data that need to be catered for then try to represent them all in a data step we can all run (like the first step above), otherwise we might just be making incorrect assumptions.

 

HTH.

 

 

Regards,

Amir.

finbar_gillen
Calcite | Level 5

Hi Amir,

 

Thanks for that, I also found this link which seems to give me a good output:

 

https://stackoverflow.com/questions/42961591/filling-in-missing-values-with-forward-backward-method-...

 

Thanks,

Finbar

Amir
PROC Star

Hi,

 

Well done on continuing your search, the best solution is not necessarily the first or second one.

 

Yes the solution looks workable (I haven't tested it myself) and if you don't know it, it introduces the DOW loop technique which can be very useful. I found the following useful in this regard:

 

http://support.sas.com/resources/papers/proceedings09/038-2009.pdf

 

I would advise you compare the performance of any solutions you come across, as, for example, in the solution you provided a link to, the data is being read twice (there are two set statements).

 

 

Regards,

Amir.

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1316 views
  • 0 likes
  • 3 in conversation