DATA Step, Macro, Functions and more

Populating All Missing Variables With the Right Most Non-Missing Value

Accepted Solution Solved
Reply
Super Contributor
Posts: 308
Accepted Solution

Populating All Missing Variables With the Right Most Non-Missing Value

Good Evening All,

 

I have a hierarchy which has holes in it as per the HAVE dataset code below.  I am looking to first maintain the right most populated value whilst filling all blanks to the left (the code attached with the Array does this nicely).  Then I need to find the right most populated record and hold it's value until ORG_LVL_10.  I am struggling with the finding a solution that performs the latter task.  Any help would be greatly appreciated.

 

Thank you for all your help.

 

Regards,

Scott 

 

 

data have;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;

datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,,,LEVEL10
;
run;


data want;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;

datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL5,LEVEL5,LEVEL5,LEVEL5,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL6,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL6,LEVEL6,LEVEL7,LEVEL10,LEVEL10,LEVEL10
;
run;


DATA test; SET have; ARRAY _ORG $ ORG_LVL_1 - ORG_LVL_10; J=1; DO I=1 TO DIM(_ORG); PUT I = J = _ORG(I) = _ORG(J) =; IF TRIM(LEFT(_ORG(I))) NE "" THEN DO; _ORG(J) = COALESCEC(TRIM(LEFT(_ORG(J))),TRIM(LEFT(_ORG(I)))); J+1; END; END; RUN;

 


Accepted Solutions
Solution
3 weeks ago
PROC Star
Posts: 249

Re: Populating All Missing Variables With the Right Most Non-Missing Value

Posted in reply to Scott_Mitchell

Here is a relatively simple program which does the trick:

data want;

  set have;

  array org_lvl {*} org_lvl_:;

  /* if last element is missing, set it to last non-missing */

  if missing(org_lvl(dim(org_lvl))) then do;

    do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));

      end;

    org_lvl(dim(org_lvl))=org_lvl(_N_);

    end;

  /* fill left when missing */

  do _N_=dim(org_lvl)-1 to 1 by -1;

    if missing(org_lvl(_N_)) then

      org_lvl(_N_)=org_lvl(_N_+1);

    end;

run;

 

Not only is it quite easy to understand, it also performs well (with 400,000 obs, my version took about 0.26 CPU seconds, while your original version took 0.78 seconds - the solution suggested by @Patrick was in between).

View solution in original post


All Replies
Respected Advisor
Posts: 4,665

Re: Populating All Missing Variables With the Right Most Non-Missing Value

Posted in reply to Scott_Mitchell

@Scott_Mitchell

Below should do the job:

data have;
 length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
 infile datalines dlm= ',' dsd missover;
 input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;

 datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,,,LEVEL10
;
run;

data want;
  set have;
  array org_lvl {*} org_lvl_:;
  do _i=1 to dim(org_lvl);
    if missing(org_lvl[_i]) then
      do _j=_i+1 to dim(org_lvl) until(not missing(org_lvl[_i]));
        org_lvl[_i]=org_lvl[_j];
      end;
      if missing(org_lvl[_i]) then org_lvl[_i]=org_lvl[max(_i-1,1)];
  end;
run;
Solution
3 weeks ago
PROC Star
Posts: 249

Re: Populating All Missing Variables With the Right Most Non-Missing Value

Posted in reply to Scott_Mitchell

Here is a relatively simple program which does the trick:

data want;

  set have;

  array org_lvl {*} org_lvl_:;

  /* if last element is missing, set it to last non-missing */

  if missing(org_lvl(dim(org_lvl))) then do;

    do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));

      end;

    org_lvl(dim(org_lvl))=org_lvl(_N_);

    end;

  /* fill left when missing */

  do _N_=dim(org_lvl)-1 to 1 by -1;

    if missing(org_lvl(_N_)) then

      org_lvl(_N_)=org_lvl(_N_+1);

    end;

run;

 

Not only is it quite easy to understand, it also performs well (with 400,000 obs, my version took about 0.26 CPU seconds, while your original version took 0.78 seconds - the solution suggested by @Patrick was in between).

Super Contributor
Posts: 308

Re: Populating All Missing Variables With the Right Most Non-Missing Value

Thank you both for your help.

 

This worked perfectly.

PROC Star
Posts: 249

Re: Populating All Missing Variables With the Right Most Non-Missing Value

Posted in reply to Scott_Mitchell

I have been thinking a bit - if all values are missing, you will get an index-out-of-range error, as _N_=0 after the first loop.

So you may want to put in a safeguard:

data want;

  set have;

  array org_lvl {*} org_lvl_:;

  /* if last element is missing, set it to last non-missing */

  if missing(org_lvl(dim(org_lvl))) then do;

    do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));

      end;

    if _N_=0 then /* in case all values are missing */

      return;

    org_lvl(dim(org_lvl))=org_lvl(_N_);

    end;

  /* fill left when missing */

  do _N_=dim(org_lvl)-1 to 1 by -1;

    if missing(org_lvl(_N_)) then

      org_lvl(_N_)=org_lvl(_N_+1);

    end;

run;

I did it with a RETURN statement, as the second loop will not do anything useful in that case.

Super Contributor
Posts: 308

Re: Populating All Missing Variables With the Right Most Non-Missing Value

Thanks again @s_lassen.

 

I appreciate the rigor and additional consideration you have given my problem.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 128 views
  • 1 like
  • 3 in conversation