Good Evening All,
I have a hierarchy which has holes in it as per the HAVE dataset code below. I am looking to first maintain the right most populated value whilst filling all blanks to the left (the code attached with the Array does this nicely). Then I need to find the right most populated record and hold it's value until ORG_LVL_10. I am struggling with the finding a solution that performs the latter task. Any help would be greatly appreciated.
Thank you for all your help.
Regards,
Scott
data have;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;
datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,,,LEVEL10
;
run;
data want;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;
datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL5,LEVEL5,LEVEL5,LEVEL5,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL6,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL6,LEVEL6,LEVEL7,LEVEL10,LEVEL10,LEVEL10
;
run;
DATA test; SET have; ARRAY _ORG $ ORG_LVL_1 - ORG_LVL_10; J=1; DO I=1 TO DIM(_ORG); PUT I = J = _ORG(I) = _ORG(J) =; IF TRIM(LEFT(_ORG(I))) NE "" THEN DO; _ORG(J) = COALESCEC(TRIM(LEFT(_ORG(J))),TRIM(LEFT(_ORG(I)))); J+1; END; END; RUN;
Here is a relatively simple program which does the trick:
data want;
set have;
array org_lvl {*} org_lvl_:;
/* if last element is missing, set it to last non-missing */
if missing(org_lvl(dim(org_lvl))) then do;
do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));
end;
org_lvl(dim(org_lvl))=org_lvl(_N_);
end;
/* fill left when missing */
do _N_=dim(org_lvl)-1 to 1 by -1;
if missing(org_lvl(_N_)) then
org_lvl(_N_)=org_lvl(_N_+1);
end;
run;
Not only is it quite easy to understand, it also performs well (with 400,000 obs, my version took about 0.26 CPU seconds, while your original version took 0.78 seconds - the solution suggested by @Patrick was in between).
Below should do the job:
data have;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;
datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,,,LEVEL10
;
run;
data want;
set have;
array org_lvl {*} org_lvl_:;
do _i=1 to dim(org_lvl);
if missing(org_lvl[_i]) then
do _j=_i+1 to dim(org_lvl) until(not missing(org_lvl[_i]));
org_lvl[_i]=org_lvl[_j];
end;
if missing(org_lvl[_i]) then org_lvl[_i]=org_lvl[max(_i-1,1)];
end;
run;
Here is a relatively simple program which does the trick:
data want;
set have;
array org_lvl {*} org_lvl_:;
/* if last element is missing, set it to last non-missing */
if missing(org_lvl(dim(org_lvl))) then do;
do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));
end;
org_lvl(dim(org_lvl))=org_lvl(_N_);
end;
/* fill left when missing */
do _N_=dim(org_lvl)-1 to 1 by -1;
if missing(org_lvl(_N_)) then
org_lvl(_N_)=org_lvl(_N_+1);
end;
run;
Not only is it quite easy to understand, it also performs well (with 400,000 obs, my version took about 0.26 CPU seconds, while your original version took 0.78 seconds - the solution suggested by @Patrick was in between).
Thank you both for your help.
This worked perfectly.
I have been thinking a bit - if all values are missing, you will get an index-out-of-range error, as _N_=0 after the first loop.
So you may want to put in a safeguard:
data want;
set have;
array org_lvl {*} org_lvl_:;
/* if last element is missing, set it to last non-missing */
if missing(org_lvl(dim(org_lvl))) then do;
do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));
end;
if _N_=0 then /* in case all values are missing */
return;
org_lvl(dim(org_lvl))=org_lvl(_N_);
end;
/* fill left when missing */
do _N_=dim(org_lvl)-1 to 1 by -1;
if missing(org_lvl(_N_)) then
org_lvl(_N_)=org_lvl(_N_+1);
end;
run;
I did it with a RETURN statement, as the second loop will not do anything useful in that case.
Thanks again @s_lassen.
I appreciate the rigor and additional consideration you have given my problem.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.