Good Evening All,
I have a hierarchy which has holes in it as per the HAVE dataset code below. I am looking to first maintain the right most populated value whilst filling all blanks to the left (the code attached with the Array does this nicely). Then I need to find the right most populated record and hold it's value until ORG_LVL_10. I am struggling with the finding a solution that performs the latter task. Any help would be greatly appreciated.
Thank you for all your help.
Regards,
Scott
data have;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;
datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,,,LEVEL10
;
run;
data want;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;
datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL5,LEVEL5,LEVEL5,LEVEL5,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL6,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL6,LEVEL6,LEVEL7,LEVEL10,LEVEL10,LEVEL10
;
run;
DATA test; SET have; ARRAY _ORG $ ORG_LVL_1 - ORG_LVL_10; J=1; DO I=1 TO DIM(_ORG); PUT I = J = _ORG(I) = _ORG(J) =; IF TRIM(LEFT(_ORG(I))) NE "" THEN DO; _ORG(J) = COALESCEC(TRIM(LEFT(_ORG(J))),TRIM(LEFT(_ORG(I)))); J+1; END; END; RUN;
Here is a relatively simple program which does the trick:
data want;
set have;
array org_lvl {*} org_lvl_:;
/* if last element is missing, set it to last non-missing */
if missing(org_lvl(dim(org_lvl))) then do;
do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));
end;
org_lvl(dim(org_lvl))=org_lvl(_N_);
end;
/* fill left when missing */
do _N_=dim(org_lvl)-1 to 1 by -1;
if missing(org_lvl(_N_)) then
org_lvl(_N_)=org_lvl(_N_+1);
end;
run;
Not only is it quite easy to understand, it also performs well (with 400,000 obs, my version took about 0.26 CPU seconds, while your original version took 0.78 seconds - the solution suggested by @Patrick was in between).
Below should do the job:
data have;
length org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10 $20.;
infile datalines dlm= ',' dsd missover;
input org_lvl_1 org_lvl_2 org_lvl_3 org_lvl_4 org_lvl_5 org_lvl_6 org_lvl_7 org_lvl_8 org_lvl_9 org_lvl_10;
datalines;
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,LEVEL5,
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,LEVEL8,LEVEL9,LEVEL10
LEVEL1,LEVEL2,LEVEL3,LEVEL4,,LEVEL6,LEVEL7,,,LEVEL10
;
run;
data want;
set have;
array org_lvl {*} org_lvl_:;
do _i=1 to dim(org_lvl);
if missing(org_lvl[_i]) then
do _j=_i+1 to dim(org_lvl) until(not missing(org_lvl[_i]));
org_lvl[_i]=org_lvl[_j];
end;
if missing(org_lvl[_i]) then org_lvl[_i]=org_lvl[max(_i-1,1)];
end;
run;
Here is a relatively simple program which does the trick:
data want;
set have;
array org_lvl {*} org_lvl_:;
/* if last element is missing, set it to last non-missing */
if missing(org_lvl(dim(org_lvl))) then do;
do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));
end;
org_lvl(dim(org_lvl))=org_lvl(_N_);
end;
/* fill left when missing */
do _N_=dim(org_lvl)-1 to 1 by -1;
if missing(org_lvl(_N_)) then
org_lvl(_N_)=org_lvl(_N_+1);
end;
run;
Not only is it quite easy to understand, it also performs well (with 400,000 obs, my version took about 0.26 CPU seconds, while your original version took 0.78 seconds - the solution suggested by @Patrick was in between).
Thank you both for your help.
This worked perfectly.
I have been thinking a bit - if all values are missing, you will get an index-out-of-range error, as _N_=0 after the first loop.
So you may want to put in a safeguard:
data want;
set have;
array org_lvl {*} org_lvl_:;
/* if last element is missing, set it to last non-missing */
if missing(org_lvl(dim(org_lvl))) then do;
do _N_=dim(org_lvl)-1 to 1 by -1 while(missing(org_lvl(_N_)));
end;
if _N_=0 then /* in case all values are missing */
return;
org_lvl(dim(org_lvl))=org_lvl(_N_);
end;
/* fill left when missing */
do _N_=dim(org_lvl)-1 to 1 by -1;
if missing(org_lvl(_N_)) then
org_lvl(_N_)=org_lvl(_N_+1);
end;
run;
I did it with a RETURN statement, as the second loop will not do anything useful in that case.
Thanks again @s_lassen.
I appreciate the rigor and additional consideration you have given my problem.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.