Hello!
I'm currently stuck trying to do conditional inheritence:
Have:
DATA HAVE;
LENGTH Col1 $ 1 Col2 3 Col3 $ 3 Col4 $ 3;
INFILE DATALINES DELIMITER=',';
A,1,
A,2,foo
A,3,
B,1,bar
B,2,
B,1,
B,2,foo
C,1,
C,2,bar
C,3,
C,4,
C,3,
C,4,
D,1,
D,2,
D,3,
;
RUN;
Col1 | Col2 | Col3 | Col4 |
A | 1 | ||
A | 2 | foo | |
A | 3 | ||
B | 1 | bar | |
B | 2 | ||
B | 1 | ||
B | 2 | foo | |
C | 1 | ||
C | 2 | bar | |
C | 3 | ||
C | 4 | ||
C | 3 | ||
C | 4 | ||
D | 1 | ||
D | 2 | ||
D | 3 |
Want:
Col1 | Col2 | Col3 | Col4 |
A | 1 | ||
A | 2 | foo | foo |
A | 3 | foo | |
B | 1 | bar | bar |
B | 2 | bar | |
B | 1 | ||
B | 2 | foo | foo |
C | 1 | ||
C | 2 | bar | bar |
C | 3 | bar | |
C | 4 | bar | |
C | 3 | bar | |
C | 4 | bar | |
D | 1 | ||
D | 2 | ||
D | 3 |
In the above example Col1 is used to group the data while Col2 defines hierarchies within these groups. If Col3 contains a value I want Col4 to retain this value. If Col3 doesn't contain a value I want Col4 to inherit the value from the last observation but only within the same group and if the level in the hierarchy is descending in relation to the last observation in Col3 that held a value.
After googling and fiddling around with the LAG function (doesn't seem to be the right tool here?) and the RETAIN statement (failing at groups C and D in the example) for longer than I'd like to admit I remain clueless on how to archive this.
I'd be really grateful for a possible solution, comparable examples or a pointer in the right direction!
Regards
I wish base SAS had an linked list object. I've needed something like this before. And I ache for the days I could find a C library or even make a linked list object to code with. A hash object could be used but it is not neccessary, and it doesn't help code readability, IMO. Even though I've done this before with my own data, doing it with this data I feel like I came up with a completely different solution than I remembered before. This code assumes a well formed input dataset, as I understand the problem.
data have;
length n 8 col1 $ 1 col2 8 col3 $ 3 ;
infile datalines delimiter=',' truncover;
input col1 col2 col3 ;
n=_N_;
datalines;
A,1,
A,2,foo
A,3,
B,1,bar
B,2,
B,1,
B,2,foo
C,1,
C,2,bar
C,3,
C,4,
C,3,
C,2,
C,3,
C,4,
D,1,
D,2,
D,3,
;
run;
data want;
do until (last.col1);
set have;
by col1;
length _col2_x 8 col4 $3;
retain _col2_x col4;
/*_col2_x is the value col2 equaled when col3 was not missing; IOW the
col3 property set to be applied to the children of that parent. */
drop _:;
if not missing(_col2_x) then do;
if _col2_x>=col2
then call missing(_col2_x,col4);
end;
if not missing(col3) then do;
_col2_x=col2;
col4=col3;
end;
output;
end;
call missing(_col2_x, col4);
run;
Concepts for anyone looking up these methods, search for DOW loops, aka. Whitlock loops.
The step creating work.have has some bugs: input and datalines statements are missing.
I prefer using a hash-object for such problems:
/* just to make improve readability of the declare statement */
data work.lookup / view=work.lookup;
set have(drop=col4 rename=(col3=col4));
where not missing(col4);
run;
data work.want;
set work.have;
if _n_ = 1 then do;
declare hash h(dataset: 'work.lookup');
h.defineKey('col1', 'col2');
h.defineData('col4');
h.defineDone();
end;
if not missing(col3) then do;
col4 = col3;
end;
else do;
rc = h.find(key: col1, key: col2-1);
rc = h.ref();
end;
drop rc;
run;
Thank you very much for the suggestion! I definitly have to read up on hash objects.
One problem:
data have;
length col1 $ 1 col2 3 col3 $ 3 col4 $ 3;
infile datalines delimiter=',';
A,1,
A,2,foo
A,3,
B,1,bar
B,2,
B,1,
B,2,foo
C,1,
C,2,bar
C,3,
C,4,
C,3, /* new */
C,2, /* new */
C,3,
C,4,
D,1,
D,2,
D,3,
;
run;
data lookup;
set have(drop=col4 rename=(col3=col4));
where not missing(col4);
run;
data want;
set have;
if _n_ = 1 then do;
declare hash h(dataset: 'lookup');
h.defineKey('col1', 'col2');
h.defineData('col4');
h.defineDone();
end;
if not missing(col3) then do;
col4 = col3;
end;
else do;
rc = h.find(key: col1, key: col2-1);
rc = h.ref();
end;
drop rc;
run;
I've inserted two new observations.
Expected result:
OBS | col1 | col2 | col3 | col4 |
1 | A | 1 |
| |
2 | A | 2 | foo | foo |
3 | A | 3 | foo | |
4 | B | 1 | bar | bar |
5 | B | 2 | bar | |
6 | B | 1 | ||
7 | B | 2 | foo | foo |
8 | C | 1 | ||
9 | C | 2 | bar | bar |
10 | C | 3 | bar | |
11 | C | 4 | bar | |
12 | C | 3 | bar | |
13 | C | 2 | ||
14 | C | 3 | ||
15 | C | 4 | ||
16 | D | 1 | ||
17 | D | 2 | ||
18 | D | 3 |
Actual result:
OBS | col1 | col2 | col3 | col4 |
1 | A | 1 |
| |
2 | A | 2 | foo | foo |
3 | A | 3 | foo | |
4 | B | 1 | bar | bar |
5 | B | 2 | bar | |
6 | B | 1 | ||
7 | B | 2 | foo | foo |
8 | C | 1 | ||
9 | C | 2 | bar | bar |
10 | C | 3 | bar | |
11 | C | 4 | bar | |
12 | C | 3 | bar | |
13 | C | 2 | ||
14 | C | 3 | bar | |
15 | C | 4 | bar | |
16 | D | 1 | ||
17 | D | 2 | ||
18 | D | 3 |
Observations 14 und 15 meet the criteria so they inherit the value. But hierarchically they belong to observation 13 which doesn't have an associated value. Is there a way to consider this?
Row 12? seems to me it is not connected to the rows 9-11 either, how could I explain that?
A1 | B1' | B1 | C1 | ?? | C3 | C2 | D1 | |||||
▼ | ▼ | ▼ | ▼ | ▼ | ▼ | |||||||
A2* | B2' | B2* | C2' | ?? | C3* | D2 | ||||||
▼ | ▼ | ▼ | ▼ | |||||||||
A3* | C3' | ?? | C4* | D3 | ||||||||
nil | ▼ | |||||||||||
foo* | C4' | |||||||||||
bar' |
I'll try to explain it more clearly: col2 represents a hierarchical structure like this (here for group C):
Along the hierarchy the col3 values are introduced (red) and they then count for all subordinate elements (blue).
Edit: Adding to that: As soon as a col3 value is introduced, subordinate elements cannot introduce another col3 value.
Ok, how does the second C3 have "bar"(?), because the first C2 does, ok
@Felmond wrote:
Thank you very much for the suggestion! I definitly have to read up on hash objects.
One problem:
....
Observations 14 und 15 meet the criteria so they inherit the value. But hierarchically they belong to observation 13 which doesn't have an associated value. Is there a way to consider this?
Most likely not by using a hash object the way i used it. I am sure that i didn't understood the role of col3 properly, so i think a lengthy data step will be necessary to create the expected result. Maybe renaming the variables to reflect their role and usage could be useful. And it seems as if i need further text explaining the values of col3.
I wish base SAS had an linked list object. I've needed something like this before. And I ache for the days I could find a C library or even make a linked list object to code with. A hash object could be used but it is not neccessary, and it doesn't help code readability, IMO. Even though I've done this before with my own data, doing it with this data I feel like I came up with a completely different solution than I remembered before. This code assumes a well formed input dataset, as I understand the problem.
data have;
length n 8 col1 $ 1 col2 8 col3 $ 3 ;
infile datalines delimiter=',' truncover;
input col1 col2 col3 ;
n=_N_;
datalines;
A,1,
A,2,foo
A,3,
B,1,bar
B,2,
B,1,
B,2,foo
C,1,
C,2,bar
C,3,
C,4,
C,3,
C,2,
C,3,
C,4,
D,1,
D,2,
D,3,
;
run;
data want;
do until (last.col1);
set have;
by col1;
length _col2_x 8 col4 $3;
retain _col2_x col4;
/*_col2_x is the value col2 equaled when col3 was not missing; IOW the
col3 property set to be applied to the children of that parent. */
drop _:;
if not missing(_col2_x) then do;
if _col2_x>=col2
then call missing(_col2_x,col4);
end;
if not missing(col3) then do;
_col2_x=col2;
col4=col3;
end;
output;
end;
call missing(_col2_x, col4);
run;
Concepts for anyone looking up these methods, search for DOW loops, aka. Whitlock loops.
Wow! Thanks a lot! Your code worked like a charm and did exactly what I was looking for - at least after checking a few hundred observations 😉
I've found a nice article on DoW loops (https://sasnrd.com/sas-dow-loop-example/) and am now trying to understand how it works.
Thanks again!
Regards
I'm not sure what the rules involved in assigning col4.
Too many cases an example does not provide the rules.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.