Hi all,
I have a dataset with following variables:
Child | Mother | Country_Mother | Country_Child |
1 | . | . | DE |
2 | 1 | DE | GB |
3 | 2 | GB | US |
4 | 3 | US | ES |
5 | 1 | DE | DK |
6 | 5 | DK | FR |
7 | . | . | IT |
8 | . | . | NL |
9 | 8 | NL | LU |
Each child has either no mother, one mother or several mothers in a chain upwards. If the child does not have a mother it is the "highest" mother in a chain. Then I need to assign in a new variable "New_Country" the country of this "highest" mother to all her children in a chain downwards without knowing the depth of the chain. For the highest mother itself (country_mother and mother is empty) new_Country should be filled in with country_child. The result table should look like this:
Child | Mother | Country_Mother | Country_Child | New_Country |
1 | . | . | DE | DE |
2 | 1 | DE | GB | DE |
3 | 2 | GB | US | DE |
4 | 3 | US | ES | DE |
5 | 1 | DE | DK | DE |
6 | 5 | DK | FR | DE |
7 | . | . | IT | IT |
8 | . | . | NL | NL |
9 | 8 | NL | LU | NL |
This table is only an excerpt to illustrate the data structure. The data set is actually longer and has many such chains. I've already tried using loops and index but it did not work. I read a lot about hashes and recursive queries but I'm not sure how to use them in this case, since I'm rather new in sas macros.
I'm very grateful for every help!
Thanks,
Jaus
Need some more explanation to understand the following statement:
"Each child has either no mother, one mother or several mothers in a chain upwards."
In this case mother 1 has 2 children. I am unable to understand "mothers in a chain upwards" and see how you are getting "DE" for child 2 to 6.
Here is a graph to the mentioned table in order to explain it a bit better:
For 1,7 and 8 new_country should be the the value of country_child, because there is no mother above. For 5,6,2,3 and 4 new_country should be DE because 1 is the "highest" mother of them all in their chain upwards. 7 doesn't have a child and 9 Need as new_country NL, because 8 is its "highest" mother.
I hope this clarifies.
Thanks!
It's actually very simple. Missing "mother" indicates a group change, keep the value from there:
data have;
input Child Mother Country_Mother $ Country_Child $;
datalines;
1 . . DE
2 1 DE GB
3 2 GB US
4 3 US ES
5 1 DE DK
6 5 DK FR
7 . . IT
8 . . NL
9 8 NL LU
;
data want;
set have;
by child; /* just to make sure it's correctly sorted */
retain new_country;
if mother = . then new_country = country_child;
run;
proc print data=want noobs;
run;
Note how I presented example data in usable form in a data step with datalines. Please do so yourself in the future.
Result:
Country_ Country_ new_ Child Mother Mother Child country 1 . DE DE 2 1 DE GB DE 3 2 GB US DE 4 3 US ES DE 5 1 DE DK DE 6 5 DK FR DE 7 . IT IT 8 . NL NL 9 8 NL LU NL
data x;
input Child
Mother
Country_Mother $
Country_Child $;
cards;
1
.
.
DE
2
1
DE
GB
3
2
GB
US
4
3
US
ES
5
1
DE
DK
6
5
DK
FR
7
.
.
IT
8
.
.
NL
9
8
NL
LU
;
data have;
set x(keep=child mother);
rename mother=from child=to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sort data=want;
by node;
run;
data want1(index=(household));
merge want x(keep=child country_child mother rename=(child=node)
where=(mother is missing));
by node;
if not missing(node);
drop mother;
run;
data want2;
merge want1 want1(keep=household country_child
rename=(country_child=new_country)
where=(new_country is not missing));
by household;
drop country_child;
run;
data want3;
if _n_=1 then do;
if 0 then set want2;
declare hash h(dataset:'want2');
h.definekey('node');
h.definedata('new_country');
h.definedone();
end;
set x;
call missing(new_country);
rc=h.find(key:child);
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.