There is this dataset. Var1 shows the place within a group. Var2 shows the ID to which the member is related to.
site ID var1 var2
1 11 2 .
1 12 1 11
2 21 2 .
2 22 1 21
2 31 2 .
2 33 1 31
We need to create a variable that shows that two IDs belong to a group.
site ID var1 var2 group
1 11 2 . 1
1 12 1 11 1
2 21 2 . 2
2 22 1 21 2
2 31 2 . 3
2 33 1 31 3
I still think there is a piece of logic you haven't explained, but based upon what you have said, this works:
data want;
set have;
group=1+floor((_n_-1)/2);
run;
I contend something hasn't been explained, because the above solution doesn't even look at the value of var1 and var2.
Please explain the logic used here to derive variable GROUP.
group indicates that ID 11 and ID 12 are associated. var2 indicates that ID 12 is related to ID 11 (value 11).
is VAR1 always 2 in the first record and VAR1 always 1 in the second record? Is VAR2 always present when var1=1 and missing when var1^=1? Are there ever more than 2 records in each GROUP?
I still think there is a piece of logic you haven't explained, but based upon what you have said, this works:
data want;
set have;
group=1+floor((_n_-1)/2);
run;
I contend something hasn't been explained, because the above solution doesn't even look at the value of var1 and var2.
Yes, I think I understand that, but what is wrong with my solution that doesn't use VAR1 or VAR2? I get the proper values of GROUP.
What is not sorted properly?
I was just wondering if there should be prior step but it works great. Thanks!
data x;
infile cards expandtabs;
input site ID var1 var2;
from=catx('|',site,id);
to=catx('|',site,var2);
cards;
1 11 2 .
1 12 1 11
2 21 2 .
2 22 1 21
2 31 2 .
2 33 1 31
;
data have;
set x;
if not missing(id) and not missing(var2);
keep from to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node; output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table final_want as
select a.site,a.id,a.var1,a.var2,b.household as group
from x as a left join want as b
on catx('|',a.site,a.id)=b.node
order by site,id;
quit;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.