There is this dataset. Var1 shows the place within a group. Var2 shows the ID to which the member is related to.
site ID var1 var2
1 11 2 .
1 12 1 11
2 21 2 .
2 22 1 21
2 31 2 .
2 33 1 31
We need to create a variable that shows that two IDs belong to a group.
site ID var1 var2 group
1 11 2 . 1
1 12 1 11 1
2 21 2 . 2
2 22 1 21 2
2 31 2 . 3
2 33 1 31 3
I still think there is a piece of logic you haven't explained, but based upon what you have said, this works:
data want;
set have;
group=1+floor((_n_-1)/2);
run;
I contend something hasn't been explained, because the above solution doesn't even look at the value of var1 and var2.
Please explain the logic used here to derive variable GROUP.
group indicates that ID 11 and ID 12 are associated. var2 indicates that ID 12 is related to ID 11 (value 11).
is VAR1 always 2 in the first record and VAR1 always 1 in the second record? Is VAR2 always present when var1=1 and missing when var1^=1? Are there ever more than 2 records in each GROUP?
I still think there is a piece of logic you haven't explained, but based upon what you have said, this works:
data want;
set have;
group=1+floor((_n_-1)/2);
run;
I contend something hasn't been explained, because the above solution doesn't even look at the value of var1 and var2.
Yes, I think I understand that, but what is wrong with my solution that doesn't use VAR1 or VAR2? I get the proper values of GROUP.
What is not sorted properly?
I was just wondering if there should be prior step but it works great. Thanks!
data x;
infile cards expandtabs;
input site ID var1 var2;
from=catx('|',site,id);
to=catx('|',site,var2);
cards;
1 11 2 .
1 12 1 11
2 21 2 .
2 22 1 21
2 31 2 .
2 33 1 31
;
data have;
set x;
if not missing(id) and not missing(var2);
keep from to;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node; output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
proc sql;
create table final_want as
select a.site,a.id,a.var1,a.var2,b.household as group
from x as a left join want as b
on catx('|',a.site,a.id)=b.node
order by site,id;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.