Hello SAS users,
I have a data set where I want to group together.
There are two input columns: Start and End (times).
Row 1 and 2 are connected with each other because Start (row 2) = End (row 1). The desired output must have a column 'Group' with value 2 where '2' refers to _N_ = 2.
The Start and End in row 3 cannot be grouped with the values in observation 4 and therefore have Group = 3.
Row 4 - 7 can be grouped together, therefore Group = 7.
I suppose that Retain may help, but it does not provide me the solution, so far.
Could you demonstrate the solution?
Best regards,
Cornelis
If your actual data is as simple as you show us then something like below should do. If it is more like what @Ksharp anticipates then things will require a bit more coding.
data have;
input from to;
datalines;
1 2
2 3
5 7
8 11
11 12
12 14
14 15
16 19
21 22
;
data inter;
set have;
group_id + from ne lag(to);
run;
proc sql;
select
from
,to
,group_id
,count(*) as n_rows
from inter
group by group_id
order by from
;
quit;
data have;
input from to;
obs+1;
datalines;
69 77
69 77
31 12
16 29
52 21
52 97
52 97
99 37
99 37
68 24
;
run;
data key;
set have;
key=from;output;
key=to;output;
keep obs key;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node; output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household );
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 16);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
data key2;
if _n_=1 then do;
if 0 then set want;
declare hash k(dataset:'want');
k.definekey('node');
k.definedata('household');
k.definedone();
end;
set key;
call missing(household);
rc=k.find(key:key);
drop rc node;
run;
proc sql;
create table key3 as
select *,max(obs) as group from key2
group by household;
quit;
data final_want;
if _n_=1 then do;
if 0 then set key3;
declare hash k(dataset:'key3');
k.definekey('key');
k.definedata('group');
k.definedone();
end;
set have;
call missing(group);
rc=k.find(key:from);
keep obs from to group;
run;
If your actual data is as simple as you show us then something like below should do. If it is more like what @Ksharp anticipates then things will require a bit more coding.
data have;
input from to;
datalines;
1 2
2 3
5 7
8 11
11 12
12 14
14 15
16 19
21 22
;
data inter;
set have;
group_id + from ne lag(to);
run;
proc sql;
select
from
,to
,group_id
,count(*) as n_rows
from inter
group by group_id
order by from
;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.