Hi, I have a question about how to put duplicates into different groups based on the original order? I have a data like below:
ID condition
1 A
1 A
2 B
2 B
3 A
3 A
3 B
3 B
4 B
4 B
4 A
4 A
5 A
5 A
5 B
5 B
5 A
5 A
The data is like above, basically, there are five situations: 1. all As; 2, all Bs; 3, A first and then B; 4, B first and then A; 5, A first and then B, and then A.
I want to keep the original order, which means I can't use proc sort, because after I use proc sort, AABBAA or BBAA will become AAAABB and AABB.
What I want to do is to put ID with all As in group 1, put ID with all Bs in group 2, and put ID with A first and then B in to group 3. So, for the rest of patterns like B first and then A, or A first and then B, and then A can be removed. I am wondering if there is a way to do it? Hopefully that I express myself clearly. Thank you!
If you just want those specific groups AB, BA, A , B and perhaps none then you could do something like this:
data want;
do row=1 by 1 until (last.id);
set have;
by id;
if condition='A' then a=min(a,row);
else if condition='B' then b=min(b,row);
end;
length group $2 ;
if . < a < b then group='AB';
else if . < b < a then group='BA';
else if . < b then group='B';
else if . < a then group='A';
keep id group;
run;
Result
Obs ID group 1 1 A 2 2 B 3 3 AB 4 4 BA 5 5 AB
If you also want to look for longer patterns, like ABA you might use something like:
data want;
set have;
by id condition notsorted;
length group $8 ;
retain group;
if first.id then group=' ';
if first.condition then group=cats(group,condition);
if last.id;
keep id group;
run;
Result
Obs ID group 1 1 A 2 2 B 3 3 AB 4 4 BA 5 5 ABA
If you just want those specific groups AB, BA, A , B and perhaps none then you could do something like this:
data want;
do row=1 by 1 until (last.id);
set have;
by id;
if condition='A' then a=min(a,row);
else if condition='B' then b=min(b,row);
end;
length group $2 ;
if . < a < b then group='AB';
else if . < b < a then group='BA';
else if . < b then group='B';
else if . < a then group='A';
keep id group;
run;
Result
Obs ID group 1 1 A 2 2 B 3 3 AB 4 4 BA 5 5 AB
If you also want to look for longer patterns, like ABA you might use something like:
data want;
set have;
by id condition notsorted;
length group $8 ;
retain group;
if first.id then group=' ';
if first.condition then group=cats(group,condition);
if last.id;
keep id group;
run;
Result
Obs ID group 1 1 A 2 2 B 3 3 AB 4 4 BA 5 5 ABA
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.