I have a dataset like this (below). The variable code takes 4 possible values
The ID variable is not unique. There can be an ID with 2, 3 or 4 codes, although it's rare with 3 and 4. I want to keep the non-unique IDs that have different codes (or conversely, remove the non-unique IDs with repeated codes).
What I have:
data have;
input Obs ID $12. code $;
cards;
1 AC0000037163 C1
2 AC0000037163 U1
3 BE0000037282 U1
4 BE0000037282 U2
5 CZE0000037693 C2
6 CZE0000037693 C2
7 FR0000037738 U2
8 FR0000037738 C2
;
What I want:
data want;
input Obs ID $12. code $;
cards;
1 AC0000037163 C1
2 AC0000037163 U1
3 BE0000037282 U1
4 BE0000037282 U2
7 FR0000037738 U2
8 FR0000037738 C2
;
data have;
input Obs ID $12. code $;
cards;
1 AC0000037163 C1
2 AC0000037163 U1
3 BE0000037282 U1
4 BE0000037282 U2
5 CZE0000037693 C2
6 CZE0000037693 C2
7 FR0000037738 U2
8 FR0000037738 C2
;
proc sql;
create table want as
select *
from have
group by ID
having count(distinct code) > 1
;
quit;
data have;
input Obs ID $12. code $;
cards;
1 AC0000037163 C1
2 AC0000037163 U1
3 BE0000037282 U1
4 BE0000037282 U2
5 CZE0000037693 C2
6 CZE0000037693 C2
7 FR0000037738 U2
8 FR0000037738 C2
;
proc sql;
create table want as
select *
from have
group by ID
having count(distinct code) > 1
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.