- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a dataset like this (below). The variable code takes only two values ('C2', 'U2')
The ID variable is not unique. There can be an ID with only code U2, only code C2 and with both codes. I want to keep only the last one of these, meaning I want to keep the observations with the same ID and different codes, and drop all IDs that have only one code.
Obs ID code
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I will assume you want to see the NON-unique ID values.
data have;
input Obs ID:$12. code $;
cards;
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
;
run;
proc print;
proc sort data=have nounikey out=dups uniout=_null_;
by id;
run;
proc print;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please provide data that illustrates the problem. Right now, there is no ID which has both U2 and C2.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The ID variable is not unique. There can be an ID with only code U2, only code C2 and with both codes. I want to keep only the last one of these, meaning I want to keep the observations with the same ID and different codes, and drop all IDs that have only one code.
Obs ID code
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
From now on, please provide data as WORKING data step code, as I have shown below in creating data set named HAVE.
data have;
input Obs ID $12. code $;
cards;
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
;
proc sql;
create table want as select id
from have
group by id
having sum(code='U2')>0 and sum(code='C2')>0;
quit;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Satori wrote:
After running your suggested code, I ended up with a list of observations that only shows the ID column, and none of the IDs are duplicate.
I don't understand this. Show us what you see. Show us what you want to see.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I will assume you want to see the NON-unique ID values.
data have;
input Obs ID:$12. code $;
cards;
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
;
run;
proc print;
proc sort data=have nounikey out=dups uniout=_null_;
by id;
run;
proc print;
run;