SAS Programming

Keep only ID duplicates

Satori · Posted 01-19-2023 08:40 AM

I have a dataset like the one below. The variable CODE takes only two values, 'C2' and 'U2'.

The ID variable is not unique: an ID can have only code U2, only code C2, or both codes. I want to keep only the last case, that is, keep all observations for IDs that have both codes and drop every ID that has only one code.

Obs  ID            code
1    AE0000037163  U2
2    AE0000037282  U2
3    AE0000037693  U2
4    AE0000037738  U2
5    AE0000037738  C2

PaigeMiller · Posted 01-19-2023 08:41 AM

Please provide data that illustrates the problem. Right now, there is no ID which has both U2 and C2.

--
Paige Miller

Satori · Posted 01-19-2023 08:43 AM

I have a dataset like the one below. The variable CODE takes only two values, 'C2' and 'U2'.

The ID variable is not unique: an ID can have only code U2, only code C2, or both codes. I want to keep only the last case, that is, keep all observations for IDs that have both codes and drop every ID that has only one code.

Obs  ID            code
1    AE0000037163  U2
2    AE0000037282  U2
3    AE0000037693  U2
4    AE0000037738  U2
5    AE0000037738  C2

PaigeMiller · Posted 01-19-2023 08:53 AM

From now on, please provide data as WORKING data step code, as I have shown below in creating the data set named HAVE.

data have;
input Obs ID :$12. code $;
cards;
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
;
run;

/* One row per ID that has at least one U2 row and at least one C2 row */
proc sql;
    create table want as select id
    from have
    group by id
    having sum(code='U2')>0 and sum(code='C2')>0;
quit;
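The query above returns one row per qualifying ID because only ID is selected. If every observation for those IDs is wanted instead, a minimal sketch is to select all columns while keeping the same HAVING condition. Because HAVING uses summary functions, PROC SQL remerges the group statistics with the detail rows (and writes a NOTE about remerging to the log), so each row of a qualifying ID survives. HAVE is rebuilt from the sample data so the block runs on its own.

```sas
/* Sample data from the thread */
data have;
   input Obs ID :$12. code $;
   cards;
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
;
run;

proc sql;
   /* SELECT * plus summary functions in HAVING: SAS remerges the   */
   /* per-ID sums with the detail rows, keeping all rows of an ID   */
   /* that has at least one U2 row and at least one C2 row          */
   create table want as
   select *
   from have
   group by id
   having sum(code='U2') > 0 and sum(code='C2') > 0
   order by id, obs;
quit;
```

On the sample data this keeps observations 4 and 5, the two rows for AE0000037738.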

Satori · Posted 01-19-2023 08:59 AM

After running your suggested code, I ended up with a list of observations that shows only the ID column, and none of the IDs are duplicated.

PaigeMiller · Posted 01-19-2023 09:11 AM

@Satori wrote:
After running your suggested code, I ended up with a list of observations that shows only the ID column, and none of the IDs are duplicated.

I don't understand this. Show us what you see. Show us what you want to see.

data_null__ · Posted 01-19-2023 09:12 AM

I will assume you want to see the NON-unique ID values.

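data have;
   input Obs ID :$12. code $;
   cards;
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
;
run;

proc print data=have;
run;

/* NOUNIKEY removes observations whose BY value occurs only once; */
/* UNIOUT=_NULL_ discards those unique-key observations           */
proc sort data=have nounikey out=dups uniout=_null_;
   by id;
run;

proc print data=dups;
run;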
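NOUNIKEY judges uniqueness on the BY variable alone, so an ID that appeared twice with the same code would also be kept. If that can happen, one alternative is a DATA step "double DOW" loop that reads each ID group twice: the first pass records which codes occur, the second outputs the group's rows only when both U2 and C2 were seen. This is a sketch assuming the data can be sorted by ID; the flag names HAS_U2 and HAS_C2 are hypothetical, and HAVE is rebuilt from the sample data.

```sas
/* Sample data from the thread */
data have;
   input Obs ID :$12. code $;
   cards;
1 AE0000037163 U2
2 AE0000037282 U2
3 AE0000037693 U2
4 AE0000037738 U2
5 AE0000037738 C2
;
run;

/* The double DOW loop requires HAVE to be grouped by ID */
proc sort data=have;
   by id;
run;

data want;
   /* Hypothetical flags HAS_U2/HAS_C2 are not retained, so they  */
   /* reset to missing at the top of each iteration (one ID group) */

   /* First pass over the ID group: note which codes occur */
   do until (last.id);
      set have;
      by id;
      has_u2 = max(has_u2, code = 'U2');
      has_c2 = max(has_c2, code = 'C2');
   end;

   /* Second pass over the same group: output only if both codes occurred */
   do until (last.id);
      set have;
      by id;
      if has_u2 and has_c2 then output;
   end;

   drop has_u2 has_c2;
run;
```

Both SET statements read the same rows in step, so each DATA step iteration handles exactly one ID group. On the sample data this also keeps observations 4 and 5.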
