Solved: Re: Identify unique values across multiple columns

lizzy28 · Posted 01-23-2021 10:59 AM

Hi All,

I need to pick out unique phone numbers among three columns per ID as below:

USERID	Phone1	Phone2	Phone3
1	1234567890	1234567890	1234567890
2	2345678901	2345678901
3	4567890123	4567890124	4567890125
4	9876543210	9876543211
5	8765432109	8765432109	8765432110
6	6543210987		6543210988

Some records actually have same numbers, like USERID 1 and 2 while others have different. I just wanted to keep unique numbers. If there is only one unique number, then phone1_valid takes that value; If there are two unique number, then phone1_valid and phone2_valid take the two values; etc..

Do you have a way to do that?

Thanks!

Lz

novinosrin · Posted 01-23-2021 12:23 PM

Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-



data have;
input USERID $	(Phone1	Phone2	Phone3) (:$12.);
cards;
1	1234567890	1234567890	1234567890
2	2345678901	2345678901	 .
3	4567890123	4567890124	4567890125
4	9876543210	9876543211	 .
5	8765432109	8765432109	8765432110
6	6543210987	 .	6543210988
;
proc transpose data=have out=temp;
 by userid;
 var phone1-phone3;
run;

proc sort data=temp;
 where col1>' ';
 by userid col1;
run;

data want;
 set temp;
 by userid col1;
 length check $12;
 if not(first.col1 and last.col1) then check='Dups';
 else check='unique';
 rename col1=phone_no
 _name_=Phone;
run;

USERID	Phone	phone_no	check
1	Phone1	1234567890	Dups
1	Phone2	1234567890	Dups
1	Phone3	1234567890	Dups
2	Phone1	2345678901	Dups
2	Phone2	2345678901	Dups
3	Phone1	4567890123	unique
3	Phone2	4567890124	unique
3	Phone3	4567890125	unique
4	Phone1	9876543210	unique
4	Phone2	9876543211	unique
5	Phone1	8765432109	Dups
5	Phone2	8765432109	Dups
5	Phone3	8765432110	unique
6	Phone1	6543210987	unique
6	Phone3	6543210988	unique

View solution in original post

novinosrin · Posted 01-23-2021 12:23 PM

Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-



data have;
input USERID $	(Phone1	Phone2	Phone3) (:$12.);
cards;
1	1234567890	1234567890	1234567890
2	2345678901	2345678901	 .
3	4567890123	4567890124	4567890125
4	9876543210	9876543211	 .
5	8765432109	8765432109	8765432110
6	6543210987	 .	6543210988
;
proc transpose data=have out=temp;
 by userid;
 var phone1-phone3;
run;

proc sort data=temp;
 where col1>' ';
 by userid col1;
run;

data want;
 set temp;
 by userid col1;
 length check $12;
 if not(first.col1 and last.col1) then check='Dups';
 else check='unique';
 rename col1=phone_no
 _name_=Phone;
run;

USERID	Phone	phone_no	check
1	Phone1	1234567890	Dups
1	Phone2	1234567890	Dups
1	Phone3	1234567890	Dups
2	Phone1	2345678901	Dups
2	Phone2	2345678901	Dups
3	Phone1	4567890123	unique
3	Phone2	4567890124	unique
3	Phone3	4567890125	unique
4	Phone1	9876543210	unique
4	Phone2	9876543211	unique
5	Phone1	8765432109	Dups
5	Phone2	8765432109	Dups
5	Phone3	8765432110	unique
6	Phone1	6543210987	unique
6	Phone3	6543210988	unique

mkeintz · Posted 01-23-2021 02:18 PM

You want to remove duplicates within rows. Do you also want to remove duplicates across rows? Or can USERIDs share a phone number?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Identify unique values across multiple columns

Re: Identify unique values across multiple columns

Re: Identify unique values across multiple columns

Re: Identify unique values across multiple columns

Registration is open

SAS Training: Just a Click Away