Hi All,
I need to pick out unique phone numbers among three columns per ID as below:
| USERID | Phone1 | Phone2 | Phone3 |
| 1 | 1234567890 | 1234567890 | 1234567890 |
| 2 | 2345678901 | 2345678901 | |
| 3 | 4567890123 | 4567890124 | 4567890125 |
| 4 | 9876543210 | 9876543211 | |
| 5 | 8765432109 | 8765432109 | 8765432110 |
| 6 | 6543210987 | 6543210988 |
Some records actually have same numbers, like USERID 1 and 2 while others have different. I just wanted to keep unique numbers. If there is only one unique number, then phone1_valid takes that value; If there are two unique number, then phone1_valid and phone2_valid take the two values; etc..
Do you have a way to do that?
Thanks!
Lz
Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-
data have;
input USERID $ (Phone1 Phone2 Phone3) (:$12.);
cards;
1 1234567890 1234567890 1234567890
2 2345678901 2345678901 .
3 4567890123 4567890124 4567890125
4 9876543210 9876543211 .
5 8765432109 8765432109 8765432110
6 6543210987 . 6543210988
;
proc transpose data=have out=temp;
by userid;
var phone1-phone3;
run;
proc sort data=temp;
where col1>' ';
by userid col1;
run;
data want;
set temp;
by userid col1;
length check $12;
if not(first.col1 and last.col1) then check='Dups';
else check='unique';
rename col1=phone_no
_name_=Phone;
run;
| USERID | Phone | phone_no | check |
|---|---|---|---|
| 1 | Phone1 | 1234567890 | Dups |
| 1 | Phone2 | 1234567890 | Dups |
| 1 | Phone3 | 1234567890 | Dups |
| 2 | Phone1 | 2345678901 | Dups |
| 2 | Phone2 | 2345678901 | Dups |
| 3 | Phone1 | 4567890123 | unique |
| 3 | Phone2 | 4567890124 | unique |
| 3 | Phone3 | 4567890125 | unique |
| 4 | Phone1 | 9876543210 | unique |
| 4 | Phone2 | 9876543211 | unique |
| 5 | Phone1 | 8765432109 | Dups |
| 5 | Phone2 | 8765432109 | Dups |
| 5 | Phone3 | 8765432110 | unique |
| 6 | Phone1 | 6543210987 | unique |
| 6 | Phone3 | 6543210988 | unique |
Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-
data have;
input USERID $ (Phone1 Phone2 Phone3) (:$12.);
cards;
1 1234567890 1234567890 1234567890
2 2345678901 2345678901 .
3 4567890123 4567890124 4567890125
4 9876543210 9876543211 .
5 8765432109 8765432109 8765432110
6 6543210987 . 6543210988
;
proc transpose data=have out=temp;
by userid;
var phone1-phone3;
run;
proc sort data=temp;
where col1>' ';
by userid col1;
run;
data want;
set temp;
by userid col1;
length check $12;
if not(first.col1 and last.col1) then check='Dups';
else check='unique';
rename col1=phone_no
_name_=Phone;
run;
| USERID | Phone | phone_no | check |
|---|---|---|---|
| 1 | Phone1 | 1234567890 | Dups |
| 1 | Phone2 | 1234567890 | Dups |
| 1 | Phone3 | 1234567890 | Dups |
| 2 | Phone1 | 2345678901 | Dups |
| 2 | Phone2 | 2345678901 | Dups |
| 3 | Phone1 | 4567890123 | unique |
| 3 | Phone2 | 4567890124 | unique |
| 3 | Phone3 | 4567890125 | unique |
| 4 | Phone1 | 9876543210 | unique |
| 4 | Phone2 | 9876543211 | unique |
| 5 | Phone1 | 8765432109 | Dups |
| 5 | Phone2 | 8765432109 | Dups |
| 5 | Phone3 | 8765432110 | unique |
| 6 | Phone1 | 6543210987 | unique |
| 6 | Phone3 | 6543210988 | unique |
You want to remove duplicates within rows. Do you also want to remove duplicates across rows? Or can USERIDs share a phone number?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.