Hi All,
I need to pick out unique phone numbers among three columns per ID as below:
| USERID | Phone1 | Phone2 | Phone3 |
| 1 | 1234567890 | 1234567890 | 1234567890 |
| 2 | 2345678901 | 2345678901 | |
| 3 | 4567890123 | 4567890124 | 4567890125 |
| 4 | 9876543210 | 9876543211 | |
| 5 | 8765432109 | 8765432109 | 8765432110 |
| 6 | 6543210987 | 6543210988 |
Some records actually have same numbers, like USERID 1 and 2 while others have different. I just wanted to keep unique numbers. If there is only one unique number, then phone1_valid takes that value; If there are two unique number, then phone1_valid and phone2_valid take the two values; etc..
Do you have a way to do that?
Thanks!
Lz
Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-
data have;
input USERID $ (Phone1 Phone2 Phone3) (:$12.);
cards;
1 1234567890 1234567890 1234567890
2 2345678901 2345678901 .
3 4567890123 4567890124 4567890125
4 9876543210 9876543211 .
5 8765432109 8765432109 8765432110
6 6543210987 . 6543210988
;
proc transpose data=have out=temp;
by userid;
var phone1-phone3;
run;
proc sort data=temp;
where col1>' ';
by userid col1;
run;
data want;
set temp;
by userid col1;
length check $12;
if not(first.col1 and last.col1) then check='Dups';
else check='unique';
rename col1=phone_no
_name_=Phone;
run;
| USERID | Phone | phone_no | check |
|---|---|---|---|
| 1 | Phone1 | 1234567890 | Dups |
| 1 | Phone2 | 1234567890 | Dups |
| 1 | Phone3 | 1234567890 | Dups |
| 2 | Phone1 | 2345678901 | Dups |
| 2 | Phone2 | 2345678901 | Dups |
| 3 | Phone1 | 4567890123 | unique |
| 3 | Phone2 | 4567890124 | unique |
| 3 | Phone3 | 4567890125 | unique |
| 4 | Phone1 | 9876543210 | unique |
| 4 | Phone2 | 9876543211 | unique |
| 5 | Phone1 | 8765432109 | Dups |
| 5 | Phone2 | 8765432109 | Dups |
| 5 | Phone3 | 8765432110 | unique |
| 6 | Phone1 | 6543210987 | unique |
| 6 | Phone3 | 6543210988 | unique |
Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-
data have;
input USERID $ (Phone1 Phone2 Phone3) (:$12.);
cards;
1 1234567890 1234567890 1234567890
2 2345678901 2345678901 .
3 4567890123 4567890124 4567890125
4 9876543210 9876543211 .
5 8765432109 8765432109 8765432110
6 6543210987 . 6543210988
;
proc transpose data=have out=temp;
by userid;
var phone1-phone3;
run;
proc sort data=temp;
where col1>' ';
by userid col1;
run;
data want;
set temp;
by userid col1;
length check $12;
if not(first.col1 and last.col1) then check='Dups';
else check='unique';
rename col1=phone_no
_name_=Phone;
run;
| USERID | Phone | phone_no | check |
|---|---|---|---|
| 1 | Phone1 | 1234567890 | Dups |
| 1 | Phone2 | 1234567890 | Dups |
| 1 | Phone3 | 1234567890 | Dups |
| 2 | Phone1 | 2345678901 | Dups |
| 2 | Phone2 | 2345678901 | Dups |
| 3 | Phone1 | 4567890123 | unique |
| 3 | Phone2 | 4567890124 | unique |
| 3 | Phone3 | 4567890125 | unique |
| 4 | Phone1 | 9876543210 | unique |
| 4 | Phone2 | 9876543211 | unique |
| 5 | Phone1 | 8765432109 | Dups |
| 5 | Phone2 | 8765432109 | Dups |
| 5 | Phone3 | 8765432110 | unique |
| 6 | Phone1 | 6543210987 | unique |
| 6 | Phone3 | 6543210988 | unique |
You want to remove duplicates within rows. Do you also want to remove duplicates across rows? Or can USERIDs share a phone number?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.