Hi All,
I need to pick out unique phone numbers among three columns per ID as below:
USERID | Phone1 | Phone2 | Phone3 |
1 | 1234567890 | 1234567890 | 1234567890 |
2 | 2345678901 | 2345678901 | |
3 | 4567890123 | 4567890124 | 4567890125 |
4 | 9876543210 | 9876543211 | |
5 | 8765432109 | 8765432109 | 8765432110 |
6 | 6543210987 | 6543210988 |
Some records actually have same numbers, like USERID 1 and 2 while others have different. I just wanted to keep unique numbers. If there is only one unique number, then phone1_valid takes that value; If there are two unique number, then phone1_valid and phone2_valid take the two values; etc..
Do you have a way to do that?
Thanks!
Lz
Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-
data have;
input USERID $ (Phone1 Phone2 Phone3) (:$12.);
cards;
1 1234567890 1234567890 1234567890
2 2345678901 2345678901 .
3 4567890123 4567890124 4567890125
4 9876543210 9876543211 .
5 8765432109 8765432109 8765432110
6 6543210987 . 6543210988
;
proc transpose data=have out=temp;
by userid;
var phone1-phone3;
run;
proc sort data=temp;
where col1>' ';
by userid col1;
run;
data want;
set temp;
by userid col1;
length check $12;
if not(first.col1 and last.col1) then check='Dups';
else check='unique';
rename col1=phone_no
_name_=Phone;
run;
USERID | Phone | phone_no | check |
---|---|---|---|
1 | Phone1 | 1234567890 | Dups |
1 | Phone2 | 1234567890 | Dups |
1 | Phone3 | 1234567890 | Dups |
2 | Phone1 | 2345678901 | Dups |
2 | Phone2 | 2345678901 | Dups |
3 | Phone1 | 4567890123 | unique |
3 | Phone2 | 4567890124 | unique |
3 | Phone3 | 4567890125 | unique |
4 | Phone1 | 9876543210 | unique |
4 | Phone2 | 9876543211 | unique |
5 | Phone1 | 8765432109 | Dups |
5 | Phone2 | 8765432109 | Dups |
5 | Phone3 | 8765432110 | unique |
6 | Phone1 | 6543210987 | unique |
6 | Phone3 | 6543210988 | unique |
Hi @lizzy28 I would recommend to have a FLAG for unique and dups, so that you can filter and summarize however you want-
data have;
input USERID $ (Phone1 Phone2 Phone3) (:$12.);
cards;
1 1234567890 1234567890 1234567890
2 2345678901 2345678901 .
3 4567890123 4567890124 4567890125
4 9876543210 9876543211 .
5 8765432109 8765432109 8765432110
6 6543210987 . 6543210988
;
proc transpose data=have out=temp;
by userid;
var phone1-phone3;
run;
proc sort data=temp;
where col1>' ';
by userid col1;
run;
data want;
set temp;
by userid col1;
length check $12;
if not(first.col1 and last.col1) then check='Dups';
else check='unique';
rename col1=phone_no
_name_=Phone;
run;
USERID | Phone | phone_no | check |
---|---|---|---|
1 | Phone1 | 1234567890 | Dups |
1 | Phone2 | 1234567890 | Dups |
1 | Phone3 | 1234567890 | Dups |
2 | Phone1 | 2345678901 | Dups |
2 | Phone2 | 2345678901 | Dups |
3 | Phone1 | 4567890123 | unique |
3 | Phone2 | 4567890124 | unique |
3 | Phone3 | 4567890125 | unique |
4 | Phone1 | 9876543210 | unique |
4 | Phone2 | 9876543211 | unique |
5 | Phone1 | 8765432109 | Dups |
5 | Phone2 | 8765432109 | Dups |
5 | Phone3 | 8765432110 | unique |
6 | Phone1 | 6543210987 | unique |
6 | Phone3 | 6543210988 | unique |
You want to remove duplicates within rows. Do you also want to remove duplicates across rows? Or can USERIDs share a phone number?
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.