Hello Experts,
I am having a dataset which contains account_number and address. I want to find out which account_number has multiple address in the dataset.
This is my sample data:
data temp;
infile cards dlm=' ' dsd;
input account_no Address :$30.;
format Address :$30.;
cards;
1001 "2419 Jarvisville Road"
1002 "320 Fairway Drive"
1002 "320 Fairway Drive"
1002 "623 Sycamore Fork Road"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1004 "3309 Kembery Drive"
1004 "890 Leisure Lane"
1004 "3145 Longview Avenue"
1004 "2314 Andell Road"
;
run;
i want my output like this.(As 1001 is unique so i don't want that in my output dataset and also 1003 has same address in all four observation so i don't want that either). I want only Account_number 1002 and 1004 in my output in this way
Account_no | Address1 | Address2 | Address3 | Address4 |
1002 | 320 Fairway Drive | 320 Fairway Drive | 623 Sycamore Fork Road | |
1004 | 3309 Kembery Drive | 890 Leisure Lane | 3145 Longview Avenue | 2314 Andell Road |
Thanks in advance
You can use PROC SQL to easily detect the accounts with conflicting addresses.
You can use PROC TRANSPOSE to convert it to that wide format.
data temp;
infile cards dlm=' ' dsd;
input account_no address :$30.;
cards;
1001 "2419 Jarvisville Road"
1002 "320 Fairway Drive"
1002 "320 Fairway Drive"
1002 "623 Sycamore Fork Road"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1004 "3309 Kembery Drive"
1004 "890 Leisure Lane"
1004 "3145 Longview Avenue"
1004 "2314 Andell Road"
;
proc sql;
create table wrong as
select distinct account_no,address
from temp
group by account_no
having count(distinct address) > 1
;
quit;
proc transpose data=wrong out=want(drop=_name_) prefix=address;
by account_no;
var address;
run;
This works using PROC FREQ.
proc freq data=temp nlevels;
ods output nlevels=nlevels;
by account_no;
run;
data want;
merge temp nlevels(keep=account_no nlevels);
by account_no;
if nlevels>1;
run;
Generally, I will not put data in wide format as you requested, as SAS is designed to work better on long data sets. If you really really really need to have this wide, please explain why.
Thanks @PaigeMiller . I want that kind of output with addresses to show our data team in presentation that the data is not accurate. So it is easier to explain when I have address along with account number
@_el_doredo wrote:
Thanks @PaigeMiller . I want that kind of output with addresses to show our data team in presentation that the data is not accurate. So it is easier to explain when I have address along with account number
Long works just as well as wide for showing problems in the data
You can use PROC SQL to easily detect the accounts with conflicting addresses.
You can use PROC TRANSPOSE to convert it to that wide format.
data temp;
infile cards dlm=' ' dsd;
input account_no address :$30.;
cards;
1001 "2419 Jarvisville Road"
1002 "320 Fairway Drive"
1002 "320 Fairway Drive"
1002 "623 Sycamore Fork Road"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1003 "3647 Juniper Drive"
1004 "3309 Kembery Drive"
1004 "890 Leisure Lane"
1004 "3145 Longview Avenue"
1004 "2314 Andell Road"
;
proc sql;
create table wrong as
select distinct account_no,address
from temp
group by account_no
having count(distinct address) > 1
;
quit;
proc transpose data=wrong out=want(drop=_name_) prefix=address;
by account_no;
var address;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.