Solved: Keeing ONE address

altijani · Posted 05-02-2019 09:03 AM

Greetings Community,

I have this dataset:

Client_ID	Address
ABC123	11 East Street, City1, OH
ABC123	11 East Street, City1, OH County 5
XFE51D	100 Main Street, City2, OH
XFE51D	100 Main Street, City2, OH

I need to keep only ONE observation for each client with the following conditions:

- If the addresses are different, keep the longest address

- if the addresses are the same, keep only one.

My final dataset should look like this:

Client_ID	Address
ABC123	11 East Street, City1, OH County 5
XFE51D	100 Main Street, City2, OH

Thank you!

Tom · Posted 05-02-2019 09:28 AM

If you want to use LENGTH as a criteria you need to add it to your dataset.

data want ;
  set have;
  a_length=length(address);
run;

proc sort;
  by client_id a_length;
run;

data want;
  set want;
  by client_id;
  if last.client_id;
run;

View solution in original post

novinosrin · Posted 05-02-2019 09:25 AM


data have;
input Client_ID $	Address $50.;
cards;
ABC123	11 East Street, City1, OH
ABC123	11 East Street, City1, OH County 5
XFE51D	100 Main Street, City2, OH
XFE51D	100 Main Street, City2, OH
;

proc sql;
create table want  as
select distinct *
from have
group by client_id
having max(length(address))=length(address);
quit;

Tom · Posted 05-02-2019 09:28 AM

If you want to use LENGTH as a criteria you need to add it to your dataset.

data want ;
  set have;
  a_length=length(address);
run;

proc sort;
  by client_id a_length;
run;

data want;
  set want;
  by client_id;
  if last.client_id;
run;

Astounding · Posted 05-02-2019 09:41 AM

There are some tricky situations to consider, both in the planning and execution. Not all solutions will return the same result for:

Client_ID	Address
ABC123	11 East Street, City1, OH
ABC123	11 East Street, City1, OH County 5
ABC123	11 East Street, City1, OH County 5

And what would you like the result to be in cases of a tie?

Client_ID	Address
ABC123	11 East Street, City1, OH
ABC123	11 East Street, City1, OH County 4
ABC123	11 East Street, City1, OH County 5

ballardw · Posted 05-02-2019 01:01 PM

@Astounding wrote:

There are some tricky situations to consider, both in the planning and execution. Not all solutions will return the same result for:
Client_ID	Address
ABC123	11 East Street, City1, OH
ABC123	11 East Street, City1, OH County 5
ABC123	11 East Street, City1, OH County 5
And what would you like the result to be in cases of a tie?
Client_ID	Address
ABC123	11 East Street, City1, OH
ABC123	11 East Street, City1, OH County 4
ABC123	11 East Street, City1, OH County 5

Or even more fun

Client_ID Address 
ABC123 11 East Street, City1, OH County 4 
ABC123 22 West Street, Abcde, OH OCount 9

altijani · Posted 05-03-2019 07:27 AM

Ballardw

Thank you for bringing that up. All these combinations are possible. I have an additional column in the data that has the Date of that record. In case of any of these possibilities, I need to pick up the latest address (based on the most recent Date).

So the data looks like this:

Client_ID Address Date
ABC123 11 East Street, City1, OH 05/01/2018
ABC123 11 East Street, City1, OH County 5 09/09/2018
ABC123 11 East Street, City1, OH County 5 01/13/2019

How the code will look like now?

Thank you

ballardw · Posted 05-03-2019 07:00 PM

@altijani wrote:

Ballardw

Thank you for bringing that up. All these combinations are possible. I have an additional column in the data that has the Date of that record. In case of any of these possibilities, I need to pick up the latest address (based on the most recent Date).

So the data looks like this:

Client_ID Address Date
ABC123 11 East Street, City1, OH 05/01/2018
ABC123 11 East Street, City1, OH County 5 09/09/2018
ABC123 11 East Street, City1, OH County 5 01/13/2019

How the code will look like now?

Thank you

When I have dated addresses I would just take the latest. Unless you have lots of duplicate dates with different addresses then this should work if your date variable is an actual SAS date value (numeric with a SAS format like mmddyy10.)

Proc sort data=have;
   by client_id date;
run;

data want;
   set have;
   by client_id;
   if last.client_id;
run;

Astounding · Posted 05-03-2019 07:10 PM

It sounds like you are looking for the longest value, but using DATE to break ties. I have to assume that you know what a SAS date actually should contain, and that your data actually contains true SAS dates. In that case, I would choose a variation on a method that was suggested earlier:

data want ;
  set have;
  a_length=length(address);
run;

proc sort;
  by client_id a_length date;
run;

data want;
  set want;
  by client_id;
  if last.client_id;
  drop a_length;
run;

altijani · Posted 05-06-2019 07:34 AM

Thank you

Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Re: Keeing ONE address

Registration is open

SAS Training: Just a Click Away