Solved: Concatenate if duplicate

nickspencer · Posted 09-13-2018 07:03 AM

I have a dataset with an ID and a zip code. Some IDs appear more than once, each time with a different zip code. I need one row per ID, with all of that ID's zip codes concatenated and separated by commas.
The dataset looks like this.

ID zipcode
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008

The final dataset should look like this.

ID zipcode
001 02115, 19103
002 10001
003 33130, 30303, 20008

Any suggestions on how this can be achieved?

Thanks.

RW9 · Posted 09-13-2018 07:50 AM

Not tested. In future, please post test data in the form of a data step:

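data want (drop=zipcode);
  set have;                /* have must already be sorted by id */
  length zcode $2000;
  retain zcode;
  by id;
  if first.id then zcode=zipcode;
  /* whole-word search, so trailing blanks in zipcode do not hide a match */
  else if findw(zcode,strip(zipcode),', ')=0 then zcode=catx(',',zcode,zipcode);
  if last.id;              /* keep one row per id, holding the full list */
run;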

So just check that the zip code isn't already in the list before appending it with CATX.
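
One thing to watch with the "already in the list" check: INDEX treats trailing blanks in the search value as part of what it looks for, and it matches substrings rather than whole list entries. The following is only a minimal sketch of the difference, using made-up values; the variable names pos_index, pos_trim and pos_findw are hypothetical and exist purely for illustration.

```sas
data _null_;
  length zcode $2000 zipcode $8;
  zcode   = '02115,19103';
  zipcode = '02115';        /* stored as '02115' plus three trailing blanks */

  /* INDEX searches for the padded value, which is not followed by blanks in zcode */
  pos_index = index(zcode, zipcode);

  /* Stripping the search value fixes that, but this is still a substring match */
  pos_trim  = index(zcode, strip(zipcode));

  /* FINDW matches whole entries; comma and blank are the delimiters here */
  pos_findw = findw(zcode, strip(zipcode), ', ');

  put pos_index= pos_trim= pos_findw=;
run;
```

If zipcode is defined with exactly the length of its values, the plain INDEX check behaves as intended; the word-based search simply removes that dependency.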

Astounding · Posted 09-13-2018 07:49 AM

That's actually a question that has been asked a few times before. Here's one example:

https://communities.sas.com/t5/SAS-Programming/concatenate-last-observation-variable-value/m-p/42005...

Be sure to add a LENGTH statement for your new variable (the concatenated zip codes) so it has room to store multiple values.
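
If you would rather not guess a length, one option is to measure it first. This is only a sketch under assumptions: the dataset is called have and is sorted by id, as in the question; the macro variable list_len and the variable zip_list are hypothetical names chosen for the example.

```sas
/* Sample data in the layout shown in the question */
data have;
  input ID :$3. zipcode :$8.;
  datalines;
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008
;

/* Longest list any single ID needs: each zip code plus one separator */
proc sql noprint;
  select max(total) into :list_len trimmed
  from (select sum(length(zipcode) + 1) as total
        from have
        group by id);
quit;

data want(drop=zipcode);
  /* Declare the length before the variable is first assigned */
  length zip_list $&list_len;
  do until (last.id);
    set have;
    by id;
    zip_list = catx(',', zip_list, zipcode);
  end;
run;
```

A fixed, generous length such as $200 is simpler and is often good enough; measuring only matters when the number of zip codes per ID is unknown or large.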

SuryaKiran · Posted 09-13-2018 08:31 AM

data have;
input ID:$3. zipcode:$8.;
datalines;
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008
;
run;

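/* Sort by id and drop repeated id/zipcode pairs. This overwrites have, */
/* and the zip codes within each id end up in sorted order.             */
proc sort data=have nodupkey;
  by id zipcode;
run;

/* DOW loop: read one whole id group per iteration, then write one row */
data want(drop=zipcode);
  length zipcode_all $50;   /* increase if an id can have many zip codes */
  do until (last.id);
    set have;
    by id;
    zipcode_all=catx(',',zipcode_all,zipcode);
  end;
run;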

Thanks,
Suryakiran

ballardw · Posted 09-13-2018 10:46 AM

Every time I see one of these requirements I ask how the concatenated variable is to be used. I seldom get a good response.
