nickspencer
Obsidian | Level 7
I have a dataset that has ID and the zip code. There are some duplicate IDs with different zipcodes. I need to remove the duplicate ids by concatenating the multiple zipcodes separated by a comma.
The dataset looks like this.

ID zipcode
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008



The final dataset should look like this.

ID zipcode
001 02115, 19103
002 10001
003 33130, 30303, 20008

Any suggestions on how this can be achieved?

Thanks.
ACCEPTED SOLUTION
RW9
Diamond | Level 26

Not tested. In future, please post test data in the form of a data step:

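data want (keep=id zcode);
  set have;
  by id;  /* have must be sorted by id */
  length zcode $2000;
  retain zcode;
  if first.id then zcode=zipcode;
  /* strip() so trailing blanks in zipcode do not defeat the search */
  else if index(zcode,strip(zipcode))=0 then zcode=catx(',',zcode,zipcode);
  if last.id then output;  /* write one row per id */
run;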

So just check that the zipcode isn't already in the list before catx'ing.
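
The INDEX check matches substrings, which is fine when every zip code has the same width. If the values can vary in length (for example a mix of 5-digit and ZIP+4 codes), a whole-item comparison with FINDW may be safer. The following is a minimal sketch, not tested against real data: the dataset name want_findw and the $8. input width are assumptions made for illustration.

```sas
/* Sample data from the question; the $8. width is an assumption */
data have;
  input id :$3. zipcode :$8.;
  datalines;
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008
;
run;

/* want_findw is a hypothetical name; have must be sorted by id */
data want_findw (keep=id zcode);
  set have;
  by id;
  length zcode $2000;
  retain zcode;
  if first.id then zcode = zipcode;
  /* FINDW matches whole comma-delimited items; STRIP removes the
     padding blanks that would otherwise count as part of an item */
  else if findw(strip(zcode), strip(zipcode), ',') = 0 then
    zcode = catx(',', zcode, zipcode);
  if last.id then output;  /* one row per id */
run;
```

Stripping both arguments matters here because the comma is the only delimiter passed to FINDW, so trailing blanks in zcode would otherwise be treated as part of the last item.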


REPLIES
Astounding
PROC Star

That's actually a question that has been asked a few times before.  Here's one example:

 

https://communities.sas.com/t5/SAS-Programming/concatenate-last-observation-variable-value/m-p/42005...

 

Be sure to add a LENGTH statement for your new variable (the concatenated zip codes) so it has room to store multiple values.
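
To illustrate why the LENGTH statement matters, here is a small sketch. The names length_demo, zip_list, and short_list are made up for this example, and the length of 200 is an arbitrary choice; size it for the longest list you expect.

```sas
data length_demo;
  /* Declared before first use: room for a long comma-separated list */
  length zip_list $200;
  zip_list = '02115';
  zip_list = catx(',', zip_list, '19103');   /* fits: 11 characters needed */

  /* No LENGTH statement: the first assignment fixes the length at 5 */
  short_list = '02115';
  /* The 11-character result does not fit in 5 bytes, so SAS writes a
     warning to the log and short_list does not hold the full list */
  short_list = catx(',', short_list, '19103');
run;
```

Running PROC CONTENTS on length_demo should show the two variable lengths side by side.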

SuryaKiran
Meteorite | Level 14
data have;
input ID:$3. zipcode:$8.;
datalines;
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008
;
run;

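/* Sort by id and drop repeated id/zipcode pairs */
proc sort data=have nodupkey;
  by id zipcode;
run;

/* Read all rows of one id per DATA step iteration, building the list */
data want(drop=zipcode);
  format zipcode_all $50.;  /* also sets the length of zipcode_all to 50 */
  do until (last.id);
    set have;
    by id;
    zipcode_all=catx(',',zipcode_all,zipcode);
  end;
  /* implicit OUTPUT here writes one row per id */
run;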
Thanks,
Suryakiran
ballardw
Super User

Every time I see one of these requirements I ask how the concatenated variable is to be used. I seldom get a good response.
