nickspencer
Obsidian | Level 7
I have a dataset that has an ID and a zip code. Some IDs are duplicated, each time with a different zip code. I need to collapse the duplicate IDs into one row each by concatenating their zip codes, separated by a comma.
The dataset looks like this:

ID zipcode
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008



The final dataset should look like this.

ID zipcode
001 02115, 19103
002 10001
003 33130, 30303, 20008

Any suggestions on how this can be achieved?

Thanks.
Accepted Solution

RW9
Diamond | Level 26

Not tested; in future, please post test data in the form of a data step:

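data want;
  set have;
  by id;                        /* have must be sorted by id */
  length zcode $2000;
  retain zcode;
  if first.id then zcode=zipcode;
  /* append only zip codes that are not already in the list */
  else if findw(zcode,strip(zipcode),', ')=0 then zcode=catx(', ',zcode,zipcode);
  if last.id then output;       /* keep one row per id */
  drop zipcode;
run;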

So just check that the zipcode isn't already in the list before catx'ing.
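As a rough illustration of the "already in the list" check, here is a small sketch comparing ways to test membership. The variable names padded, trimmed and word are made up for the example. INDEX searches for the whole value including trailing blanks, so a padded character variable can fail to match; FINDW with comma and blank as delimiters matches whole entries only.

```sas
data _null_;
  length zcode $50 zipcode $8;
  zcode = '02115, 19103';
  zipcode = '02115';
  /* 0: the search string is '02115' plus three trailing blanks */
  padded = index(zcode, zipcode);
  /* 1: found once the trailing blanks are stripped */
  trimmed = index(zcode, strip(zipcode));
  /* 1: whole-word match, treating comma and blank as delimiters */
  word = findw(zcode, strip(zipcode), ', ');
  put padded= trimmed= word=;
run;
```

The whole-word check also avoids partial hits if the codes ever vary in length, since a plain substring search would find '2115' inside '02115'.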


4 REPLIES
Astounding
PROC Star

That's actually a question that has been asked a few times before.  Here's one example:

 

https://communities.sas.com/t5/SAS-Programming/concatenate-last-observation-variable-value/m-p/42005...

 

Be sure to add a LENGTH statement for your new variable (the concatenated zip codes) so it has room to store multiple values.
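To illustrate the LENGTH point, here is a minimal sketch (the names short_list and long_list are made up for the example). When the target variable is too short for what CATX returns, SAS writes a warning to the log and the stored value may be truncated or blank, so it is safer to declare a generous length up front.

```sas
data _null_;
  length short_list $5 long_list $50;
  /* too short for the result: expect a warning in the log */
  short_list = catx(', ', '02115', '19103');
  /* long enough for both zip codes plus the separator */
  long_list = catx(', ', '02115', '19103');
  put short_list= long_list=;
run;
```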

SuryaKiran
Meteorite | Level 14
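data have;
  input ID :$3. zipcode :$8.;
  datalines;
001 02115
001 19103
002 10001
003 33130
003 30303
003 20008
;
run;

/* Drop repeated ID/zipcode pairs. Note: this also sorts the zip codes within each ID. */
proc sort data=have nodupkey;
  by id zipcode;
run;

/* DOW loop: read every row of one ID, then write a single row for it */
data want(drop=zipcode);
  length zipcode_all $50;
  do until (last.id);
    set have;
    by id;
    zipcode_all = catx(',', zipcode_all, zipcode);
  end;
run;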
Thanks,
Suryakiran
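If the zip codes need to stay in their original order within each ID (as in the requested output), one possible variation is to skip the sort by zipcode and check for duplicates inside the loop instead. This is only a sketch: it assumes have is already sorted or grouped by id, that zipcode is a non-missing character variable, and that 50 characters is enough room for the combined list.

```sas
data want(drop=zipcode);
  length zipcode_all $50;
  do until (last.id);
    set have;
    by id;
    /* append only zip codes not already in the list, in the order read */
    if findw(zipcode_all, strip(zipcode), ', ') = 0 then
      zipcode_all = catx(', ', zipcode_all, zipcode);
  end;
  /* implicit OUTPUT here writes one row per ID */
run;
```

Because zipcode_all is not read from have, it is reset to missing at the start of each DATA step iteration, so each ID starts with an empty list.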
ballardw
Super User

Every time I see one of these requirements I ask how the concatenated variable is to be used. I seldom get a good response.
