Input ID Zip;
101 400201
101 400333
102 650067
103 560103
103 560067
;
How to get output.
101 400201,400333
102 650067
103 560103,560067
I feel compelled to repeat advice already given.
Don't re-arrange your data like this. Your programming would be much easier if you did not re-arrange data like this. To quote @ballardw :
And just what do you expect to do with that resulting data set that can't be done with the one you have now? Anything that uses that codelist variable is going to be hard to actually work with for counting, modeling, just about any analysis or graphing?
How many codes do you expect to place in a single variable? You should specify a length for the variable long enough to hold that many values plus the comma separator.
Are there any other variables to be involved in this? If so, what happens with them?
This question gets asked in one form another at least once per week.
If the data is sorted by the ID variable then something like:
data want; set have; by id;
length codelist $ 200.; retain codelist; if first.id then call missing (codelist); codelist = catx(',',codelist,zip); i last.id then output; keep id codelist; run;
If the data is not sorted by ID then do so.
This sets the length of the combined variable to 200 characters which will hold about 28 codes. Adjust the length on the length statement to match actual need.
When you use BY group processing SAS creates automatic variables that indicate whether a particular observation is the first or last of a particular by variable grouping and is a 1/0 (true/false) value accessed by using first.variablename and last.variable name.
The code above clears the existing codelist variable when the first observation of the Id is encountered, then adds the zip to the list.
The values are written out to the data set only when the last observation for an id value is encountered.
And just what do you expect to do with that resulting data set that can't be done with the one you have now? Anything that uses that codelist variable is going to be hard to actually work with for counting, modeling, just about any analysis or graphing?
So, what would be a better way to structure data like this for for counting, modeling, just about any analysis or graphing?
The better way is to keep the data set as it was originally presented. In other words, one observation for each value of ID and ZIP. Not two (or more) zips on one line.
NOTE: All of this depends on what the next step is, what analysis or report or chart is going to be created. Certainly, there could be very rare and unusual cases where you want two (or more) zips on one line, but even then I would be skeptical that this is really needed (as it said, it would be very rare)
I feel compelled to repeat advice already given.
Don't re-arrange your data like this. Your programming would be much easier if you did not re-arrange data like this. To quote @ballardw :
And just what do you expect to do with that resulting data set that can't be done with the one you have now? Anything that uses that codelist variable is going to be hard to actually work with for counting, modeling, just about any analysis or graphing?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.