I need help with creating a new field using values in an existing column. I have a dataset with two columns, Customer and Customer_Number. Below is a sample dataset. I need to create a new field Customer_Number_Multi using the values in the Customer_Number column. I also need single quotes around each value and comma as a seperator. As you can see some Customers have one Customer Number and some have multiple Customer Numbers. The Customer_Number_Multi field should have all Customer_Number values for each Customer. So, for an example, for Customer ABC should have all, 123, 234 and 567 Customer_Numbers in Customer_Number_Multi. Thanks for your help.
| Customer | Customer_Number | Customer_Number_Multi |
| ABC | 123 | '123', '234', '567' |
| ABC | 234 | '123', '234', '567' |
| ABC | 567 | '123', '234', '567' |
| DEF | 321 | '321' |
| GHI | 345 | '345', '678' |
| GHI | 678 | '345', '678' |
| JAK | 256 | '256' |
Here's one way:
data have;
input Customer $ Customer_Number $;
datalines;
ABC 123
ABC 234
ABC 567
DEF 321
GHI 345
GHI 678
JAK 256
;
run;
data intermediate;
set have;
length Customer_Number_Multi $32.;
by customer;
retain Customer_Number_Multi;
if first.customer
then Customer_Number_Multi = "";
Customer_Number_Multi = catx(", ", Customer_Number_Multi, "'"||strip(customer_number)||"'");
if last.customer;
run;
proc sql;
create table want as
select h.*
,i.Customer_Number_Multi
from have h
left join intermediate i
on h.customer = i.customer;
quit;
* Note: the code above assumes your input dataset is sorted by customer (like your sample data) and that any duplicate customer_numbers should be repeated in customer_number_multi.
Output:
Do you really want to remerge the result back into all of the original observations?
Wouldn't be more useful to just have one observation per customer.
data want;
do until (last.customer);
set have;
by customer ;
length Customer_Number_Multi $200 ;
customer_number_multi=catx(',',customer_number_multi,quote(cats(customer_number),"'"));
end;
keep customer customer_number_multi;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.