Hello
I have 2 data sets:
1-Date Set Have1 have information of customer accounts that belong to each person ID.
2-Data set Have2 have information of people ID's that belong to each customer account.
Please note that CAT_Customers is concatenate of customer accounts
Please note that CAT_ID is concatenate of people ID
The task is to add to Have2 the following 2 columns:
1-CAT_Customers
It is concatenate of customers that belongs to the ID's
2-CAT_distinct_Customers
It is same as CAT_Customers but after clean duplicate values
Data Have1;
Input ID Customer1 Customer2 Customer3 CAT_Customers $;
cards;
111111 11 33 22 11,22,33
222222 11 . . 11
444444 22 . . 22
666666 33 . . 33
555555 44 . . 44
;
Run;
Data Have2;
input Customer ID1 ID2 CAT_ID $20.;
cards;
11 111111 222222 111111,222222
33 111111 666666 111111,666666
22 111111 444444 111111,444444
11 111111 222222 111111,222222
44 555555 . 555555
;
Run;
Data want1;
Input Customer ID1 ID2 CAT_ID $14. CAT_Customers $12. CAT_distinct_Customers $;
cards;
11 111111 222222 111111,222222 11,22,33,11 11,22,33
33 111111 666666 111111,666666 11,22,33,33 11,22,33
22 111111 444444 111111,444444 11,22,33,22 11,22,33
11 111111 222222 111111,222222 11,22,33,11 11,22,33
44 555555 . 555555 44
;
Run;
Are you actually interested in CAT_Customers? Or just the distinct list?
What have you tried?
Have you create CAT_Customers and CAT_ID? Are they necessary?
Before creating the "Want", i fixed some issues with your "have" datasets: IDs should always be alphanumeric.
data work.Have1;
length ID $ 5 Customer1-Customer3 $ 2 Cat_Customers $ 20;
infile cards missover delimiter='|';
input Id Customer1 Customer2 Customer3 Cat_Customers;
cards;
111111|11|33|22|11,22,33
222222|11| | |11
444444|22| | |22
666666|33| | |33
555555|44| | |44
;
run;
data work.Have2;
length Customer $ 2 Id_1-Id_2 $ 5 Cat_Ids $ 50;
infile cards missover delimiter='|';
input Customer Id_1 Id_2 Cat_Ids;
cards;
11|111111|222222|11111,22222
33|111111|666666|11111,66666
22|111111|444444|11111,44444
11|111111|222222|11111,22222
44|555555| |55555
;
run;
One could, of course, bring the data you have in better form, but this hardly helps to get the job done.
data want;
set Have2;
if _n_ = 1 then do;
if 0 then set work.Have1(keep= Id Cat_Customers);
declare hash h(dataset: 'work.Have1(keep= Id Cat_Customers)');
h.defineKey('Id');
h.defineData('Cat_Customers');
h.defineDone();
end;
length Customer_List CAT_distinct_Customers $ 100;
array list Id_:;
do i = 1 to dim(list);
if not missing(list[i]) then do;
rc = h.find(key: list[i]);
Customer_List = catx(',', Customer_List, Cat_Customers);
if missing(CAT_distinct_Customers) then do;
CAT_distinct_Customers = Cat_Customers;
end;
else do;
do j = 1 to countw(Cat_Customers, ',');
cust = scan(Cat_Customers, j, ',');
if not findw(CAT_distinct_Customers, cust, ',', 't') then do;
CAT_distinct_Customers = catx(',', CAT_distinct_Customers, cust);
end;
end;
end;
end;
end;
drop Id Cat_Customers i j cust rc;
rename Customer_List = Cat_Customers;
run;
Data Have1;
Input ID Customer1 Customer2 Customer3 CAT_Customers $;
cards;
111111 11 33 22 11,22,33
222222 11 . . 11
444444 22 . . 22
666666 33 . . 33
555555 44 . . 44
;
Run;
Data Have2;
input Customer ID1 ID2 CAT_ID $20.;
cards;
11 111111 222222 111111,222222
33 111111 666666 111111,666666
22 111111 444444 111111,444444
11 111111 222222 111111,222222
44 555555 . 555555
;
Run;
option missing=' ';
data want;
if _n_=1 then do;
if 0 then set have1(keep=id CAT_Customers);
declare hash h(dataset:'have1');
h.definekey('ID');
h.definedata('CAT_Customers');
h.definedone();
end;
set have2;
length new_CAT_Customers CAT_distinct_Customers $ 200;
call missing(CAT_Customers);
rc=h.find(key:ID1);
CAT_distinct_Customers=catx(',',CAT_distinct_Customers,CAT_Customers);
call missing(CAT_Customers);
rc=h.find(key:ID2);
CAT_distinct_Customers=catx(',',CAT_distinct_Customers,CAT_Customers);
do i=1 to countw(CAT_distinct_Customers,', ');
temp=scan(CAT_distinct_Customers,i,',');
if not findw(new_CAT_Customers,strip(temp),', ') then new_CAT_Customers=catx(',',new_CAT_Customers,temp) ;
end;
drop id CAT_Customers rc i temp;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.