Hello
I have 2 data sets:
1-Date Set Have1 have information of customer accounts that belong to each person ID.
2-Data set Have2 have information of people ID's that belong to each customer account.
Please note that CAT_Customers is concatenate of customer accounts
Please note that CAT_ID is concatenate of people ID
The task is to add to Have2 the following 2 columns:
1-CAT_Customers
It is concatenate of customers that belongs to the ID's
2-CAT_distinct_Customers
It is same as CAT_Customers but after clean duplicate values
Data Have1;
Input ID Customer1 Customer2 Customer3 CAT_Customers $;
cards;
111111 11 33 22 11,22,33
222222 11 . . 11
444444 22 . . 22
666666 33 . . 33
555555 44 . . 44
;
Run;
Data Have2;
input Customer ID1 ID2 CAT_ID $20.;
cards;
11 111111 222222 111111,222222
33 111111 666666 111111,666666
22 111111 444444 111111,444444
11 111111 222222 111111,222222
44 555555 . 555555
;
Run;
Data want1;
Input Customer ID1 ID2 CAT_ID $14. CAT_Customers $12. CAT_distinct_Customers $;
cards;
11 111111 222222 111111,222222 11,22,33,11 11,22,33
33 111111 666666 111111,666666 11,22,33,33 11,22,33
22 111111 444444 111111,444444 11,22,33,22 11,22,33
11 111111 222222 111111,222222 11,22,33,11 11,22,33
44 555555 . 555555 44
;
Run;
Are you actually interested in CAT_Customers? Or just the distinct list?
What have you tried?
Have you create CAT_Customers and CAT_ID? Are they necessary?
Before creating the "Want", i fixed some issues with your "have" datasets: IDs should always be alphanumeric.
data work.Have1;
length ID $ 5 Customer1-Customer3 $ 2 Cat_Customers $ 20;
infile cards missover delimiter='|';
input Id Customer1 Customer2 Customer3 Cat_Customers;
cards;
111111|11|33|22|11,22,33
222222|11| | |11
444444|22| | |22
666666|33| | |33
555555|44| | |44
;
run;
data work.Have2;
length Customer $ 2 Id_1-Id_2 $ 5 Cat_Ids $ 50;
infile cards missover delimiter='|';
input Customer Id_1 Id_2 Cat_Ids;
cards;
11|111111|222222|11111,22222
33|111111|666666|11111,66666
22|111111|444444|11111,44444
11|111111|222222|11111,22222
44|555555| |55555
;
run;
One could, of course, bring the data you have in better form, but this hardly helps to get the job done.
data want;
set Have2;
if _n_ = 1 then do;
if 0 then set work.Have1(keep= Id Cat_Customers);
declare hash h(dataset: 'work.Have1(keep= Id Cat_Customers)');
h.defineKey('Id');
h.defineData('Cat_Customers');
h.defineDone();
end;
length Customer_List CAT_distinct_Customers $ 100;
array list Id_:;
do i = 1 to dim(list);
if not missing(list[i]) then do;
rc = h.find(key: list[i]);
Customer_List = catx(',', Customer_List, Cat_Customers);
if missing(CAT_distinct_Customers) then do;
CAT_distinct_Customers = Cat_Customers;
end;
else do;
do j = 1 to countw(Cat_Customers, ',');
cust = scan(Cat_Customers, j, ',');
if not findw(CAT_distinct_Customers, cust, ',', 't') then do;
CAT_distinct_Customers = catx(',', CAT_distinct_Customers, cust);
end;
end;
end;
end;
end;
drop Id Cat_Customers i j cust rc;
rename Customer_List = Cat_Customers;
run;
Data Have1;
Input ID Customer1 Customer2 Customer3 CAT_Customers $;
cards;
111111 11 33 22 11,22,33
222222 11 . . 11
444444 22 . . 22
666666 33 . . 33
555555 44 . . 44
;
Run;
Data Have2;
input Customer ID1 ID2 CAT_ID $20.;
cards;
11 111111 222222 111111,222222
33 111111 666666 111111,666666
22 111111 444444 111111,444444
11 111111 222222 111111,222222
44 555555 . 555555
;
Run;
option missing=' ';
data want;
if _n_=1 then do;
if 0 then set have1(keep=id CAT_Customers);
declare hash h(dataset:'have1');
h.definekey('ID');
h.definedata('CAT_Customers');
h.definedone();
end;
set have2;
length new_CAT_Customers CAT_distinct_Customers $ 200;
call missing(CAT_Customers);
rc=h.find(key:ID1);
CAT_distinct_Customers=catx(',',CAT_distinct_Customers,CAT_Customers);
call missing(CAT_Customers);
rc=h.find(key:ID2);
CAT_distinct_Customers=catx(',',CAT_distinct_Customers,CAT_Customers);
do i=1 to countw(CAT_distinct_Customers,', ');
temp=scan(CAT_distinct_Customers,i,',');
if not findw(new_CAT_Customers,strip(temp),', ') then new_CAT_Customers=catx(',',new_CAT_Customers,temp) ;
end;
drop id CAT_Customers rc i temp;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.