Solved: Merge with SUBSTR

Ronein · Posted 12-04-2024 12:12 AM

Hello

Let's say that there are 2 fields that identify a customer- HighCustID and LowCustID.

Let's think about it like a family structure where some people can belong to same family.

So, under specific HighCustID there can be multiple LowCustID.

Each person (LowCustID) can be Main (Ind_Main=1) or not main (Ind_Main=0)

The number of HighCustID is defined as the number of the main person .

For example:

In month 202410 , 2 people : 111(Main) and 222 (not main ) belong to same HighCustID so the number of the HighCustID will be 111 ( as the number of the main)

In month 202409, these 2 people changed the functions and now 111 is not main and 222 is main so the number of the HighCustID will be 222 ( as the number of the main).

For each month there are 2 tables:

table in level of HighCustID with data of wealth

table that shows the ownership structure

The task is to perform Left join from table t202410_HighLevel to table t202409_HighLevel.

The problem is that the number of HighCustID can be changed during months and I still want to merge them.

So,

I want to merge 111(t202410_HighLevel) with 222(t202409_HighLevel)

I want to merge 444(t202410_HighLevel) with 444(t202409_HighLevel)

and about 333(t202410_HighLevel) he doesnt exist in t202409_HighLevel (It is a new person that join )

My question-

Here is the code I created and in last step I want to perform merge in the following way:

on a.HighCustID=b.HighCustID OR a.HighCustID=substr(one of the elemetns in IDs from table t202409_HighLevel_b)

Can anyone help to write this code that search if a.HighCustID equal to one of the elements (In concatentate field) from t202409_HighLevel_b?

My code is working perfect but there are some things I want to do:

In real life It can have more commas in the concatenate and then I want to create merge that automatically check how many commas have and try to merge with all possible locations .

Here is my full code:

/***********INPUT data sets*****************/
/***********INPUT data sets*****************/
/***********INPUT data sets*****************/
Data t202410_HighLevel;
Input HighCustID wealth;
cards;
111 100
333 80
444 180
;
Run;


Data t202409_HighLevel;
Input HighCustID wealth;
cards;
222 110
444 80
;
Run;


Data t202410_ownership;
Input HighCustID LowCustID Ind_Main;
cards;
111 111 1
111 222 0
333 333 1
444 444 1
;
Run;
Data t202409_ownership;
Input HighCustID LowCustID Ind_Main;
cards;
222 111 0
222 222 1
333 333 1
;
Run;


/***********Data sets that I created in order to get the wanted data set*****/
/***********Data sets that I created in order to get the wanted data set*****/
/***********Data sets that I created in order to get the wanted data set*****/
data want_ownership_202410;
length CAT $20.;
do until (last.HighCustID);
set t202410_ownership;
by HighCustID notsorted;
CAT=catx(',',CAT,LowCustID);
end;
drop LowCustID;
run;

data want_ownership_202409;
length CAT $20.;
do until (last.HighCustID);
set t202409_ownership;
by HighCustID notsorted;
CAT=catx(',',CAT,LowCustID);
end;
drop LowCustID;
run;




proc sql;
create table t202409_HighLevel_b as
select a.*,b.CAT
from t202409_HighLevel as a
left join want_ownership_202409  as b
on a.HighCustID=b.HighCustID
;
quit;



 
proc sql;
create table Want as
select a.HighCustID as HighCustID202409,a.wealth as wealth202410,b.wealth as wealth202409
from t202410_HighLevel  as a
left join t202409_HighLevel_b  as b
on a.HighCustID=b.HighCustID
OR
CAT(a.HighCustID)=scan(b.CAT,1,",")
OR
CAT(a.HighCustID)=scan(b.CAT,2,",")
OR
CAT(a.HighCustID)=scan(b.CAT,3,",")
OR
CAT(a.HighCustID)=scan(b.CAT,4,",")
;
quit;

Patrick · Posted 12-04-2024 05:07 AM

To stick with the coding approach you've taken I believe to understand that you're just asking how to deal with a varying number of comma separated terms in your concatenated string. If so then why not use findw() instead?

findw(strip(b.CAT),CAT(a.HighCustID)) >0

View solution in original post

Patrick · Posted 12-04-2024 05:07 AM

To stick with the coding approach you've taken I believe to understand that you're just asking how to deal with a varying number of comma separated terms in your concatenated string. If so then why not use findw() instead?

findw(strip(b.CAT),CAT(a.HighCustID)) >0

Tom · Posted 12-04-2024 10:36 AM

INDEXW() will work.

You can use CATS() function strip leading/trailing spaces from HIGHCUSTID (or convert it to a string if it is numeric).

ON indexw(b.CAT,cats(a.HighCustID),',')

If you want to ignore the case of any letters in the ids the you might want to use FINDW() instead as it has a modifiers arguments you can add to ignore case. It also has modifiers to remove the spaces.

ON findw(b.CAT,a.HighCustID,',','it')

Merge with SUBSTR

Re: Merge with SUBSTR

Re: Merge with SUBSTR

Re: Merge with SUBSTR

Registration is open

Merge with SUBSTR

Re: Merge with SUBSTR

Re: Merge with SUBSTR

Re: Merge with SUBSTR

Registration is open

SAS Training: Just a Click Away