Merge with part of ID

Accepted Solution Solved
Reply
Super Contributor
Posts: 339
Accepted Solution

Merge with part of ID


Hello,

is it possible to merge ID's if only a part of the key is contained in one of the ID's?

I mean something like this:

Data Key;
  Input Key $12. Value;
  Datalines;
AAA_BBB_EEE 2
AAA_YYY_EEE 3
;

Data List;
  Input Text $19.;
  Datalines;
AAA_BBB_CCC_DDD_EEE
AAA_BBB_YYY_EEE
;

Data Want;
  Input Text $19. Value;
  Datalines;
AAA_BBB_CCC_DDD_EEE 2
AAA_BBB_YYY_EEE     3
;

Thanks&kind regards


Accepted Solutions
Solution
‎03-20-2015 10:01 AM
Respected Advisor
Posts: 3,156

Re: Merge with part of ID

How you define 'part of a key' will affect the details of the merge/join, but it is possible:

Data Key;

     Input Key $12. Value;

     Datalines;

AAA_BBB_EEE 2

AAA_YYY_EEE 3

;

Data List;

     Input Text $19.;

     Datalines;

AAA_BBB_CCC_DDD_EEE

AAA_BBB_YYY_EEE

;

PROC SQL;

     CREATE TABLE WANT AS

           SELECT A.TEXT, B.VALUE FROM LIST A, KEY B

                WHERE A.TEXT EQT SUBSTR(B.KEY, 1,7);

QUIT;

View solution in original post


All Replies
Solution
‎03-20-2015 10:01 AM
Respected Advisor
Posts: 3,156

Re: Merge with part of ID

How you define 'part of a key' will affect the details of the merge/join, but it is possible:

Data Key;

     Input Key $12. Value;

     Datalines;

AAA_BBB_EEE 2

AAA_YYY_EEE 3

;

Data List;

     Input Text $19.;

     Datalines;

AAA_BBB_CCC_DDD_EEE

AAA_BBB_YYY_EEE

;

PROC SQL;

     CREATE TABLE WANT AS

           SELECT A.TEXT, B.VALUE FROM LIST A, KEY B

                WHERE A.TEXT EQT SUBSTR(B.KEY, 1,7);

QUIT;

Super Contributor
Posts: 339

Re: Merge with part of ID

Thanks to you I think am getting closer. :-)

Unfortunately, my like-statement doesn't fetch the "FFF"-part of my example. And is there a possibility to keep the case-statement dynamic, if the key has different lengths?

Data Key;
     Input Key $12. Value;
     Datalines;
AAA_BBB_FFF 2
AAA_YYY_EEE 3
AAA_SSS
;
Data List;
     Input Text $19.;
     Datalines;
AAA_BBB_CCC_DDD_FFF
AAA_BBB_YYY_EEE
AAA_CCC_SSS
;

Proc SQL;
  Select A.Value, A.Key, B.Text,
         Scan(A.Key,1,'_') As Subs1, Scan(A.Key,2,'_') As Subs2, Scan(A.Key,3,'_') As Subs3,
         B.Text like "%"!!Scan(A.Key,1,'_')!!"%" As Flag1, B.Text like "%"!!Scan(A.Key,2,'_')!!"%" As Flag2, B.Text like "%"!!Scan(A.Key,3,'_')!!"%" As Flag3
  From Key A, List B
  /*
  Where (Case When B.Text like "%"!!Scan(A.Key,1,'_')!!"%" AND
                   B.Text like "%"!!Scan(A.Key,2,'_')!!"%" AND
       B.Text like "%"!!Scan(A.Key,3,'_')!!"%"
         Then 1 Else 0 End)=1
  */
;
Quit;

Super User
Super User
Posts: 7,720

Re: Merge with part of ID

IMO your getting in a tangle.  Split the ID up into relevant points, sorting the orders of them if necessary, getting them into standard formats etc.  Then look at merging them.  An example:

ABCD-First row-1245-DEB

would not match

First ROW-1245-DEB-ABCD

However if you know to pull certain parts out into ids:

ID1=all the numbers

ID2=three chars

ID3=four chars

ID4=everything else

Then you can start to work with it.  Provide a bit more detail on your ID.

Super User
Posts: 9,874

Re: Merge with part of ID

What you gonna do if these words have different order ?

AAA_BBB_FFF 2

BBB_AAA_FFF 2

AAA_YYY_EEE 3

AAA_SSS



Data Key;

     Input Key : $12. Value;

     Datalines;

AAA_BBB_FFF 2

AAA_YYY_EEE 3

AAA_SSS .

;

Data List;

     Input Text : $19.;

     Datalines;

AAA_BBB_CCC_DDD_FFF

AAA_BBB_YYY_EEE

AAA_CCC_SSS .

;

data Key;

set Key;

length _Key $ 100;

_Key='/.*'||strip(tranwrd(Key,'_','.*'))||'.*/';

run;

proc sql;

create table want as

  select Text,Value

   from list,Key

    where prxmatch(_Key,Text);

quit;





Xia Keshan

Respected Advisor
Posts: 4,135

Re: Merge with part of ID

I like your thinking. The only concern I have with your approach is that very fast a lot of RegEx will get compiled and all of them will stay in memory. If I see this right then the number of compiled RegEx in memory will be the Cartesian product of the 2 source tables.

Below an approach which would avoid such issues.

data want(keep=Text Value);

  if _n_=1 then

    do;

      length _prxid 8;

      dcl hash h();

      _rc=h.defineKey('_prxid');

      _rc=h.defineData('_prxid','value');

      _rc=h.defineDone();

      dcl hiter hh('h');

      do _i=1 to _nobs;

        set key point=_i nobs=_nobs;

        _prxid=prxparse('/.*'||strip(tranwrd(Key,'_','.*'))||'.*/');

        _rc=h.add();

      end;

    end;

  set list;

  _rc = hh.first();

  do while (_rc = 0);

    if prxmatch(_prxid,Text) then

      output;

    _rc = hh.next();

  end;

run;

...and I believe below SQL would also compile the RegEx once only:

proc sql feedback;

  create table want as

    select l.text, k.value

      from list l,

       ( select prxparse('/.*' || strip(tranwrd(key.key, '_', '.*')) || '.*/') as _prxid, key.value

           from key

       ) as k

     where prxmatch(_prxid, l.text)

    ;

quit;

Super User
Posts: 9,874

Re: Merge with part of ID

I don't know if the following would work out. if you only want compile Regular Expression once . Try :

_Key='/.*'||strip(tranwrd(Key,'_','.*'))||'.*/o';

Respected Advisor
Posts: 4,135

Re: Merge with part of ID

No, this doesn't work. This way the RegEx gets only once compiled (so the very first key value) and you end up with a "want" table having the very first row from "list" 3 times repeated (and no match to the other rows).

Super Contributor
Posts: 339

Re: Merge with part of ID

The actual data looks like in the example below. As Hai.kuo said, I can't merge, I only can join. I am not sure, if the keys always have the right logical order. Since this is my first time I actually use a custom function, I would appreciate it, if you could take a look at the following program:

Libname MyFunc "C:\Test"; * !!;

Proc FCMP OutLib=MyFunc.MySubs.ScatteredContains;
Function ScatteredContains(Key $,Text $);
  Flag=1;
  Do i=1 To CountW(Key,'_') By 1;
    Flag=Min(Flag,Find(Text,Trim(Scan(Key,i,'_'))));
  End;
  Return (Flag);
EndSub;
Run;

Options CmpLib=MyFunc.MySubs;

Data Key;
  Length Top $15. Bottom $15.;
  Input @1 Key $48. @49 Qty;
  Top=Substr(Key,1,15); Bottom=Substr(Key,Length(Trim(Key))-14,15);
  Datalines;
0004_4600907_01_0004_4600906_01_0004_4500770_01 0.8
0004_4600907_01_0004_4600906_01_0004_4500770_01 0.9
0004_4600907_02_0004_4600906_01_0004_4500770_01 0.5
0004_4600907_02_0004_4600906_02_0004_4500770_03 0.6
0004_4600907_01_0004_4600906_01_0004_4500770_02 1.0
0004_4600907_02_0004_4600906_01_0004_4500770_02 0.85
0004_4600907_01_0004_4600422_01_0004_4500040_01 0.1
0004_4600907_01_0004_4600422_01_0004_4500040_02 0.15
0001_4600043_01_0001_4600040_01                 0.8
0001_4600043_01_0001_4600040_02                 0.7
;

Data List;
  Input @1 Top $15. @17 Bottom $15. @33 Text $85.;
  Datalines;
0004_4600907_01 0004_4500770_01 0004_4600907_01_0004_4600906_01_0004_4600422_01_0004_4500770_01_0004_4500040_01
0004_4600907_02 0004_4500770_02 0004_4600907_02_0004_4600906_02_0004_4600422_01_0004_4500770_02_0004_4500040_02
0004_4600907_01 0004_4500040_01 0004_4600907_01_0004_4600906_01_0004_4600422_01_0004_4500770_02_0004_4500040_01
0004_4600907_01 0004_4500040_02 0004_4600907_01_0004_4600906_02_0004_4600422_01_0004_4500770_01_0004_4500040_02
0001_4600043_01 0001_4600040_01 0001_4600043_01_0001_4600040_01
0001_4600043_01 0001_4600040_02 0001_4600043_01_0001_4600040_02
;

Data Key (Drop=i);
  Set Key;
  Array ID $ Top Bottom Key;
  Do over ID;
  Do i=1 To Length(ID) By 16;
    Substr(ID,i+4,1)="x";
Substr(ID,i+12,1)="x";
  End;
  End;
Run;

Data List (Drop=i);
  Set List;
  Array ID $ Top Bottom Text;
  Do over ID;
  Do i=1 To Length(ID) By 16;
    Substr(ID,i+4,1)="x";
Substr(ID,i+12,1)="x";
  End;
  End;
Run;

Proc SQL;
  Select L.Top, L.Bottom, K.Key, L.Text, K.Qty
  From Key K, List L
  Where L.Top eq K.Top AND
        L.Bottom eq K.Bottom AND
        ScatteredContains(K.Key,L.Text);
Quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 335 views
  • 6 likes
  • 5 in conversation