BookmarkSubscribeRSS Feed
Zeus_Olympus
Fluorite | Level 6

Dear all,

 

I have two tables A  and B

  data a;
     subj=1; dat='HI_76_KAM_BAK_887_PD34_KAR'; output;
     subj=2; dat='FT_01_S_NDRI_KAR'; output;
     subj=3; dat='DY_02_DEALIRITEON_843_KAR'; output;
     subj=4; dat='DY_02_RET_DEALIRITEON_ST_843_KAR'; output;
     subj=5; dat='BV_719_B_SIAFT_JH_500_KAR'; output;
     subj=6; dat='BV_303_B_SIAFT_500_KAR'; output;
     subj=7; dat='BV_711_B_SIAFT_KAR'; output;
     subj=8; dat='BV_714_XEMEYDE5_CESIC_890_CAS_KAR'; output;
     subj=9; dat='BV_714_MIL_XMD5_CSIC_CAS_KAR'; output;
     subj=10; dat='BV_714_TJIRSIC_CAS_KAR'; output;
     subj=11; dat='DY_30_KAM_BAK_YR_956_KAR'; output;
     subj=12; dat='DY_30_PIC_KAM_BAK_956_KAR'; output;
     subj=13; dat='AM_07_MH_KAR'; output;
     subj=14; dat='AM_53_JIDIDUM_DYDERS_KAR'; output;

    run;
    
    data b;
     id=1; dat_new='KAM_BAK_887'; other_var=1; output;
     id=2; dat_new='S_NDRI'; other_var=2; output;
     id=3; dat_new='DEALIRITEON_843'; other_var=3; output;
     id=4; dat_new='B_SIAFT_500'; other_var=11; output;
     id=5; dat_new='XEMEYDE5_CESIC_CAS'; other_var=21; output;
     id=6; dat_new='KAM_BAK_956'; other_var=25; output;
     id=7; dat_new='MH'; other_var=21; other_var=28; output;
     id=7; dat_new='JIDIDUM_DYDERS'; other_var=29; output;
run;

I need to appropriately join them and get the table WANT as follows:

 

data want;
dat=HI_76_KAM_BAK_887_PD34_KAR;    dat_new=KAM_BAK_887;output; 
dat=FT_01_S_NDRI_KAR;              dat_new=S_NDRI;output;
dat=DY_02_DEALIRITEON_843_KAR;     dat_new=DEALIRITEON_843;output;
dat=BV_303_B_SIAFT_500_KAR;        dat_new=B_SIAFT_500;output;
dat=BV_714_MIL_XMD5_CSIC_CAS_KAR;  dat_new=XEMEYDE5_CESIC_CAS;output;
dat=DY_30_PIC_KAM_BAK_956_KAR;     dat_new=KAM_BAK_956;output;
dat=AM_07_MH_KAR;                  dat_new=MH;output;
dat=AM_53_JIDIDUM_DYDERS_KAR;      dat_new=JIDIDUM_DYDERS;output;
run;

I have tried to use something like this within a PROC SQL join but I have failed.

ifc(indexw(strip(dat),strip(dat_new),'_'),'OK','NOT OK');

Any suggestions would be more than welcome.

 

Thanking you in advance.

 

 

 

 

 

2 REPLIES 2
PaigeMiller
Diamond | Level 26

Note: I have modified the code for A and B to include LENGTH statements.

 

data a;
    length dat $ 40;
     subj=1; dat='HI_76_KAM_BAK_887_PD34_KAR'; output;
     subj=2; dat='FT_01_S_NDRI_KAR'; output;
     subj=3; dat='DY_02_DEALIRITEON_843_KAR'; output;
     subj=4; dat='DY_02_RET_DEALIRITEON_ST_843_KAR'; output;
     subj=5; dat='BV_719_B_SIAFT_JH_500_KAR'; output;
     subj=6; dat='BV_303_B_SIAFT_500_KAR'; output;
     subj=7; dat='BV_711_B_SIAFT_KAR'; output;
     subj=8; dat='BV_714_XEMEYDE5_CESIC_890_CAS_KAR'; output;
     subj=9; dat='BV_714_MIL_XMD5_CSIC_CAS_KAR'; output;
     subj=10; dat='BV_714_TJIRSIC_CAS_KAR'; output;
     subj=11; dat='DY_30_KAM_BAK_YR_956_KAR'; output;
     subj=12; dat='DY_30_PIC_KAM_BAK_956_KAR'; output;
     subj=13; dat='AM_07_MH_KAR'; output;
     subj=14; dat='AM_53_JIDIDUM_DYDERS_KAR'; output;

    run;
    
    data b;
        length dat_new $ 24;
     id=1; dat_new='KAM_BAK_887'; other_var=1; output;
     id=2; dat_new='S_NDRI'; other_var=2; output;
     id=3; dat_new='DEALIRITEON_843'; other_var=3; output;
     id=4; dat_new='B_SIAFT_500'; other_var=11; output;
     id=5; dat_new='XEMEYDE5_CESIC_CAS'; other_var=21; output;
     id=6; dat_new='KAM_BAK_956'; other_var=25; output;
     id=7; dat_new='MH'; other_var=21; other_var=28; output;
     id=7; dat_new='JIDIDUM_DYDERS'; other_var=29; output;
run;

proc sql;
    create table want as select
        a.dat,
        b.dat_new
    from a left join b 
        on find(a.dat,b.dat_new,'t')>=1
        where not missing(b.dat_new);
quit;

The reason that INDEXW (and also FINDW) will not work is that in record one, the words it looks at with the '_' delimiter are HI 76 KAM BAK 887 etc and it never finds one that matches KAM_BAK.

 

The approach in my code will fail if in data set A, there are values of DAT that contain MH, for example, if DAT='AMHERST' this will match data set b ID=7 DAT_NEW='MH'. I do not have at this moment a way to get around this.

 

Maybe regular expressions will have a better solution, but I leave that to others.

--
Paige Miller
mkeintz
PROC Star

What is the matching rule you wish to apply?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 658 views
  • 0 likes
  • 3 in conversation