Dear all,
I have two tables A and B
data a; subj=1; dat='HI_76_KAM_BAK_887_PD34_KAR'; output; subj=2; dat='FT_01_S_NDRI_KAR'; output; subj=3; dat='DY_02_DEALIRITEON_843_KAR'; output; subj=4; dat='DY_02_RET_DEALIRITEON_ST_843_KAR'; output; subj=5; dat='BV_719_B_SIAFT_JH_500_KAR'; output; subj=6; dat='BV_303_B_SIAFT_500_KAR'; output; subj=7; dat='BV_711_B_SIAFT_KAR'; output; subj=8; dat='BV_714_XEMEYDE5_CESIC_890_CAS_KAR'; output; subj=9; dat='BV_714_MIL_XMD5_CSIC_CAS_KAR'; output; subj=10; dat='BV_714_TJIRSIC_CAS_KAR'; output; subj=11; dat='DY_30_KAM_BAK_YR_956_KAR'; output; subj=12; dat='DY_30_PIC_KAM_BAK_956_KAR'; output; subj=13; dat='AM_07_MH_KAR'; output; subj=14; dat='AM_53_JIDIDUM_DYDERS_KAR'; output; run; data b; id=1; dat_new='KAM_BAK_887'; other_var=1; output; id=2; dat_new='S_NDRI'; other_var=2; output; id=3; dat_new='DEALIRITEON_843'; other_var=3; output; id=4; dat_new='B_SIAFT_500'; other_var=11; output; id=5; dat_new='XEMEYDE5_CESIC_CAS'; other_var=21; output; id=6; dat_new='KAM_BAK_956'; other_var=25; output; id=7; dat_new='MH'; other_var=21; other_var=28; output; id=7; dat_new='JIDIDUM_DYDERS'; other_var=29; output; run;
I need to appropriately join them and get the table WANT as follows:
data want; dat=HI_76_KAM_BAK_887_PD34_KAR; dat_new=KAM_BAK_887;output; dat=FT_01_S_NDRI_KAR; dat_new=S_NDRI;output; dat=DY_02_DEALIRITEON_843_KAR; dat_new=DEALIRITEON_843;output; dat=BV_303_B_SIAFT_500_KAR; dat_new=B_SIAFT_500;output; dat=BV_714_MIL_XMD5_CSIC_CAS_KAR; dat_new=XEMEYDE5_CESIC_CAS;output; dat=DY_30_PIC_KAM_BAK_956_KAR; dat_new=KAM_BAK_956;output; dat=AM_07_MH_KAR; dat_new=MH;output; dat=AM_53_JIDIDUM_DYDERS_KAR; dat_new=JIDIDUM_DYDERS;output; run;
I have tried to use something like this within a PROC SQL join but I have failed.
ifc(indexw(strip(dat),strip(dat_new),'_'),'OK','NOT OK');
Any suggestions would be more than welcome.
Thanking you in advance.
Note: I have modified the code for A and B to include LENGTH statements.
data a;
length dat $ 40;
subj=1; dat='HI_76_KAM_BAK_887_PD34_KAR'; output;
subj=2; dat='FT_01_S_NDRI_KAR'; output;
subj=3; dat='DY_02_DEALIRITEON_843_KAR'; output;
subj=4; dat='DY_02_RET_DEALIRITEON_ST_843_KAR'; output;
subj=5; dat='BV_719_B_SIAFT_JH_500_KAR'; output;
subj=6; dat='BV_303_B_SIAFT_500_KAR'; output;
subj=7; dat='BV_711_B_SIAFT_KAR'; output;
subj=8; dat='BV_714_XEMEYDE5_CESIC_890_CAS_KAR'; output;
subj=9; dat='BV_714_MIL_XMD5_CSIC_CAS_KAR'; output;
subj=10; dat='BV_714_TJIRSIC_CAS_KAR'; output;
subj=11; dat='DY_30_KAM_BAK_YR_956_KAR'; output;
subj=12; dat='DY_30_PIC_KAM_BAK_956_KAR'; output;
subj=13; dat='AM_07_MH_KAR'; output;
subj=14; dat='AM_53_JIDIDUM_DYDERS_KAR'; output;
run;
data b;
length dat_new $ 24;
id=1; dat_new='KAM_BAK_887'; other_var=1; output;
id=2; dat_new='S_NDRI'; other_var=2; output;
id=3; dat_new='DEALIRITEON_843'; other_var=3; output;
id=4; dat_new='B_SIAFT_500'; other_var=11; output;
id=5; dat_new='XEMEYDE5_CESIC_CAS'; other_var=21; output;
id=6; dat_new='KAM_BAK_956'; other_var=25; output;
id=7; dat_new='MH'; other_var=21; other_var=28; output;
id=7; dat_new='JIDIDUM_DYDERS'; other_var=29; output;
run;
proc sql;
create table want as select
a.dat,
b.dat_new
from a left join b
on find(a.dat,b.dat_new,'t')>=1
where not missing(b.dat_new);
quit;
The reason that INDEXW (and also FINDW) will not work is that in record one, the words it looks at with the '_' delimiter are HI 76 KAM BAK 887 etc and it never finds one that matches KAM_BAK.
The approach in my code will fail if in data set A, there are values of DAT that contain MH, for example, if DAT='AMHERST' this will match data set b ID=7 DAT_NEW='MH'. I do not have at this moment a way to get around this.
Maybe regular expressions will have a better solution, but I leave that to others.
What is the matching rule you wish to apply?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.