Joining with wild card key, PROC SQL?

Hello all,

I have two lists. The first has 6100 records, each a fully qualified Windows dir\path\filename.ext containing assorted punctuation such as:     & , ( ) '

The second, larger list has 81K observations. Every one of the 6100 records has a match in the larger list. However, the larger list no longer contains any of those punctuation marks; wherever a punctuation mark was taken out, a placeholder underscore "_" was put in its place. For example:

6100-record list (original)                         81K-record list (punctuation replaced)

D:\path\file&name.ext                               D:\path\file_name.ext

D:\path\path's\file&name.ext                        D:\path\path_s\file_name.ext

E:\path (my)\pat&h\new, file.SAV                    E:\path _my_\pat_h\new_ file.SAV

etc.

Does anyone know how to crosswalk ("left join") these two lists using wildcards, and have the time to toss me a slow-pitch softball?

 

/*for a clear list of what has been compressed out*/
compress(Path_File, "',&()", "")

These 6100 files are the last of 240K files I need to research for metadata; however, I did not have the skill to make SAS read these files with the problematic punctuation still in the file paths and names.  PS: the path and file names can be up to 260 characters long. All other data is derived from the path, file name, and data type, so no other fields were given as examples.   TIA. -KJ


Accepted Solutions
‎03-28-2016 06:59 PM

Re: Joining with wild card key, PROC SQL?

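Use the translate() function to turn each punctuation character into an underscore before comparing the keys. The apostrophe is included because the question lists it among the replaced characters:

a inner join b on a.newPath = translate(b.oldPath, "_____", "&,()'")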

PG
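
To see what translate() does to the sample paths from the question, here is a minimal sketch. The data set name demo and the variable names oldPath and cleaned are made up for illustration. The argument order is translate(source, to, from), and the apostrophe is included in the from list because the question names it as one of the replaced characters.

```sas
/* Hypothetical demo data: the three sample paths from the question. */
data demo;
   length oldPath cleaned $260;   /* paths can be up to 260 characters */
   infile datalines truncover;
   input oldPath $char260.;
   /* translate(source, to, from): each character found in the from list
      is replaced by the character at the same position in the to list. */
   cleaned = translate(oldPath, "_____", "&,()'");
   datalines;
D:\path\file&name.ext
D:\path\path's\file&name.ext
E:\path (my)\pat&h\new, file.SAV
;
run;

proc print data=demo noobs;
run;
```

The cleaned column should reproduce the right-hand paths shown in the question, for example D:\path\path_s\file_name.ext for the second row. Note that compress() would not work as a join key here: it deletes the listed characters rather than replacing them, so the underscore placeholders in the larger list would not line up.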



All Replies

Re: Joining with wild card key, PROC SQL?

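proc sql;
   /* Map the five punctuation characters & ' , ( ) to underscores
      so that the two path keys can be compared for equality. */
   create table testing as
   select a.Path_File1 as keya,
          b.Path_File1 as keyb
   from small_data_set a
   inner join large_data_set b
      on a.Path_File1 = translate(b.Path_File1, "_____", "&',()")
   ;
quit;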

The data sets are exactly 4 files off; I hand-checked those earlier and removed their observations because of side issues. Thanks for the help.
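
Since the original question asked for a left join, a sketch of that variant may help when tracking down the handful of records that do not match. It reuses the data set and variable names from the query above and adds a made-up output table name, unmatched. It also assumes that translate() is applied to whichever side still carries the punctuation, taken here to be small_data_set per the question's description; swap the sides if your data are arranged the other way.

```sas
proc sql;
   /* Keep every row from the punctuated list. After the left join,
      b.Path_File1 is missing wherever the larger list has no
      counterpart, so the WHERE clause keeps only unmatched rows. */
   create table unmatched as
   select a.Path_File1 as keya,
          b.Path_File1 as keyb
   from small_data_set a
   left join large_data_set b
      on translate(a.Path_File1, "_____", "&',()") = b.Path_File1
   where b.Path_File1 is missing
   ;
quit;
```

Dropping the WHERE clause would instead return all rows from the smaller list, with keyb filled in where a match exists.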
