I have two datasets, file1 and file2. File1 contains a free form variable called 'jobname'. File2 contains an eight character variable called 'name'. It is likely that the variable 'name' from file2 is contained somewhere in the free form variable 'jobname'. What I would like to do is search jobname to see if I can find a match with name. What would be the best way to code something like this?
It's unclear if you want to pass the entire file2 looking for all 'name' values in file1?
Investigate using the hash object to load 'name' values into memory and then you can use a DATA step function like INDEX to search for a match. The facility is explained in detail at the SAS support http://support.sas.com/ and use the SEARCH facility.
As the approach below results in a cartesian product using a hash table will be much better for performance (if the source tables are "big").
For small tables: I find the approach below easier to code.
jobname='this is job abc001 and abc002'; output;
jobname='this is job abc003 '; output;
jobname='this is job abc004 '; output;
jobname='this is job abc005 '; output;
select DISTINCT file1.*,ifc(file2.name ne '','1','0') as found
from file1 left join (select distinct name from file2) as file2
Added a DISTINCT to the outer select to avoid multiplying the result.
In case that there can be duplicate values for "jobname" in file1 which should be kept backmerging of file1 (left join with result set) would be necessary.
If your data is like in the example I'm right now not so sure how this could be solved with a hash table as there is no key. The only way a can think of right now would be to create an artificial and not unique key and then "hash" over this key with only one value using the hash iterator object - but this sounds to me close to a cartesian product.