BookmarkSubscribeRSS Feed
supp
Pyrite | Level 9
I have two datasets, file1 and file2. File1 contains a free form variable called 'jobname'. File2 contains an eight character variable called 'name'. It is likely that the variable 'name' from file2 is contained somewhere in the free form variable 'jobname'. What I would like to do is search jobname to see if I can find a match with name. What would be the best way to code something like this?
4 REPLIES 4
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
It's unclear if you want to pass the entire file2 looking for all 'name' values in file1?

Investigate using the hash object to load 'name' values into memory and then you can use a DATA step function like INDEX to search for a match. The facility is explained in detail at the SAS support http://support.sas.com/ and use the SEARCH facility.

Scott Barry
SBBWorks, Inc.
Patrick
Opal | Level 21
As the approach below results in a cartesian product using a hash table will be much better for performance (if the source tables are "big").

For small tables: I find the approach below easier to code.

data file1;
jobname='this is job abc001 and abc002'; output;
jobname='this is job abc003 '; output;
jobname='this is job abc004 '; output;
jobname='this is job abc005 '; output;
run;
data file2;
name='abc001';output;
name='abc002';output;
name='abc003';output;
name='abc004';output;
name='abc004';output;
name='job';output;
run;

proc sql;
select DISTINCT file1.*,ifc(file2.name ne '','1','0') as found
from file1 left join (select distinct name from file2) as file2
on find(file1.jobname,file2.name,'i');
quit;

HTH
Patrick Added a DISTINCT to the outer select to avoid multiplying the result.
In case that there can be duplicate values for "jobname" in file1 which should be kept backmerging of file1 (left join with result set) would be necessary.
If your data is like in the example I'm right now not so sure how this could be solved with a hash table as there is no key. The only way a can think of right now would be to create an artificial and not unique key and then "hash" over this key with only one value using the hash iterator object - but this sounds to me close to a cartesian product.


Message was edited by: Patrick
Patrick
Opal | Level 21
I gave your question another thought. Here my prefered solution:

data file1;
jobname='this is job abc001 and abc002'; output;
jobname='this is job abc003 '; output;
jobname='this is job abc004 '; output;
jobname='this is job abc005 '; output;
run;
data file2;
name='abc001';output;
name='abc002';output;
name='abc003';output;
name='abc004';output;
name='abc004';output;
run;

proc sql;
create table File2Distinct as
select distinct name from file2;
quit;

data file1;
set file1;
do i=1 to nobs;
set File2Distinct point=i nobs=nobs;
flag= find(jobname,name,'i')>1;
if flag then leave;
end;
run;
Flip
Fluorite | Level 6
Why not just


proc sql;
create table result as select a.*, b.*
from file2 a LEFT JOIN file1 b on
b.jobname like cats('%', a.name , '%');
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 850 views
  • 0 likes
  • 4 in conversation