Re: Advanced string comparison in a row

aycarus · Posted 02-06-2009 12:12 AM

I have been trying to think of a solution to this for awhile, but I have had no luck. I have a list of customers (with Customer ID and Possessions, which is a semi-colon delimited list as a string) and a list of objects (as strings), as follows:

customers:
Customer ID# Possessions
------------------ ----------------
1 'apples; oranges'
2 'pears'
3 'pears; cheese'

objects:
Object
---------
'apples'
'pineapples'
'cheese'

I want to find all customers whose Possessions can also be found in the list of Objects (only customers 1 and 3 in this case, since the first has 'apples' whereas the third has 'cheese'. Customer 2 has nothing which is in the list of objects). What is the best way to do this in SAS?

I have tried using a SQL statement that looks like

proc sql;
create table customer_pt as select *
from customers as a, objects as b
where a.Possessions like ('%' + b.Object + '%');
quit;

but that just produces errors...

Any help would be appreciated...

LinusH · Posted 02-06-2009 02:24 AM

I am certain that there are many ways to do this. On way I would try is first to split up your Possessions column so you will have each possession in separate columns, then transpose that table, so that you will have just one possession per row. After doing that, just do an inner join to the Objects table.

/Linus

Data never sleeps

aycarus · Posted 02-06-2009 09:29 AM

One additional confounding problem with this is that I want to also detect partial matches. Namely, if one customer has "cheese slices" for Possessions then I want that to also be flagged. I can't imagine it being possible to do this any other way than per-row string searching of some kind.

Cynthia_sas · Posted 02-06-2009 12:27 PM

Hi:
You might try the FIND function in your SQL WHERE clause:
[pre]
where find(a.possessions, trim(b.object)) gt 0;
[/pre]

cynthia

NickR · Posted 02-06-2009 01:37 PM

little more in detail...

proc sql;
create table both as select * from customer as a, object as b where index(possessions, trim(compress(object, "'")));
quit;

Cynthia_sas · Posted 02-06-2009 02:19 PM

Hi:
If you use the DSD option when you read the data, there are no quotes in the string that need to be compressed out.

cynthia

[pre]
data customers(keep=cust_id possession) ;
length cust_id 8 possession $100;
infile datalines dsd dlm=" ";
input cust_id $ possession $;
output customers;
return;
datalines4;
1 'apples; oranges'
2 'pears'
3 'pears; cheese'
4 'cheese slices'
5 'oranges; pears'
6 'cheeses tray'
;;;;
run;

title;
proc print data=customers;
title 'Original record';
run;

data objects(keep=object);
length object $20;

infile datalines dsd;
input object $;
output objects;
return;
datalines;
'apples'
'pineapples'
'cheese'
;
run;

proc print data=objects;
title 'original list of objects';
run;

proc sql;
create table sql_find as
select *
from customers as a, objects as b
where find(a.possession, trim(b.object)) gt 0;
quit;

proc print data=sql_find;
title 'After FIND in SQL';
run;

[/pre]

aycarus · Posted 02-06-2009 03:04 PM

Nice! I think it works. Thanks!

NickR · Posted 02-06-2009 03:57 PM

nice...you are tooo good...

Registration is open

SAS Training: Just a Click Away