DATA Step, Macro, Functions and more

Joining on a wildcard

Accepted Solution Solved
Reply
Contributor JS
Contributor
Posts: 38
Accepted Solution

Joining on a wildcard

I have two tables.

 

First table has a bunch of record-level data.

ID Email_Address

1 josh@gmail.com

2 josh@yahoo.com

3 josh@hotmail.com

4 josh@aol.com

5 josh@facebook.com

 

Second table has a lookup on domains

flag

gmail.com

yahoo.com

 

Here's what I want

ID email_address in_flag

1 josh@gmail.com 1

2 josh@yahoo.com 1

3 josh@hotmail.com 0

4 josh@aol.com 0

5 josh@facebook.com 0 

 

I've tried a various number of joins to get to this, but haven't had any luck

 

proc sql;

select * 

from table1

left join (select *, 1 as in_flag from table2) on (email_address like catx("","%",flag))

 

Any ideas? Thanks!


Accepted Solutions
Solution
‎01-31-2017 05:39 PM
PROC Star
Posts: 1,564

Re: Joining on a wildcard

Like this?

data EMAIL;
 input EMAIL : $50.;
cards;
osh@123gmail.com
josh@jeoiafwgmail.com.bu
josh@a09irafyahoo.com
josh@a09irafaol.com
run;
data DOMAIN;
 input DOMAIN : $50.;
cards;
lag
gmail.com
yahoo.com
run;
proc sql;
  select EMAIl, DOMAIN, (DOMAIN is not null) as IN_FLAG
  from EMAIL
         left join
       DOMAIN
         on EMAIL like catt('%',DOMAIN,'%');
quit; 

 

EMAIL DOMAIN IN_FLAG
osh@123gmail.com gmail.com 1
josh@jeoiafwgmail.com.bu gmail.com 1
josh@a09irafyahoo.com yahoo.com 1
josh@a09irafaol.com   0

 

 

View solution in original post


All Replies
Super User
Posts: 17,907

Re: Joining on a wildcard

Why use a wildcard? Why not extract the domain and check for it specifically?

The SCAN function can extract the domain only. 

 

 

proc sql;

create table want as

select a.*, case when not missing(b.domain) then 1 else 0 end as flag

from table1 as a

Left join table2 as b  on

scan(a.email_address, 2, '@') = b.domain;

quit;

 

You can modify this to an update or join 

Contributor JS
Contributor
Posts: 38

Re: Joining on a wildcard

[ Edited ]

Reeza,

 

That's a good question.

 

Reason being is because occasionally the record-level data will present emails like this:

 

josh@123gmail.com

josh@jeoiafwgmail.com.bu

josh@a09irafgmail.com

 

I still want to be able to capture those.

Super User
Posts: 17,907

Re: Joining on a wildcard

Comparing only the domain levels is still better IMO. 

Extract and then use =* which is the equivalent to Sounds like between character variables. 

 

You may also want to look into COMPGED\COMPLEV if you want to filter some of the values or have some more control. 

 

http://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#p0w6napahk6...

Solution
‎01-31-2017 05:39 PM
PROC Star
Posts: 1,564

Re: Joining on a wildcard

Like this?

data EMAIL;
 input EMAIL : $50.;
cards;
osh@123gmail.com
josh@jeoiafwgmail.com.bu
josh@a09irafyahoo.com
josh@a09irafaol.com
run;
data DOMAIN;
 input DOMAIN : $50.;
cards;
lag
gmail.com
yahoo.com
run;
proc sql;
  select EMAIl, DOMAIN, (DOMAIN is not null) as IN_FLAG
  from EMAIL
         left join
       DOMAIN
         on EMAIL like catt('%',DOMAIN,'%');
quit; 

 

EMAIL DOMAIN IN_FLAG
osh@123gmail.com gmail.com 1
josh@jeoiafwgmail.com.bu gmail.com 1
josh@a09irafyahoo.com yahoo.com 1
josh@a09irafaol.com   0

 

 

Contributor JS
Contributor
Posts: 38

Re: Joining on a wildcard

This was perfect and worked like a charm!

 

Thank you so much.

Respected Advisor
Posts: 4,655

Re: Joining on a wildcard

[ Edited ]

This might do:

 

proc sql;
create table want as
select 
	a.*, 
	not missing(b.domain) as flag
from 
	table1 as a Left join 
	table2 as b on a.email_address contains trim(b.domain);
quit;
PG
Contributor JS
Contributor
Posts: 38

Re: Joining on a wildcard

PG,

 

Thank you for the response. The output did not necessarily accomplish my

 

Thank you for your advice though!

 

josh@123gmail.com 0
josh@jeoiafwgmail.com.bu 0
josh@a09irafyahoo.com 0
josh@a09irafaol.com 0

Respected Advisor
Posts: 4,655

Re: Joining on a wildcard

You are welcome. I fixed it by adding the trim function. It works now.

PG
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 246 views
  • 2 likes
  • 4 in conversation