I have two tables.
First table has a bunch of record-level data.
ID Email_Address
1 josh@gmail.com
2 josh@yahoo.com
3 josh@hotmail.com
4 josh@aol.com
5 josh@facebook.com
Second table has a lookup on domains
flag
gmail.com
yahoo.com
Here's what I want
ID email_address in_flag
1 josh@gmail.com 1
2 josh@yahoo.com 1
3 josh@hotmail.com 0
4 josh@aol.com 0
5 josh@facebook.com 0
I've tried a various number of joins to get to this, but haven't had any luck
proc sql;
select *
from table1
left join (select *, 1 as in_flag from table2) on (email_address like catx("","%",flag))
Any ideas? Thanks!
Like this?
data EMAIL;
input EMAIL : $50.;
cards;
osh@123gmail.com
josh@jeoiafwgmail.com.bu
josh@a09irafyahoo.com
josh@a09irafaol.com
run;
data DOMAIN;
input DOMAIN : $50.;
cards;
lag
gmail.com
yahoo.com
run;
proc sql;
select EMAIl, DOMAIN, (DOMAIN is not null) as IN_FLAG
from EMAIL
left join
DOMAIN
on EMAIL like catt('%',DOMAIN,'%');
quit;
DOMAIN | IN_FLAG | |
---|---|---|
osh@123gmail.com | gmail.com | 1 |
josh@jeoiafwgmail.com.bu | gmail.com | 1 |
josh@a09irafyahoo.com | yahoo.com | 1 |
josh@a09irafaol.com | 0 |
Why use a wildcard? Why not extract the domain and check for it specifically?
The SCAN function can extract the domain only.
proc sql;
create table want as
select a.*, case when not missing(b.domain) then 1 else 0 end as flag
from table1 as a
Left join table2 as b on
scan(a.email_address, 2, '@') = b.domain;
quit;
You can modify this to an update or join
Reeza,
That's a good question.
Reason being is because occasionally the record-level data will present emails like this:
josh@123gmail.com
josh@jeoiafwgmail.com.bu
josh@a09irafgmail.com
I still want to be able to capture those.
Comparing only the domain levels is still better IMO.
Extract and then use =* which is the equivalent to Sounds like between character variables.
You may also want to look into COMPGED\COMPLEV if you want to filter some of the values or have some more control.
Like this?
data EMAIL;
input EMAIL : $50.;
cards;
osh@123gmail.com
josh@jeoiafwgmail.com.bu
josh@a09irafyahoo.com
josh@a09irafaol.com
run;
data DOMAIN;
input DOMAIN : $50.;
cards;
lag
gmail.com
yahoo.com
run;
proc sql;
select EMAIl, DOMAIN, (DOMAIN is not null) as IN_FLAG
from EMAIL
left join
DOMAIN
on EMAIL like catt('%',DOMAIN,'%');
quit;
DOMAIN | IN_FLAG | |
---|---|---|
osh@123gmail.com | gmail.com | 1 |
josh@jeoiafwgmail.com.bu | gmail.com | 1 |
josh@a09irafyahoo.com | yahoo.com | 1 |
josh@a09irafaol.com | 0 |
This was perfect and worked like a charm!
Thank you so much.
This might do:
proc sql;
create table want as
select
a.*,
not missing(b.domain) as flag
from
table1 as a Left join
table2 as b on a.email_address contains trim(b.domain);
quit;
PG,
Thank you for the response. The output did not necessarily accomplish my
Thank you for your advice though!
josh@123gmail.com 0
josh@jeoiafwgmail.com.bu 0
josh@a09irafyahoo.com 0
josh@a09irafaol.com 0
You are welcome. I fixed it by adding the trim function. It works now.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.