Conditionally processing non-matching values

Helle · Posted 08-09-2010 07:15 AM

Hi,

I have a dataset (A) containing usernames and email addresses of a number of people who entered data into a table in the past. Some of these people have left the company in the meantime. I want to check these usernames against a dataset of usernames of all current employees (B). If a username does not exist in B, the email address of that user in A must be changed into "xx@xx.com" (the same address for all ex-employees).

I am sure there must be a simple way to do this but for some reason, I cannot get my head around it.

Thanks in advance for your help,

Helle

SASKiwi · Posted 08-09-2010 06:12 PM

You could try something like this:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
run;

This assumes tables A and B are already sorted in user_name order and in at least one of the tables user_name is unique. If this is not the case then using SQL may be preferable.

ArtC · Posted 08-09-2010 06:49 PM

The merge that SASKiwi suggests is one of several types of table look-ups. Read more about other look-up methods at http://caloxy.com/papers/43-i_how_table_lookups_from_ift.pdf .

Helle · Posted 08-10-2010 04:04 AM

Hi,

Thanks a lot to both of you for your help. I wanted to end up with a dataset containing only the observations from A so I wrote the data step as follows:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
if b and not a then delete;
run;

Regards,

Helle

sbb · Posted 08-10-2010 04:12 AM

The observation is either in A or B, so the simplest, most efficient code construct (given your output requirement) would be:

if a and not b then email = "xx@xx.com";
else delete;

Scott Barry
SBBWorks, Inc.

Helle · Posted 08-11-2010 09:40 AM

Hi Scott,

Thanks for your suggestion. However, I want to keep all the observations from A and if I use your code, the only observations left in "new" are the ones that are not in B (all the ones which are in both A and B are deleted).

Regards,

Helle

Helle · Posted 08-11-2010 09:41 AM

Hi,

Can anybody tell me how I would do something similar in PROC SQL? I have another case involving a very large dataset and I run out of memory when sorting it.

Thanks,

Helle

Peter_C · Posted 08-11-2010 11:06 AM

sql will still sort, unless it could perform a "hash join"
Similar technology is available in a data step.
Scalability of joins depends on memory available and data sizes. When you have that information, you can make an informed choice.
The "seminal" (?) paper on sql joins, written some time ago but (imho) permanently relevant is at http://support.sas.com/techsup/technote/ts553.html titled "SQL Joins -- The Long and The Short of It"

peterC

SASKiwi · Posted 08-11-2010 06:09 PM

If you have a very large dataset and do not want to sort it there are at least two good choices:

1) Hash join table B to table A in a DATA step, as mentioned by Peter.

2) Create a SAS format from the username in table B using PROC FORMAT, then use the PUT function in a DATA step to do the lookup.

Both of these techniques provide similar (very quick) performance as the lookups are done in memory. They are well-documented in online help so I would suggest doing some research yourself if you would like to check this out further.

Helle · Posted 08-13-2010 09:37 AM

Thanks for all your help - I will look into the different suggestions.

Helle

Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Re: Conditionally processing non-matching values

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away