Conditionally processing non-matching values (SAS Programming)

Helle — Mon, 09 Aug 2010 11:15:17 GMT

Hi,

I have a dataset (A) containing usernames and email addresses of a number of people who entered data into a table in the past. Some of these people have left the company in the meantime. I want to check these usernames against a dataset of usernames of all current employees (B). If a username does not exist in B, the email address of that user in A must be changed to "xx@xx.com" (the same address for all ex-employees).

I am sure there must be a simple way to do this but for some reason, I cannot get my head around it.

Thanks in advance for your help,

Helle

Re: Conditionally processing non-matching values

SASKiwi — Mon, 09 Aug 2010 22:12:42 GMT

You could try something like this:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
run;

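This assumes tables A and B are already sorted by user_name and that user_name is unique in at least one of the tables. (MERGE with a BY statement stops with an error if the input is not sorted, and it does not produce a many-to-many join when user_name repeats in both tables.) If this is not the case, PROC SQL may be preferable.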
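For anyone who wants to try the merge end to end, here is a minimal self-contained sketch. The sample data in A and B is made up for illustration, and it assumes user_name and email are character variables; the two PROC SORT steps supply the ordering the merge relies on.

```sas
/* Hypothetical sample data: A = past data-entry users, B = current employees */
data A;
  input user_name :$20. email :$40.;
  datalines;
alice alice@example.com
bob bob@example.com
carol carol@example.com
;
run;

data B;
  input user_name :$20.;
  datalines;
alice
carol
dave
;
run;

/* MERGE with a BY statement requires both inputs sorted by the key */
proc sort data=A; by user_name; run;
proc sort data=B; by user_name; run;

data new;
  merge A (in = a) B (in = b);
  by user_name;
  /* a = 1 when the row came from A, b = 1 when it came from B */
  if a and not b then email = "xx@xx.com";
run;
```

With this sample data NEW has four rows: bob's email becomes xx@xx.com, alice and carol keep theirs, and dave (who is only in B) is also output, with a blank email.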

Re: Conditionally processing non-matching values

ArtC — Mon, 09 Aug 2010 22:49:16 GMT

The merge that SASKiwi suggests is one of several types of table look-ups. Read more about other look-up methods at http://caloxy.com/papers/43-i_how_table_lookups_from_ift.pdf .

Re: Conditionally processing non-matching values

Helle — Tue, 10 Aug 2010 08:04:38 GMT

Hi,

Thanks a lot to both of you for your help. I wanted to end up with a dataset containing only the observations from A, so I wrote the DATA step as follows:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
if b and not a then delete;
run;

Regards,

Helle

Re: Conditionally processing non-matching values

sbb — Tue, 10 Aug 2010 08:12:32 GMT

The observation is either in A or B, so the simplest, most efficient code construct (given your output requirement) would be:

if a and not b then email = "xx@xx.com";
else delete;

Scott Barry
SBBWorks, Inc.

Re: Conditionally processing non-matching values

Helle — Wed, 11 Aug 2010 13:40:04 GMT

Hi Scott,

Thanks for your suggestion. However, I want to keep all the observations from A. With your code, the only observations left in "new" are the ones that are not in B; all the ones that are in both A and B are deleted.

Regards,

Helle
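A sketch of a more compact variant that keeps every observation from A, assuming (as in the earlier posts) that A and B are sorted by user_name: a subsetting IF drops the rows that come only from B, which gives the same result as the "if b and not a then delete;" line in Helle's earlier DATA step.

```sas
data new;
  merge A (in = a) B (in = b);
  by user_name;
  if a;                               /* subsetting IF: keep only rows present in A */
  if not b then email = "xx@xx.com";  /* in A but not in B: ex-employee */
run;
```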

Re: Conditionally processing non-matching values

Helle — Wed, 11 Aug 2010 13:41:39 GMT

Hi,

Can anybody tell me how I would do something similar in PROC SQL? I have another case involving a very large dataset and I run out of memory when sorting it.

Thanks,

Helle

Re: Conditionally processing non-matching values

Peter_C — Wed, 11 Aug 2010 15:06:30 GMT

SQL will still sort, unless it can perform a "hash join".
Similar technology is available in a DATA step (the hash object).
How well a join scales depends on the memory available and the data sizes. When you have that information, you can make an informed choice.
The "seminal" (?) paper on SQL joins, written some time ago but (IMHO) permanently relevant, is "SQL Joins -- The Long and The Short of It", at http://support.sas.com/techsup/technote/ts553.html

peterC
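A minimal PROC SQL sketch of the same lookup, assuming the variable names used earlier in the thread. The left join keeps every row of A, and the inline view with DISTINCT guards against repeated usernames in B. Only user_name and email are selected, so any other columns of A would need to be added to the SELECT list. As Peter says, SQL may still sort behind the scenes, so this is not guaranteed to avoid the memory problem.

```sas
proc sql;
  create table new as
  select a.user_name,
         /* no match in B means the user has left: overwrite the address */
         case when b.user_name is null then "xx@xx.com"
              else a.email
         end as email
  from A as a
       left join
       (select distinct user_name from B) as b
       on a.user_name = b.user_name;
quit;
```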

Re: Conditionally processing non-matching values

SASKiwi — Wed, 11 Aug 2010 22:09:19 GMT

If you have a very large dataset and do not want to sort it, there are at least two good choices:

1) Join table B to table A with a hash object in a DATA step, as mentioned by Peter.

2) Create a SAS format from the usernames in table B using PROC FORMAT, then use the PUT function in a DATA step to do the lookup.
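Option 1 might look roughly like the following sketch. It assumes the usernames in B fit in memory and that user_name has the same type and length in A and B; A is read sequentially and does not need to be sorted. The hash object name cur is an arbitrary, hypothetical choice.

```sas
data new;
  /* Load the current usernames into an in-memory hash table once */
  if _n_ = 1 then do;
    declare hash cur (dataset: "B (keep = user_name)");
    cur.defineKey("user_name");
    cur.defineDone();
  end;
  set A;  /* A is read in its existing order; no sort required */
  /* check() returns 0 when the key is found, without changing any variables */
  if cur.check() ne 0 then email = "xx@xx.com";
run;
```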

Both techniques perform similarly (and very quickly) because the lookups are done in memory, which also means the lookup values from table B must fit in memory. Both are well documented in the online help, so I would suggest doing some research yourself if you want to look into them further.
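Option 2 could be sketched as follows, under the same memory assumption. The format name $curemp and the data sets cur_users and cur_fmt are hypothetical names. The CNTLIN data set maps every current username to "Y" and adds an OTHER range (HLO = "O") that maps everything else to "N". Only B (presumably much smaller) is sorted here; A is not.

```sas
/* De-duplicate the current usernames; PROC FORMAT rejects repeated ranges */
proc sort data=B (keep = user_name) out=cur_users nodupkey;
  by user_name;
run;

/* Build the CNTLIN control data set */
data cur_fmt;
  set cur_users (rename = (user_name = start)) end = last;
  length label $1 hlo $1;
  retain fmtname "$curemp" type "C";
  label = "Y";   /* username is a current employee */
  hlo = " ";
  output;
  if last then do;
    label = "N"; /* OTHER: any value not listed above */
    hlo = "O";
    output;
  end;
run;

proc format cntlin = cur_fmt;
run;

/* Look up each username through the format; A is read unsorted */
data new;
  set A;
  if put(user_name, $curemp.) = "N" then email = "xx@xx.com";
run;
```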

Re: Conditionally processing non-matching values

Helle — Fri, 13 Aug 2010 13:37:42 GMT

Thanks for all your help; I will look into the different suggestions.

Helle