DATA Step, Macro, Functions and more

Conditionally processing non-matching values

Reply
Contributor
Posts: 20

Conditionally processing non-matching values

Hi,

I have a dataset (A) containing usernames and email addresses of a number of people who entered data into a table in the past. Some of these people have left the company in the meantime. I want to check these usernames against a dataset of usernames of all current employees (B). If a username does not exist in B, the email address of that user in A must be changed into "xx@xx.com" (the same address for all ex-employees).

I am sure there must be a simple way to do this but for some reason, I cannot get my head around it.

Thanks in advance for your help,

Helle
Super User
Posts: 3,257

Re: Conditionally processing non-matching values

You could try something like this:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
run;

This assumes tables A and B are already sorted in user_name order and in at least one of the tables user_name is unique. If this is not the case then using SQL may be preferable.
Valued Guide
Posts: 634

Re: Conditionally processing non-matching values

The merge that SASKiwi suggests is one of several types of table look-ups. Read more about other look-up methods at http://caloxy.com/papers/43-i_how_table_lookups_from_ift.pdf .
Contributor
Posts: 20

Re: Conditionally processing non-matching values

Hi,

Thanks a lot to both of you for your help. I wanted to end up with a dataset containing only the observations from A so I wrote the data step as follows:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
if b and not a then delete;
run;

Regards,

Helle
Super Contributor
Super Contributor
Posts: 3,174

Re: Conditionally processing non-matching values

The observation is either in A or B, so the simplest, most efficient code construct (given your output requirement) would be:

if a and not b then email = "xx@xx.com";
else delete;

Scott Barry
SBBWorks, Inc.
Contributor
Posts: 20

Re: Conditionally processing non-matching values

Hi Scott,

Thanks for your suggestion. However, I want to keep all the observations from A and if I use your code, the only observations left in "new" are the ones that are not in B (all the ones which are in both A and B are deleted).

Regards,

Helle
Contributor
Posts: 20

Re: Conditionally processing non-matching values

Hi,

Can anybody tell me how I would do something similar in PROC SQL? I have another case involving a very large dataset and I run out of memory when sorting it.

Thanks,

Helle
Valued Guide
Posts: 2,177

Re: Conditionally processing non-matching values

sql will still sort, unless it could perform a "hash join"
Similar technology is available in a data step.
Scalability of joins depends on memory available and data sizes. When you have that information, you can make an informed choice.
The "seminal" (?) paper on sql joins, written some time ago but (imho) permanently relevant is at http://support.sas.com/techsup/technote/ts553.html titled "SQL Joins -- The Long and The Short of It"

peterC
Super User
Posts: 3,257

Re: Conditionally processing non-matching values

If you have a very large dataset and do not want to sort it there are at least two good choices:

1) Hash join table B to table A in a DATA step, as mentioned by Peter.

2) Create a SAS format from the username in table B using PROC FORMAT, then use the PUT function in a DATA step to do the lookup.

Both of these techniques provide similar (very quick) performance as the lookups are done in memory. They are well-documented in online help so I would suggest doing some research yourself if you would like to check this out further.
Contributor
Posts: 20

Re: Conditionally processing non-matching values

Thanks for all your help - I will look into the different suggestions.

Helle
Ask a Question
Discussion stats
  • 9 replies
  • 190 views
  • 0 likes
  • 5 in conversation