I have a dataset (A) containing usernames and email addresses of a number of people who entered data into a table in the past. Some of these people have left the company in the meantime. I want to check these usernames against a dataset of usernames of all current employees (B). If a username does not exist in B, the email address of that user in A must be changed into "email@example.com" (the same address for all ex-employees).
I am sure there must be a simple way to do this but for some reason, I cannot get my head around it.
Thanks for your suggestion. However, I want to keep all the observations from A and if I use your code, the only observations left in "new" are the ones that are not in B (all the ones which are in both A and B are deleted).
sql will still sort, unless it could perform a "hash join"
Similar technology is available in a data step.
Scalability of joins depends on memory available and data sizes. When you have that information, you can make an informed choice.
The "seminal" (?) paper on sql joins, written some time ago but (imho) permanently relevant is at http://support.sas.com/techsup/technote/ts553.html titled "SQL Joins -- The Long and The Short of It"
If you have a very large dataset and do not want to sort it there are at least two good choices:
1) Hash join table B to table A in a DATA step, as mentioned by Peter.
2) Create a SAS format from the username in table B using PROC FORMAT, then use the PUT function in a DATA step to do the lookup.
Both of these techniques provide similar (very quick) performance as the lookups are done in memory. They are well-documented in online help so I would suggest doing some research yourself if you would like to check this out further.