BookmarkSubscribeRSS Feed
Helle
Calcite | Level 5
Hi,

I have a dataset (A) containing usernames and email addresses of a number of people who entered data into a table in the past. Some of these people have left the company in the meantime. I want to check these usernames against a dataset of usernames of all current employees (B). If a username does not exist in B, the email address of that user in A must be changed into "xx@xx.com" (the same address for all ex-employees).

I am sure there must be a simple way to do this but for some reason, I cannot get my head around it.

Thanks in advance for your help,

Helle
9 REPLIES 9
SASKiwi
PROC Star
You could try something like this:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
run;

This assumes tables A and B are already sorted in user_name order and in at least one of the tables user_name is unique. If this is not the case then using SQL may be preferable.
ArtC
Rhodochrosite | Level 12
The merge that SASKiwi suggests is one of several types of table look-ups. Read more about other look-up methods at http://caloxy.com/papers/43-i_how_table_lookups_from_ift.pdf .
Helle
Calcite | Level 5
Hi,

Thanks a lot to both of you for your help. I wanted to end up with a dataset containing only the observations from A so I wrote the data step as follows:

data new;
merge A (in = a) B (in = b);
by user_name;
if a and not b then email = "xx@xx.com";
if b and not a then delete;
run;

Regards,

Helle
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
The observation is either in A or B, so the simplest, most efficient code construct (given your output requirement) would be:

if a and not b then email = "xx@xx.com";
else delete;

Scott Barry
SBBWorks, Inc.
Helle
Calcite | Level 5
Hi Scott,

Thanks for your suggestion. However, I want to keep all the observations from A and if I use your code, the only observations left in "new" are the ones that are not in B (all the ones which are in both A and B are deleted).

Regards,

Helle
Helle
Calcite | Level 5
Hi,

Can anybody tell me how I would do something similar in PROC SQL? I have another case involving a very large dataset and I run out of memory when sorting it.

Thanks,

Helle
Peter_C
Rhodochrosite | Level 12
sql will still sort, unless it could perform a "hash join"
Similar technology is available in a data step.
Scalability of joins depends on memory available and data sizes. When you have that information, you can make an informed choice.
The "seminal" (?) paper on sql joins, written some time ago but (imho) permanently relevant is at http://support.sas.com/techsup/technote/ts553.html titled "SQL Joins -- The Long and The Short of It"

peterC
SASKiwi
PROC Star
If you have a very large dataset and do not want to sort it there are at least two good choices:

1) Hash join table B to table A in a DATA step, as mentioned by Peter.

2) Create a SAS format from the username in table B using PROC FORMAT, then use the PUT function in a DATA step to do the lookup.

Both of these techniques provide similar (very quick) performance as the lookups are done in memory. They are well-documented in online help so I would suggest doing some research yourself if you would like to check this out further.
Helle
Calcite | Level 5
Thanks for all your help - I will look into the different suggestions.

Helle

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1300 views
  • 0 likes
  • 5 in conversation