1. I need to delete 10 digits from right including @ sign on email account and leave only login name for Account ID column (i.e. mjohn)
2. You may also notice – if the names are repeated, email account is attached in parenthesis with their names. I will also need to remove the email account including the parenthesis from Full Name column if the names have their email account attached (i.e. Tom Lee (tlee2@gmail.com) = Tom Lee)
Thanks,
Roger
Input data:
Full_Name Account_ID
------------ --------------
Mary Johnson mjohn@gmail.com
Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com
Robin King rking@gmail.com
Tom Lee tlee@gmail.com
Tom Lee (tlee2@gmail.com) tlee2@gmail.com
Need the following output:
Full_Name Account_ID
------------ --------------
Mary Johnson mjohn
Mary Jonson mjohn2
Robin King rking
Tom Lee tlee
Tom Lee tlee2
Use the scan function with the "@" as a delimiter.
scan(account_id, 1, "@")
You can use compress function to remove brackets.
Hi Reeza, Thanks for your help!
I'm still having an issue with the login ID that comes in parantheses. I was able to use compress function to remove the parantheses but login ID still shows up after the the last name, e.g. Mary Jonson mjohn2.
And some people have a middle initial, e.g. Mary E. Jonson mjohn2.
Is there a way to remove the login ID at the end of their names? (in some cases it's a 4th word and in some cases it's a 3rd word, depending on whether they have a middle initial or not)
One way to do this is like so - FINDC finds the first blank searching from right to left with any trailing blanks removed first, then SUBSTR extracts all characters up to that position:
data test;
string = 'Mary E. Jonson mjohn2 ';
result = substr(string, 1, findc(trim(string), ' ', 'B') - 1);
put _all_;
run;
To identify the account id try this:
I can't tell from your data whether the envelope symbol is encapsulated in your data or is an artifact of the HTML posting. If the latter, using scan(account_id, -2, ") @") will pick out the second to last "word" in account_is, where the word boundary is delimited by "@", space or right parenthesis. If the symbol is in your data and you can identify it by its byte code you may be able to add that into the list of scan delimiters, otherwise a subsequent account_id = substr (account_id, 2, 99) should do the trick. NB don't compress out the parentheses for this to work.
From your second posting though it appears you have a problem splitting a single input line. If that is the case, first find the account_id by applying the scan function to the input data, then use the INDEX function to locate its first occurrence in the input data
pos = index (rawdata, account_id) ;
Then
full_name = substr (rawdata, 1, pos - 2) ;
combining Rezee and SASkiwi's solutions:
data have;
input all $50.;
cards;
Mary Johnson mjohn@gmail.com
Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com
Robin King rking@gmail.com
Tom Lee tlee@gmail.com
Tom Lee (tlee2@gmail.com) tlee2@gmail.com
;
data want(drop=temp);
length full_name account_id $ 20;
set have;
temp=compress(scan(all, 1, "@"),'()');
Account_ID=scan(temp,-1,' ');
full_name=substr(temp, 1, findc(trim(temp), ' ', 'B') - 1);
proc print; run;
account_
Obs full_name id all
1 Mary Johnson mjohn Mary Johnson mjohn@gmail.com
2 Mary Jonson mjohn2 Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com
3 Robin King rking Robin King rking@gmail.com
4 Tom Lee tlee Tom Lee tlee@gmail.com
5 Tom Lee tlee2 Tom Lee (tlee2@gmail.com) tlee2@gmail.com
;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.