BookmarkSubscribeRSS Feed
Roger
Calcite | Level 5

1. I need to delete 10 digits from right including @ sign on email account and leave only login name for Account ID column (i.e. mjohn)

2. You may also notice – if the names are repeated, email account is attached in parenthesis with their names. I will also need to remove the email account including      the parenthesis from Full Name column if the names have their email account attached (i.e. Tom Lee (tlee2@gmail.com) = Tom Lee)

Thanks,

Roger

Input data:

Full_Name Account_ID

------------ --------------

Mary Johnson  mjohn@gmail.com

Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com

Robin King rking@gmail.com

Tom Lee tlee@gmail.com

Tom Lee (tlee2@gmail.com) tlee2@gmail.com

Need the following output:

Full_Name Account_ID

------------ --------------

Mary Johnson  mjohn

Mary Jonson mjohn2

Robin King rking

Tom Lee tlee

Tom Lee tlee2

5 REPLIES 5
Reeza
Super User

Use the scan function with the "@" as a delimiter.

scan(account_id, 1, "@")

You can use compress function to remove brackets.

Roger
Calcite | Level 5

Hi Reeza, Thanks for your help!

I'm still having an issue with the login ID that comes in parantheses. I was able to use compress function to remove the parantheses but login ID still shows up after the the last name, e.g. Mary Jonson mjohn2.

And some people have a middle initial, e.g. Mary E. Jonson mjohn2.

Is there a way to remove the login ID at the end of their names? (in some cases it's a 4th word and in some cases it's a 3rd word, depending on whether they have a middle initial or not)

SASKiwi
PROC Star

One way to do this is like so - FINDC finds the first blank searching from right to left with any trailing blanks removed first, then SUBSTR extracts all characters up to that position:

data test;

  string = 'Mary E. Jonson mjohn2  ';

  result = substr(string, 1, findc(trim(string), ' ', 'B') - 1);

  put _all_;

run;

RichardinOz
Quartz | Level 8

To identify the account id try this:

I can't tell from your data whether the envelope symbol is encapsulated in your data or is an artifact of the HTML posting.  If the latter, using scan(account_id, -2, ") @") will pick out the second to last "word" in account_is, where the word boundary is delimited by "@", space or right parenthesis.  If the symbol is in your data and you can identify it by its byte code you may be able to add that into the list of scan delimiters, otherwise a subsequent account_id = substr (account_id, 2, 99) should do the trick.  NB don't compress out the parentheses for this to work.


From your second posting though it appears you have a problem splitting a single input line.  If that is the case, first find the account_id by applying the scan function to the input data, then use the INDEX function to locate its first occurrence in the input data

     pos = index (rawdata, account_id) ;

Then

     full_name = substr (rawdata, 1, pos - 2) ;

Linlin
Lapis Lazuli | Level 10

combining Rezee and SASkiwi's solutions:

data have;

input all $50.;

cards;

Mary Johnson  mjohn@gmail.com

Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com

Robin King rking@gmail.com

Tom Lee tlee@gmail.com

Tom Lee (tlee2@gmail.com) tlee2@gmail.com

;

data want(drop=temp);

length full_name account_id $ 20;

  set have;

  temp=compress(scan(all, 1, "@"),'()');

  Account_ID=scan(temp,-1,' ');

  full_name=substr(temp, 1, findc(trim(temp), ' ', 'B') - 1);

  proc print;  run;

                      account_

Obs    full_name          id       all

1     Mary Johnson     mjohn      Mary Johnson  mjohn@gmail.com

2     Mary Jonson      mjohn2     Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com

3     Robin King       rking      Robin King rking@gmail.com

4     Tom Lee          tlee       Tom Lee tlee@gmail.com

5     Tom Lee          tlee2      Tom Lee (tlee2@gmail.com) tlee2@gmail.com

;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 3566 views
  • 0 likes
  • 5 in conversation