DATA Step, Macro, Functions and more

SUBSTR function not working! Please help me to solve the following:

Reply
Occasional Contributor
Posts: 16

SUBSTR function not working! Please help me to solve the following:

1. I need to delete 10 digits from right including @ sign on email account and leave only login name for Account ID column (i.e. mjohn)

2. You may also notice – if the names are repeated, email account is attached in parenthesis with their names. I will also need to remove the email account including      the parenthesis from Full Name column if the names have their email account attached (i.e. Tom Lee (tlee2@gmail.com) = Tom Lee)

Thanks,

Roger

Input data:

Full_Name Account_ID

------------ --------------

Mary Johnson  mjohn@gmail.com

Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com

Robin King rking@gmail.com

Tom Lee tlee@gmail.com

Tom Lee (tlee2@gmail.com) tlee2@gmail.com

Need the following output:

Full_Name Account_ID

------------ --------------

Mary Johnson  mjohn

Mary Jonson mjohn2

Robin King rking

Tom Lee tlee

Tom Lee tlee2

Super User
Posts: 17,784

Re: SUBSTR function not working! Please help me to solve the following:

Use the scan function with the "@" as a delimiter.

scan(account_id, 1, "@")

You can use compress function to remove brackets.

Occasional Contributor
Posts: 16

Re: SUBSTR function not working! Please help me to solve the following:

Hi Reeza, Thanks for your help!

I'm still having an issue with the login ID that comes in parantheses. I was able to use compress function to remove the parantheses but login ID still shows up after the the last name, e.g. Mary Jonson mjohn2.

And some people have a middle initial, e.g. Mary E. Jonson mjohn2.

Is there a way to remove the login ID at the end of their names? (in some cases it's a 4th word and in some cases it's a 3rd word, depending on whether they have a middle initial or not)

Super User
Posts: 3,102

Re: SUBSTR function not working! Please help me to solve the following:

One way to do this is like so - FINDC finds the first blank searching from right to left with any trailing blanks removed first, then SUBSTR extracts all characters up to that position:

data test;

  string = 'Mary E. Jonson mjohn2  ';

  result = substr(string, 1, findc(trim(string), ' ', 'B') - 1);

  put _all_;

run;

Super Contributor
Posts: 644

Re: SUBSTR function not working! Please help me to solve the following:

To identify the account id try this:

I can't tell from your data whether the envelope symbol is encapsulated in your data or is an artifact of the HTML posting.  If the latter, using scan(account_id, -2, ") @") will pick out the second to last "word" in account_is, where the word boundary is delimited by "@", space or right parenthesis.  If the symbol is in your data and you can identify it by its byte code you may be able to add that into the list of scan delimiters, otherwise a subsequent account_id = substr (account_id, 2, 99) should do the trick.  NB don't compress out the parentheses for this to work.


From your second posting though it appears you have a problem splitting a single input line.  If that is the case, first find the account_id by applying the scan function to the input data, then use the INDEX function to locate its first occurrence in the input data

     pos = index (rawdata, account_id) ;

Then

     full_name = substr (rawdata, 1, pos - 2) ;

Super Contributor
Posts: 1,636

Re: SUBSTR function not working! Please help me to solve the following:

combining Rezee and SASkiwi's solutions:

data have;

input all $50.;

cards;

Mary Johnson  mjohn@gmail.com

Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com

Robin King rking@gmail.com

Tom Lee tlee@gmail.com

Tom Lee (tlee2@gmail.com) tlee2@gmail.com

;

data want(drop=temp);

length full_name account_id $ 20;

  set have;

  temp=compress(scan(all, 1, "@"),'()');

  Account_ID=scan(temp,-1,' ');

  full_name=substr(temp, 1, findc(trim(temp), ' ', 'B') - 1);

  proc print;  run;

                      account_

Obs    full_name          id       all

1     Mary Johnson     mjohn      Mary Johnson  mjohn@gmail.com

2     Mary Jonson      mjohn2     Mary Jonson (mjohn2@gmail.com) mjohn2@gmail.com

3     Robin King       rking      Robin King rking@gmail.com

4     Tom Lee          tlee       Tom Lee tlee@gmail.com

5     Tom Lee          tlee2      Tom Lee (tlee2@gmail.com) tlee2@gmail.com

;

Ask a Question
Discussion stats
  • 5 replies
  • 410 views
  • 0 likes
  • 5 in conversation