BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SASUser_77
Fluorite | Level 6

Even when I close the view tables, the want table does not contain only email addresses. Here is my code, log and output table. 

libname bank_lib 'P:\temp - to be deleted periodically';

filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';

proc import datafile=banking
out=bank_lib.banking
dbms=xlsx
replace;
run;
proc contents data=bank_lib.banking;
run;
proc print data=bank_lib.banking;
run;
data want;
    set bank_lib.banking;
    nwords=countw(columnB,' ');
    do i=1 to nwords;
        thisword=scan(columnB,i,' ');
        contains_at_symbol=find(thisword,'@');
        if contains_at_symbol>0 then do;
           email_address=thisword;
           output;
        end;
    end;
    drop i nwords thisword;
run;

NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170)
      Licensed to, Site 70016086.
NOTE: This session is executing on the X64_10PRO  platform.



NOTE: Analytical products:

      SAS/STAT 15.1
      SAS/ETS 15.1
      SAS/OR 15.1
      SAS/IML 15.1
      SAS/QC 15.1

NOTE: Additional host information:

 X64_10PRO WIN 10.0.19041  Workstation

NOTE: SAS initialization used:
      real time           0.94 seconds
      cpu time            0.53 seconds

1    libname bank_lib 'P:\temp - to be deleted periodically';
NOTE: Libref BANK_LIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: P:\temp - to be deleted periodically
2
3    filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';
4
5    proc import datafile=banking
6    out=bank_lib.banking
7    dbms=xlsx
8    replace;
9    run;

NOTE:    Variable Name Change.  Form 1 - Report Summary -> Form_1___Report_Summary
NOTE:    Variable Name Change.  444 -> _444
NOTE:    Variable Name Change.  Employer Number -> Employer_Number
NOTE:    Variable Name Change.  
PO Box 5777
300 Union St.
Sa -> VAR6
NOTE:    Variable Name Change.  002001 -> _002001
NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
      options MSGLEVEL=I.
NOTE: The import data set has 905 observations and 7 variables.
NOTE: BANK_LIB.BANKING data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           1.61 seconds
      cpu time            0.04 seconds


10   proc contents data=bank_lib.banking;
NOTE: Writing HTML Body file: sashtml.htm
11   run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.64 seconds
      cpu time            0.09 seconds


12   proc print data=bank_lib.banking;
13   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.13 seconds
      cpu time            0.10 seconds


14   data want;
15       set bank_lib.banking;
16       nwords=countw(columnB,' ');
17       do i=1 to nwords;
18           thisword=scan(columnB,i,' ');
19           contains_at_symbol=find(thisword,'@');
20           if contains_at_symbol>0 then do;
21              email_address=thisword;
22              output;
23           end;
24       end;
25       drop i nwords thisword;
26   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: The data set WORK.WANT has 22 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


want_9.PNG

SASKiwi
PROC Star

If all you want is just the email address variable then use a KEEP statement:

data want;
  keep email_address;
  set bank_lib.banking;
  nwords=countw(columnB,' ');
       do i=1 to nwords;
           thisword=scan(columnB,i,' ');
           contains_at_symbol=find(thisword,'@');
           if contains_at_symbol>0 then do;
              email_address=thisword;
              output;
           end;
       end;
   run;
Tom
Super User Tom
Super User

@SASUser_77 wrote:

Sorry, I fixed it so the error message no longer pops up. But when I do run the code it does not extract email addresses only.


Because you did not ask it to extract the email address.  You only asked it to locate the observations that had @ characters in them.

 

If you want to only extract the words in the string that might be email addresses then perhaps you should loop through the string word by word.

data want;
  set bank_lib.banking;
  where find(columnB,'@') ;
  length word $200 ;
  do i=1 to countw(columnB,' ');
      word = scan(columnB,i,' ');
      if find(word,'@') then output;
  end;
run;
SASUser_77
Fluorite | Level 6

I ran that code and it returns cells with email addresses, not jus the email addresses. Here is my code, log and want2 table.

 

libname bank_lib 'P:\temp - to be deleted periodically';

filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';

proc import datafile=banking
out=bank_lib.banking
dbms=xlsx
replace;
run;
proc contents data=bank_lib.banking;
run;
proc print data=bank_lib.banking;
run;
data want2;
    set bank_lib.banking;
    nwords=countw(columnB,' ');
    do i=1 to nwords;
        thisword=scan(columnB,i,' ');
        contains_at_symbol=find(thisword,'@');
        if contains_at_symbol>0 then do;
           email_address=thisword;
           output;
        end;
    end;
    drop i nwords thisword;
run;

NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170)
      Licensed to CANADIAN HUMAN RIGHTS COMMISSION, Site 70016086.
NOTE: This session is executing on the X64_10PRO  platform.



NOTE: Analytical products:

      SAS/STAT 15.1
      SAS/ETS 15.1
      SAS/OR 15.1
      SAS/IML 15.1
      SAS/QC 15.1

NOTE: Additional host information:

 X64_10PRO WIN 10.0.19041  Workstation

NOTE: SAS initialization used:
      real time           0.80 seconds
      cpu time            0.59 seconds

1    libname bank_lib 'P:\temp - to be deleted periodically';
NOTE: Libref BANK_LIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: P:\temp - to be deleted periodically
2
3    filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';
4
5    proc import datafile=banking
6    out=bank_lib.banking
7    dbms=xlsx
8    replace;
9    run;

NOTE:    Variable Name Change.  Form 1 - Report Summary -> Form_1___Report_Summary
NOTE:    Variable Name Change.  444 -> _444
NOTE:    Variable Name Change.  Employer Number -> Employer_Number
NOTE:    Variable Name Change.  
PO Box 5777
300 Union St.
Sa -> VAR6
NOTE:    Variable Name Change.  002001 -> _002001
NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
      options MSGLEVEL=I.
NOTE: The import data set has 905 observations and 7 variables.
NOTE: BANK_LIB.BANKING data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           1.45 seconds
      cpu time            0.11 seconds


10   proc contents data=bank_lib.banking;
NOTE: Writing HTML Body file: sashtml.htm
11   run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           1.00 seconds
      cpu time            0.10 seconds


12   proc print data=bank_lib.banking;
13   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.12 seconds
      cpu time            0.11 seconds


14   data want;
15       set bank_lib.banking;
16       nwords=countw(columnB,' ');
17       do i=1 to nwords;
18           thisword=scan(columnB,i,' ');
19           contains_at_symbol=find(thisword,'@');
20           if contains_at_symbol>0 then do;
21              email_address=thisword;
22              output;
23           end;
24       end;
25       drop i nwords thisword;
26   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: The data set WORK.WANT has 22 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


27   data want2;
28       set bank_lib.banking;
29       nwords=countw(columnB,' ');
30       do i=1 to nwords;
31           thisword=scan(columnB,i,' ');
32           contains_at_symbol=find(thisword,'@');
33           if contains_at_symbol>0 then do;
34              email_address=thisword;
35              output;
36           end;
37       end;
38       drop i nwords thisword;
39   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: The data set WORK.WANT2 has 22 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.00 seconds


want2.PNG

Tom
Super User Tom
Super User

Are you sure?  Your photograph only shows values for one observation and two variables.

Your SAS log says that the dataset has 

NOTE: The data set WORK.WANT has 22 observations and 9 variables.

What do you see when you look at the variable WORD where the email address was saved?

SASUser_77
Fluorite | Level 6

I took a screen capture of only one observation as an example. There are indeed 22 observations in the want2 table. I want to extract only the email addresses. 

 

I do not understand what you mean when you say variable word, can you clarify?

Tom
Super User Tom
Super User

@SASUser_77 wrote:

I took a screen capture of only one observation as an example. There are indeed 22 observations in the want2 table. I want to extract only the email addresses. 

 

I do not understand what you mean when you say variable word, can you clarify?


If you run a program like this:

data want ;
  set sashelp.class;
  age2 = age**2 ;
run;

If you want to check if it worked by browsing the dataset (say by using whatever tool you took the photograph of) then you need to look at the value of the new variable AGE2 and not just the values of the old variables NAME or AGE .

svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10

This is a back-of-the-napkin solution that assumes that @ symbols are only in email addresses:

 

data need;
email = "blah blah henry_fifth@gmail.com blah"; /* example of a field with text and an email address*/
end= scan(email,2, "@");
start = scan(email,1, "@");
email_new = catx('@', scan(start, -1, ' '), scan(end, 1, ' '));
run;

SASUser_77
Fluorite | Level 6

I do not know the text of each of the email addresses that I need. There are hundreds of email addresses I need to extract. 

svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10
My code was just to help with thinking about the functions that could work. Did you try running something like:

data want;
set have;
end= scan(columnB,2, "@");
start = scan(columnB,1, "@");
email = catx('@', scan(start, -1, ' '), scan(end, 1, ' '));

SASUser_77
Fluorite | Level 6

The output table looks like the original table I have. Maybe I did something wrong. 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 26 replies
  • 2539 views
  • 2 likes
  • 5 in conversation