BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SASUser_77
Fluorite | Level 6

My dataset contains a column with email addresses. For example this is all the text in the variable named columnB:

 

        " Rima Loki

        President and Chief Executive Officer

        1260, boul. René-Lévesque Ouest, bur.600

        Montreal, Québec

        H3G 0E3

        Telephone : (416) 865-5800

        Email Address : Rima.loki@loo.ca "

 

I want to extract all the email addresses from this variable so that it only pulls out "Rima.loki@loo.ca"

 

How can I do this?

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

If all you want is just the email address variable then use a KEEP statement:

data want;
  keep email_address;
  set bank_lib.banking;
  nwords=countw(columnB,' ');
       do i=1 to nwords;
           thisword=scan(columnB,i,' ');
           contains_at_symbol=find(thisword,'@');
           if contains_at_symbol>0 then do;
              email_address=thisword;
              output;
           end;
       end;
   run;

View solution in original post

26 REPLIES 26
PaigeMiller
Diamond | Level 26
data want;
     set have;
     if find(columnB,'@')>0;
run;
--
Paige Miller
SASUser_77
Fluorite | Level 6

Hello, I received this error message:

 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
149:14
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

 

Would you know how to fix it? Thank you!

PaigeMiller
Diamond | Level 26

@SASUser_77 wrote:

Hello, I received this error message:

 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
149:14
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

 

Would you know how to fix it? Thank you!


We don't know what you did.

 

When you get errors or warnings in the log, SHOW US THE ENTIRE LOG for this data step or PROC. Do not show us parts of the log. Please copy the log as text and then paste it into the window that appears when you click on the </> icon.

PaigeMiller_0-1663012019648.png

--
Paige Miller
SASUser_77
Fluorite | Level 6

I ran the code again with some fixes to my data and it extracts the whole contents of the cells with email addresses, not just the email addresses. 

 

I attached the log here. I am using SAS in a different window than the screenshot you provided. Thank you.

 

 

PaigeMiller
Diamond | Level 26

Many people, including me, refuse to download Microsoft Office attachments, as they can be security threats. Please provide the log as I explained, click on the </> icon here in the SAS communities and paste the log into the window that appears.

--
Paige Miller
SASUser_77
Fluorite | Level 6

This is my code:


libname bank_lib 'P:\temp - to be deleted periodically';

filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';

proc import datafile=banking
out=bank_lib.banking
dbms=xlsx
replace;
run;
proc contents data=bank_lib.banking;
run;
proc print datat=bank_lib.banking;
run;

data want;
set bank_lib.banking;
if find(columnB,'@')>0;
run;

 

This is my log:

 

NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170)

NOTE: This session is executing on the X64_10PRO  platform.



NOTE: Analytical products:

      SAS/STAT 15.1
      SAS/ETS 15.1
      SAS/OR 15.1
      SAS/IML 15.1
      SAS/QC 15.1

NOTE: Additional host information:

 X64_10PRO WIN 10.0.19041  Workstation

NOTE: SAS initialization used:
      real time           0.89 seconds
      cpu time            0.60 seconds

1
2    libname bank_lib 'P:\temp - to be deleted periodically';
NOTE: Libref BANK_LIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: P:\temp - to be deleted periodically
3
4    filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';
5
6    proc import datafile=banking
7                out=bank_lib.banking
8                dbms=xlsx
9                replace;
10   run;

NOTE:    Variable Name Change.  Form 1 - Report Summary -> Form_1___Report_Summary
NOTE:    Variable Name Change.  444 -> _444
NOTE:    Variable Name Change.  Employer Number -> Employer_Number
NOTE:    Variable Name Change.  
PO Box 5777
300 Union St.
Sa -> VAR6
NOTE:    Variable Name Change.  002001 -> _002001
NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
      options MSGLEVEL=I.
NOTE: The import data set has 905 observations and 7 variables.
NOTE: BANK_LIB.BANKING data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           1.88 seconds
      cpu time            0.11 seconds


11   proc contents data=bank_lib.banking;
NOTE: Writing HTML Body file: sashtml.htm
12   run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.70 seconds
      cpu time            0.09 seconds


13   proc print datat=bank_lib.banking;
                -----
                1
WARNING 1-322: Assuming the symbol DATA was misspelled as datat.
14   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.13 seconds
      cpu time            0.12 seconds


15
16   data want;
17        set bank_lib.banking;
18        if find(columnB,'@')>0;
19   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: The data set WORK.WANT has 22 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


PaigeMiller
Diamond | Level 26

I'm lost. This code and this log do not produce the problems you spoke of earlier.

--
Paige Miller
SASUser_77
Fluorite | Level 6

Sorry, I fixed it so the error message no longer pops up. But when I do run the code it does not extract email addresses only.

PaigeMiller
Diamond | Level 26

Please show us a portion of the output that you think is not correct. Explain why it is not correct. A screen capture of the output data set is fine; use the "Insert Photos" icon to include the screen capture in your reply. Do NOT attach files.

--
Paige Miller
SASUser_77
Fluorite | Level 6

 

Please find below the output table "want". I redacted some info from the names and email addresses of the people for privacy reasons.

 

I would like to extract only the email addresses from columnB but this table shows all the text from cells containing email addresses. 

want_capture.PNG

PaigeMiller
Diamond | Level 26

 

This code is untested, as I don't have your data.

 

data want;
    set have;
    nwords=countw(columnB,' ');
    do i=1 to nwords;
        thisword=scan(columnB,i,' ');
        contains_at_symbol=find(thisword,'@');
        if contains_at_symbol>0 then do;
           email_address=thisword;
           output;
        end;
    end;
    drop i nwords thisword;
run;

 

--
Paige Miller
SASUser_77
Fluorite | Level 6

When I run the code, it does not seem to extract email addresses only. It looks just like my previous result. Please find below my code and attached is the log and screen capture of my "want" table.Capture.PNG

 

 


libname bank_lib 'P:\temp - to be deleted periodically';

filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';

proc import datafile=banking
out=bank_lib.banking
dbms=xlsx
replace;
run;
proc contents data=bank_lib.banking;
run;
proc print data=bank_lib.banking;
run;

data want;
set bank_lib.banking;
nwords=countw(columnB,' ');
do i=1 to nwords;
thisword=scan(columnB,i,' ');
contains_at_symbol=find(thisword,'@');
if contains_at_symbol>0 then do; email_address=thisword; output; end;
end;
drop i nwords thisword;
run;

NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170)
      Licensed to, Site 70016086.
NOTE: This session is executing on the X64_10PRO  platform.



NOTE: Analytical products:

      SAS/STAT 15.1
      SAS/ETS 15.1
      SAS/OR 15.1
      SAS/IML 15.1
      SAS/QC 15.1

NOTE: Additional host information:

 X64_10PRO WIN 10.0.19041  Workstation

NOTE: SAS initialization used:
      real time           0.89 seconds
      cpu time            0.60 seconds

1
2    libname bank_lib 'P:\temp - to be deleted periodically';
NOTE: Libref BANK_LIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: P:\temp - to be deleted periodically
3
4    filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';
5
6    proc import datafile=banking
7                out=bank_lib.banking
8                dbms=xlsx
9                replace;
10   run;

NOTE:    Variable Name Change.  Form 1 - Report Summary -> Form_1___Report_Summary
NOTE:    Variable Name Change.  444 -> _444
NOTE:    Variable Name Change.  Employer Number -> Employer_Number
NOTE:    Variable Name Change.  
PO Box 5777
300 Union St.
Sa -> VAR6
NOTE:    Variable Name Change.  002001 -> _002001
NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with
      options MSGLEVEL=I.
NOTE: The import data set has 905 observations and 7 variables.
NOTE: BANK_LIB.BANKING data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           1.88 seconds
      cpu time            0.11 seconds


11   proc contents data=bank_lib.banking;
NOTE: Writing HTML Body file: sashtml.htm
12   run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.70 seconds
      cpu time            0.09 seconds


13   proc print datat=bank_lib.banking;
                -----
                1
WARNING 1-322: Assuming the symbol DATA was misspelled as datat.
14   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.13 seconds
      cpu time            0.12 seconds


15
16   data want;
17        set bank_lib.banking;
18        if find(columnB,'@')>0;
19   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: The data set WORK.WANT has 22 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


20
21   data want;
22       set bank_lib.banking;
23       nwords=countw(columnB,' ');
24       do i=1 to nwords;
25           thisword=scan(columnB,i,' ');
26           contains_at_symbol=find(thisword,'@');
27           if contains_at_symbol>0 then do;           email_address=thisword;           output;        end;
28       end;
29       drop i nwords thisword;
30   run;

ERROR: You cannot open WORK.WANT.DATA for output access with member-level control because WORK.WANT.DATA is in use by you in
resource environment ViewTable Window.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.70 seconds
      cpu time            0.04 seconds


31   data want;
32       set bank_lib.banking;
33       nwords=countw(columnB,' ');
34       do i=1 to nwords;
35           thisword=scan(columnB,i,' ');
36           contains_at_symbol=find(thisword,'@');
37           if contains_at_symbol>0 then do;           email_address=thisword;           output;        end;
38       end;
39       drop i nwords thisword;
40   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: The data set WORK.WANT has 22 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.72 seconds
      cpu time            0.04 seconds


41   data want;
42       set bank_lib.banking;
43       nwords=countw(columnB,' ');
44       do i=1 to nwords;
45           thisword=scan(columnB,i,' ');
46           contains_at_symbol=find(thisword,'@');
47           if contains_at_symbol>0 then do;           email_address=thisword;           output;        end;
48       end;
49       drop i nwords thisword;
50   run;

ERROR: You cannot open WORK.WANT.DATA for output access with member-level control because WORK.WANT.DATA is in use by you in
resource environment ViewTable Window.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           1.42 seconds
      cpu time            0.06 seconds


51   data want;
52       set bank_lib.banking;
53       nwords=countw(columnB,' ');
54       do i=1 to nwords;
55           thisword=scan(columnB,i,' ');
56           contains_at_symbol=find(thisword,'@');
57           if contains_at_symbol>0 then do;           email_address=thisword;           output;        end;
58       end;
59       drop i nwords thisword;
60   run;

NOTE: There were 905 observations read from the data set BANK_LIB.BANKING.
NOTE: The data set WORK.WANT has 22 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds


SASKiwi
PROC Star

Close your VIEWTABLE windows before rerunning your program. These are preventing you updating your SAS tables.

SASUser_77
Fluorite | Level 6

I closed my viewtables and re-ran the code, this time re-naming the output table too. It does not return only the email addresses.

 

 

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 26 replies
  • 3022 views
  • 2 likes
  • 5 in conversation