My dataset contains a column with email addresses. For example this is all the text in the variable named columnB:
" Rima Loki
President and Chief Executive Officer
1260, boul. René-Lévesque Ouest, bur.600
Montreal, Québec
H3G 0E3
Telephone : (416) 865-5800
Email Address : Rima.loki@loo.ca "
I want to extract all the email addresses from this variable so that it only pulls out "Rima.loki@loo.ca"
How can I do this?
If all you want is just the email address variable then use a KEEP statement:
data want;
keep email_address;
set bank_lib.banking;
nwords=countw(columnB,' ');
do i=1 to nwords;
thisword=scan(columnB,i,' ');
contains_at_symbol=find(thisword,'@');
if contains_at_symbol>0 then do;
email_address=thisword;
output;
end;
end;
run;
data want;
set have;
if find(columnB,'@')>0;
run;
Hello, I received this error message:
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
149:14
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
Would you know how to fix it? Thank you!
@SASUser_77 wrote:
Hello, I received this error message:
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
149:14
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
Would you know how to fix it? Thank you!
We don't know what you did.
When you get errors or warnings in the log, SHOW US THE ENTIRE LOG for this data step or PROC. Do not show us parts of the log. Please copy the log as text and then paste it into the window that appears when you click on the </> icon.
I ran the code again with some fixes to my data and it extracts the whole contents of the cells with email addresses, not just the email addresses.
I attached the log here. I am using SAS in a different window than the screenshot you provided. Thank you.
Many people, including me, refuse to download Microsoft Office attachments, as they can be security threats. Please provide the log as I explained, click on the </> icon here in the SAS communities and paste the log into the window that appears.
This is my code:
libname bank_lib 'P:\temp - to be deleted periodically';
filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';
proc import datafile=banking
out=bank_lib.banking
dbms=xlsx
replace;
run;
proc contents data=bank_lib.banking;
run;
proc print datat=bank_lib.banking;
run;
data want;
set bank_lib.banking;
if find(columnB,'@')>0;
run;
This is my log:
NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA. NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170) NOTE: This session is executing on the X64_10PRO platform. NOTE: Analytical products: SAS/STAT 15.1 SAS/ETS 15.1 SAS/OR 15.1 SAS/IML 15.1 SAS/QC 15.1 NOTE: Additional host information: X64_10PRO WIN 10.0.19041 Workstation NOTE: SAS initialization used: real time 0.89 seconds cpu time 0.60 seconds 1 2 libname bank_lib 'P:\temp - to be deleted periodically'; NOTE: Libref BANK_LIB was successfully assigned as follows: Engine: V9 Physical Name: P:\temp - to be deleted periodically 3 4 filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx'; 5 6 proc import datafile=banking 7 out=bank_lib.banking 8 dbms=xlsx 9 replace; 10 run; NOTE: Variable Name Change. Form 1 - Report Summary -> Form_1___Report_Summary NOTE: Variable Name Change. 444 -> _444 NOTE: Variable Name Change. Employer Number -> Employer_Number NOTE: Variable Name Change. PO Box 5777 300 Union St. Sa -> VAR6 NOTE: Variable Name Change. 002001 -> _002001 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I. NOTE: The import data set has 905 observations and 7 variables. NOTE: BANK_LIB.BANKING data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 1.88 seconds cpu time 0.11 seconds 11 proc contents data=bank_lib.banking; NOTE: Writing HTML Body file: sashtml.htm 12 run; NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.70 seconds cpu time 0.09 seconds 13 proc print datat=bank_lib.banking; ----- 1 WARNING 1-322: Assuming the symbol DATA was misspelled as datat. 14 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: PROCEDURE PRINT used (Total process time): real time 0.13 seconds cpu time 0.12 seconds 15 16 data want; 17 set bank_lib.banking; 18 if find(columnB,'@')>0; 19 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: The data set WORK.WANT has 22 observations and 7 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds
I'm lost. This code and this log do not produce the problems you spoke of earlier.
Sorry, I fixed it so the error message no longer pops up. But when I do run the code it does not extract email addresses only.
Please show us a portion of the output that you think is not correct. Explain why it is not correct. A screen capture of the output data set is fine; use the "Insert Photos" icon to include the screen capture in your reply. Do NOT attach files.
Please find below the output table "want". I redacted some info from the names and email addresses of the people for privacy reasons.
I would like to extract only the email addresses from columnB but this table shows all the text from cells containing email addresses.
This code is untested, as I don't have your data.
data want;
set have;
nwords=countw(columnB,' ');
do i=1 to nwords;
thisword=scan(columnB,i,' ');
contains_at_symbol=find(thisword,'@');
if contains_at_symbol>0 then do;
email_address=thisword;
output;
end;
end;
drop i nwords thisword;
run;
When I run the code, it does not seem to extract email addresses only. It looks just like my previous result. Please find below my code and attached is the log and screen capture of my "want" table.
libname bank_lib 'P:\temp - to be deleted periodically';
filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';
proc import datafile=banking
out=bank_lib.banking
dbms=xlsx
replace;
run;
proc contents data=bank_lib.banking;
run;
proc print data=bank_lib.banking;
run;
data want;
set bank_lib.banking;
nwords=countw(columnB,' ');
do i=1 to nwords;
thisword=scan(columnB,i,' ');
contains_at_symbol=find(thisword,'@');
if contains_at_symbol>0 then do; email_address=thisword; output; end;
end;
drop i nwords thisword;
run;
NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA. NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170) Licensed to, Site 70016086. NOTE: This session is executing on the X64_10PRO platform. NOTE: Analytical products: SAS/STAT 15.1 SAS/ETS 15.1 SAS/OR 15.1 SAS/IML 15.1 SAS/QC 15.1 NOTE: Additional host information: X64_10PRO WIN 10.0.19041 Workstation NOTE: SAS initialization used: real time 0.89 seconds cpu time 0.60 seconds 1 2 libname bank_lib 'P:\temp - to be deleted periodically'; NOTE: Libref BANK_LIB was successfully assigned as follows: Engine: V9 Physical Name: P:\temp - to be deleted periodically 3 4 filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx'; 5 6 proc import datafile=banking 7 out=bank_lib.banking 8 dbms=xlsx 9 replace; 10 run; NOTE: Variable Name Change. Form 1 - Report Summary -> Form_1___Report_Summary NOTE: Variable Name Change. 444 -> _444 NOTE: Variable Name Change. Employer Number -> Employer_Number NOTE: Variable Name Change. PO Box 5777 300 Union St. Sa -> VAR6 NOTE: Variable Name Change. 002001 -> _002001 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I. NOTE: The import data set has 905 observations and 7 variables. NOTE: BANK_LIB.BANKING data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 1.88 seconds cpu time 0.11 seconds 11 proc contents data=bank_lib.banking; NOTE: Writing HTML Body file: sashtml.htm 12 run; NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.70 seconds cpu time 0.09 seconds 13 proc print datat=bank_lib.banking; ----- 1 WARNING 1-322: Assuming the symbol DATA was misspelled as datat. 14 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: PROCEDURE PRINT used (Total process time): real time 0.13 seconds cpu time 0.12 seconds 15 16 data want; 17 set bank_lib.banking; 18 if find(columnB,'@')>0; 19 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: The data set WORK.WANT has 22 observations and 7 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds 20 21 data want; 22 set bank_lib.banking; 23 nwords=countw(columnB,' '); 24 do i=1 to nwords; 25 thisword=scan(columnB,i,' '); 26 contains_at_symbol=find(thisword,'@'); 27 if contains_at_symbol>0 then do; email_address=thisword; output; end; 28 end; 29 drop i nwords thisword; 30 run; ERROR: You cannot open WORK.WANT.DATA for output access with member-level control because WORK.WANT.DATA is in use by you in resource environment ViewTable Window. NOTE: The SAS System stopped processing this step because of errors. NOTE: DATA statement used (Total process time): real time 0.70 seconds cpu time 0.04 seconds 31 data want; 32 set bank_lib.banking; 33 nwords=countw(columnB,' '); 34 do i=1 to nwords; 35 thisword=scan(columnB,i,' '); 36 contains_at_symbol=find(thisword,'@'); 37 if contains_at_symbol>0 then do; email_address=thisword; output; end; 38 end; 39 drop i nwords thisword; 40 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: The data set WORK.WANT has 22 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.72 seconds cpu time 0.04 seconds 41 data want; 42 set bank_lib.banking; 43 nwords=countw(columnB,' '); 44 do i=1 to nwords; 45 thisword=scan(columnB,i,' '); 46 contains_at_symbol=find(thisword,'@'); 47 if contains_at_symbol>0 then do; email_address=thisword; output; end; 48 end; 49 drop i nwords thisword; 50 run; ERROR: You cannot open WORK.WANT.DATA for output access with member-level control because WORK.WANT.DATA is in use by you in resource environment ViewTable Window. NOTE: The SAS System stopped processing this step because of errors. NOTE: DATA statement used (Total process time): real time 1.42 seconds cpu time 0.06 seconds 51 data want; 52 set bank_lib.banking; 53 nwords=countw(columnB,' '); 54 do i=1 to nwords; 55 thisword=scan(columnB,i,' '); 56 contains_at_symbol=find(thisword,'@'); 57 if contains_at_symbol>0 then do; email_address=thisword; output; end; 58 end; 59 drop i nwords thisword; 60 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: The data set WORK.WANT has 22 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.01 seconds
Close your VIEWTABLE windows before rerunning your program. These are preventing you updating your SAS tables.
I closed my viewtables and re-ran the code, this time re-naming the output table too. It does not return only the email addresses.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.