Even when I close the view tables, the want table does not contain only email addresses. Here is my code, log and output table.
libname bank_lib 'P:\temp - to be deleted periodically';
filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx';
proc import datafile=banking
out=bank_lib.banking
dbms=xlsx
replace;
run;
proc contents data=bank_lib.banking;
run;
proc print data=bank_lib.banking;
run;
data want;
set bank_lib.banking;
nwords=countw(columnB,' ');
do i=1 to nwords;
thisword=scan(columnB,i,' ');
contains_at_symbol=find(thisword,'@');
if contains_at_symbol>0 then do;
email_address=thisword;
output;
end;
end;
drop i nwords thisword;
run;
NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA. NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170) Licensed to, Site 70016086. NOTE: This session is executing on the X64_10PRO platform. NOTE: Analytical products: SAS/STAT 15.1 SAS/ETS 15.1 SAS/OR 15.1 SAS/IML 15.1 SAS/QC 15.1 NOTE: Additional host information: X64_10PRO WIN 10.0.19041 Workstation NOTE: SAS initialization used: real time 0.94 seconds cpu time 0.53 seconds 1 libname bank_lib 'P:\temp - to be deleted periodically'; NOTE: Libref BANK_LIB was successfully assigned as follows: Engine: V9 Physical Name: P:\temp - to be deleted periodically 2 3 filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx'; 4 5 proc import datafile=banking 6 out=bank_lib.banking 7 dbms=xlsx 8 replace; 9 run; NOTE: Variable Name Change. Form 1 - Report Summary -> Form_1___Report_Summary NOTE: Variable Name Change. 444 -> _444 NOTE: Variable Name Change. Employer Number -> Employer_Number NOTE: Variable Name Change. PO Box 5777 300 Union St. Sa -> VAR6 NOTE: Variable Name Change. 002001 -> _002001 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I. NOTE: The import data set has 905 observations and 7 variables. NOTE: BANK_LIB.BANKING data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 1.61 seconds cpu time 0.04 seconds 10 proc contents data=bank_lib.banking; NOTE: Writing HTML Body file: sashtml.htm 11 run; NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.64 seconds cpu time 0.09 seconds 12 proc print data=bank_lib.banking; 13 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: PROCEDURE PRINT used (Total process time): real time 0.13 seconds cpu time 0.10 seconds 14 data want; 15 set bank_lib.banking; 16 nwords=countw(columnB,' '); 17 do i=1 to nwords; 18 thisword=scan(columnB,i,' '); 19 contains_at_symbol=find(thisword,'@'); 20 if contains_at_symbol>0 then do; 21 email_address=thisword; 22 output; 23 end; 24 end; 25 drop i nwords thisword; 26 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: The data set WORK.WANT has 22 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds
If all you want is just the email address variable then use a KEEP statement:
data want;
keep email_address;
set bank_lib.banking;
nwords=countw(columnB,' ');
do i=1 to nwords;
thisword=scan(columnB,i,' ');
contains_at_symbol=find(thisword,'@');
if contains_at_symbol>0 then do;
email_address=thisword;
output;
end;
end;
run;
This works! Thank you so much.
@SASUser_77 wrote:
Sorry, I fixed it so the error message no longer pops up. But when I do run the code it does not extract email addresses only.
Because you did not ask it to extract the email address. You only asked it to locate the observations that had @ characters in them.
If you want to only extract the words in the string that might be email addresses then perhaps you should loop through the string word by word.
data want;
set bank_lib.banking;
where find(columnB,'@') ;
length word $200 ;
do i=1 to countw(columnB,' ');
word = scan(columnB,i,' ');
if find(word,'@') then output;
end;
run;
I ran that code and it returns cells with email addresses, not jus the email addresses. Here is my code, log and want2 table.
libname bank_lib 'P:\temp - to be deleted periodically'; filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx'; proc import datafile=banking out=bank_lib.banking dbms=xlsx replace; run; proc contents data=bank_lib.banking; run; proc print data=bank_lib.banking; run; data want2; set bank_lib.banking; nwords=countw(columnB,' '); do i=1 to nwords; thisword=scan(columnB,i,' '); contains_at_symbol=find(thisword,'@'); if contains_at_symbol>0 then do; email_address=thisword; output; end; end; drop i nwords thisword; run;
NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA. NOTE: SAS (r) Proprietary Software 9.4 (TS1M6 MBCS3170) Licensed to CANADIAN HUMAN RIGHTS COMMISSION, Site 70016086. NOTE: This session is executing on the X64_10PRO platform. NOTE: Analytical products: SAS/STAT 15.1 SAS/ETS 15.1 SAS/OR 15.1 SAS/IML 15.1 SAS/QC 15.1 NOTE: Additional host information: X64_10PRO WIN 10.0.19041 Workstation NOTE: SAS initialization used: real time 0.80 seconds cpu time 0.59 seconds 1 libname bank_lib 'P:\temp - to be deleted periodically'; NOTE: Libref BANK_LIB was successfully assigned as follows: Engine: V9 Physical Name: P:\temp - to be deleted periodically 2 3 filename banking 'P:\temp - to be deleted periodically\definitely delete\banking.xlsx'; 4 5 proc import datafile=banking 6 out=bank_lib.banking 7 dbms=xlsx 8 replace; 9 run; NOTE: Variable Name Change. Form 1 - Report Summary -> Form_1___Report_Summary NOTE: Variable Name Change. 444 -> _444 NOTE: Variable Name Change. Employer Number -> Employer_Number NOTE: Variable Name Change. PO Box 5777 300 Union St. Sa -> VAR6 NOTE: Variable Name Change. 002001 -> _002001 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I. NOTE: The import data set has 905 observations and 7 variables. NOTE: BANK_LIB.BANKING data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 1.45 seconds cpu time 0.11 seconds 10 proc contents data=bank_lib.banking; NOTE: Writing HTML Body file: sashtml.htm 11 run; NOTE: PROCEDURE CONTENTS used (Total process time): real time 1.00 seconds cpu time 0.10 seconds 12 proc print data=bank_lib.banking; 13 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: PROCEDURE PRINT used (Total process time): real time 0.12 seconds cpu time 0.11 seconds 14 data want; 15 set bank_lib.banking; 16 nwords=countw(columnB,' '); 17 do i=1 to nwords; 18 thisword=scan(columnB,i,' '); 19 contains_at_symbol=find(thisword,'@'); 20 if contains_at_symbol>0 then do; 21 email_address=thisword; 22 output; 23 end; 24 end; 25 drop i nwords thisword; 26 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: The data set WORK.WANT has 22 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 27 data want2; 28 set bank_lib.banking; 29 nwords=countw(columnB,' '); 30 do i=1 to nwords; 31 thisword=scan(columnB,i,' '); 32 contains_at_symbol=find(thisword,'@'); 33 if contains_at_symbol>0 then do; 34 email_address=thisword; 35 output; 36 end; 37 end; 38 drop i nwords thisword; 39 run; NOTE: There were 905 observations read from the data set BANK_LIB.BANKING. NOTE: The data set WORK.WANT2 has 22 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.06 seconds cpu time 0.00 seconds
Are you sure? Your photograph only shows values for one observation and two variables.
Your SAS log says that the dataset has
NOTE: The data set WORK.WANT has 22 observations and 9 variables.
What do you see when you look at the variable WORD where the email address was saved?
I took a screen capture of only one observation as an example. There are indeed 22 observations in the want2 table. I want to extract only the email addresses.
I do not understand what you mean when you say variable word, can you clarify?
@SASUser_77 wrote:
I took a screen capture of only one observation as an example. There are indeed 22 observations in the want2 table. I want to extract only the email addresses.
I do not understand what you mean when you say variable word, can you clarify?
If you run a program like this:
data want ;
set sashelp.class;
age2 = age**2 ;
run;
If you want to check if it worked by browsing the dataset (say by using whatever tool you took the photograph of) then you need to look at the value of the new variable AGE2 and not just the values of the old variables NAME or AGE .
This is a back-of-the-napkin solution that assumes that @ symbols are only in email addresses:
data need;
email = "blah blah henry_fifth@gmail.com blah"; /* example of a field with text and an email address*/
end= scan(email,2, "@");
start = scan(email,1, "@");
email_new = catx('@', scan(start, -1, ' '), scan(end, 1, ' '));
run;
I do not know the text of each of the email addresses that I need. There are hundreds of email addresses I need to extract.
The output table looks like the original table I have. Maybe I did something wrong.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.