Hello Experts,
I have a dataset contain only one variable, within the variable, the email address could be displayed in a single record or two records ...or many records.
How do I write a SAS code to concatenate all the email address into a single line and remove all the other email address :
(if the email address displayed in only a single line, we don't need to do any modification, if the email address displayed in more than one lines then all the email address should be joint together and remove all the other email address)
(Please ignore the envelope sign below)
Original dataset:
a
b
c
d
[the emails could be displayed in a single line or multiple lines
...
...
]
a
s
d
f
g
Desired output:
a
b
c
d
e@xxx.com a@xxx.com b@xxx.com c@xxx.com
a
s
d
f
g
Thank you!!!
Hi,
What identifies that the emails belong together, when there is more than one line, and how are the emails attributed to accounts/people/records?
Does your data set look similar to the following or somehow different?
data
test1;
input id $ email1 $ email2 $ (etc.);
datalines;
a email1@xxx.com email2@xxx.com
b email1@xxx.com email2@xxx.com
e email1@xxx.com email2@xxx.com
;
By the way, those little envelope icons were not meant to be in the sample/example datalines code above, those got somehow added to the reply.
Yes, that's right! please ignore the little envelope icon.
I want all the emails are combined or joint together to form a single line that represent all the emails and delete all the other records with "@" sign(only keep the complete list and all the other records without"@" sign.
And since I am posting responses, I might as well fix another error in my example above which was initially meant simply as an illustration to support my question but I don't want there to be incorrect code examples. Obviously, we need a TRUNCOVER and either a LENGTH statement before the INPUT statement or change the input stile to modified input which I have choses. Here it is again (although I can't do anything about the envelope icons, please ignore those):
data test1;
infile datalines truncover;
input id: $1. email1: $14. email2: $14.;
datalines;
a email1@xxx.com email2@xxx.com
b email1@xxx.com email2@xxx.com
e email1@xxx.com email2@xxx.com
;
Hi MaikH,
The actual record is like(ignore envelope icon) :
There should be only a single variable to include all the observation.
Original dataset:
a
b
c
d
[the emails could be displayed in a single line or multiple lines
]
a
s
d
f
g
Desired output:
a
b
c
d
a@xxxxxx.com b@xxxxxx.com c@xxxxxx.com d@xxxxxx.com
a
s
d
f
g
How do you know these email address are in the same group ?
If I understood what you mean .
data test1; input id : $40.; datalines; a email1@xxx.com email2@xxx.com b b b c email1@xxx.com d email1@xxx.com email1@xxx.com email2@xxx.com ; run; data temp; set test1; flag=findc('@',id); run; data want; set temp; by flag notsorted; length x _id $ 200; retain x ; if first.flag then call missing(x); x=catx(' ',x,id); if last.flag and flag then do;_id=x;output;end; if not flag then do;_id=id;output;end; keep _id; run;
Xia Keshan
Right, that's my question as well: how do you know which emails to group together in one record if there is only one column/variable containing the emails? Xia, it looks like you are interpreting the post as emails under a given letter, for example, all the emails following value "a" belong together, no emails under "b", one email under "c", and so on. Is that right?
No, actually that is not right.
According to OP's post :
"if the email address displayed in only a single line, we don't need to do any modification, if the email address displayed in more than one lines then all the email address should be joint together"
My reckoning is if there are two or more emails side by side , then put them together in a row . if there is only one email (i.e. previous is not a email, next is not a email either ), keep it intact .
Xia Keshan
Thanks, the problem is actually getting solved.
I want to group all the emails together to form one single record for all emails regardless any classification and eliminate all the emails left over after form single line(only keep the single line for the combination of all emails) and keep all the other non-email observations in the table.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.