BookmarkSubscribeRSS Feed
gyambqt
Obsidian | Level 7

Hello Experts,


I have a dataset contain only one variable, within the variable, the email address could be displayed in a single record or two records ...or many records.

How do I write a SAS code to concatenate all the email address into a single line and remove all the other email address :

(if the email address displayed in only a single line, we don't need to do any modification, if the email address displayed in more than one lines then all the email address should be joint together and remove all the other email address)


(Please ignore the envelope sign below)

Original dataset:

a

b

c

d

[the emails could be displayed in a single line or multiple lines

...

e@xxx.com a@xxx.com

b@xxx.com c@xxx.com

...

]

a

s

d

f

g

Desired output:

a

b

c

d

e@xxx.com a@xxx.com b@xxx.com c@xxx.com

a

s

d

f

g

Thank you!!!

10 REPLIES 10
MaikH_Schutze
Quartz | Level 8

Hi,

What identifies that the emails belong together, when there is more than one line, and how are the emails attributed to accounts/people/records?

Does your data set look similar to the following or somehow different?

 

data
test1;

input id $ email1 $ email2 $ (etc.);

datalines;

a email1@xxx.com email2@xxx.com

b email1@xxx.com email2@xxx.com

c email1@xxx.com

c email2@xxx.com

d email1@xxx.com

e email1@xxx.com email2@xxx.com

;

MaikH_Schutze
Quartz | Level 8

By the way, those little envelope icons were not meant to be in the sample/example datalines code above, those got somehow added to the reply.

gyambqt
Obsidian | Level 7

Yes, that's right! please ignore the little envelope icon.

I want all the emails are combined or joint together to form a single line that represent all the emails and delete all the other records with "@" sign(only keep the complete list and all the other records without"@" sign.

MaikH_Schutze
Quartz | Level 8

And since I am posting responses, I might as well fix another error in my example above which was initially meant simply as an illustration to support my question but I don't want there to be incorrect code examples. Obviously, we need a TRUNCOVER and either a LENGTH statement before the INPUT statement or change the input stile to modified input which I have choses. Here it is again (although I can't do anything about the envelope icons, please ignore those):

data test1;

infile datalines truncover;

input id: $1. email1: $14. email2: $14.;

datalines;

a email1@xxx.com email2@xxx.com

b email1@xxx.com email2@xxx.com

c email1@xxx.com

c email2@xxx.com

d email1@xxx.com

e email1@xxx.com email2@xxx.com

;

gyambqt
Obsidian | Level 7

Hi MaikH,

The actual record is like(ignore envelope icon) :

There should be only a single variable to include all the observation.

Original dataset:

a

b

c

d

[the emails could be displayed in a single line or multiple lines

a@xxxxxx.com b@xxxxxx.com

c@xxxxxx.com d@xxxxxx.com

]

a

s

d

f

g

Desired output:

a

b

c

d

a@xxxxxx.com b@xxxxxx.com c@xxxxxx.com d@xxxxxx.com

a

s

d

f

g

Ksharp
Super User

How do you know these email address are in the same group ?

Ksharp
Super User

If I understood what you mean .


data test1;
input id : $40.;
datalines;
a 
email1@xxx.com 
email2@xxx.com
b
b
b
c
email1@xxx.com
d
email1@xxx.com 
email1@xxx.com 
email2@xxx.com
;
run;
data temp;
 set test1;
 flag=findc('@',id);
run;
data want;
 set temp;
 by flag notsorted;
 length x _id $ 200;
 retain x ;
 if first.flag then call missing(x);
 x=catx(' ',x,id);
 if last.flag and flag  then do;_id=x;output;end;
 if not flag then do;_id=id;output;end;
 keep _id;
run;

Xia Keshan

MaikH_Schutze
Quartz | Level 8

Right, that's my question as well: how do you know which emails to group together in one record if there is only one column/variable containing the emails? Xia, it looks like you are interpreting the post as emails under a given letter, for example, all the emails following value "a" belong together, no emails under "b", one email under "c", and so on. Is that right?

Ksharp
Super User

No, actually that is not right.

According to OP's post :

"if the email address displayed in only a single line, we don't need to do any modification, if the email address displayed in more than one lines then all the email address should be joint together"

My reckoning is if there are two or more emails side by side , then put them together in a row . if there is only one email (i.e. previous is not a email, next is not a email either ), keep it intact .

Xia Keshan

gyambqt
Obsidian | Level 7

Thanks, the problem is actually getting solved.

I want to group all the emails together to form one single record for all emails regardless any classification and eliminate all the emails left over after form single line(only keep the single line for the combination of all emails) and keep all the other non-email observations in the table.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1893 views
  • 0 likes
  • 3 in conversation