DATA Step, Macro, Functions and more

DLM is @ and Data has @ (in email)

Reply
Frequent Contributor
Posts: 140

DLM is @ and Data has @ (in email)

Hi

 

i am having data with DLm as @ and data is also having @ and few have email id and few data is not having email ids

 

Ex:

 

Name |age| email |loc

 

A@10@a@.gm@vij

B@12@ @ kl

 

Trusted Advisor
Posts: 1,372

Re: DLM is @ and Data has @ (in email)

Read the email as two cosequtive variable and concatenate to to create one email variable

the you can drop the two origin varaibles.

 

Frequent Contributor
Posts: 140

Re: DLM is @ and Data has @ (in email)

Name @ age @ email  @ loc

a@10@ test@gmail.com@cul

b@12@  @mum

PROC Star
Posts: 7,360

Re: DLM is @ and Data has @ (in email)

Your second example record doesn't contain an email. Do you have missing values for any of the variables but, more importantly, are you missing any emails in your data?

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 140

Re: DLM is @ and Data has @ (in email)

Yes in some data there is no email id

Trusted Advisor
Posts: 1,372

Re: DLM is @ and Data has @ (in email)

When there is no email id then probably you may have two or more cosequents of @ (thhat is: @@  or @@@ )

to deal with definne DSD option in your INFILE statement, then you will get value to each variable or missing value.

Super User
Super User
Posts: 7,399

Re: DLM is @ and Data has @ (in email)

I am sorry, but who uses @ as a delimiter in anything other than an email?  Thats mad.  There are any number of characters, tab, comma, pipe etc.  Its not surising your having issues.

Regular Contributor
Posts: 181

Re: DLM is @ and Data has @ (in email)

When it's upto me and I don't want to use a comma I tend to use a tilde ~ as I've never seen that used for anything else in a genuine string...

PROC Star
Posts: 7,360

Re: DLM is @ and Data has @ (in email)

@sas_Forum: Not sure if you ever got a useable solution. I'd try something like the following:

data have (drop=_:);
  infile cards dlm='~' dsd truncover;
  input @;
  if count(_infile_,'@') eq 3 then _infile_=translate(_infile_,'~','@');
  else do _i=4 to 1 by -1;
    if _i ne 3 then do;
      call scan(_infile_ , _i, _position, _length, '@');
      substr(_infile_,_position+_length,1)='~';
    end;
  end;
  input Name $ age email $ loc $;
  cards;
A@10@a@.gm@vij
B@12@ @ kl
;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 9,673

Re: DLM is @ and Data has @ (in email)

It is like a sandwich. first eat the top bread,then eat the bottom bread, the left is hotdog.

 

data have;
input;
name=scan(_infile_,1,'@');
age=scan(_infile_,2,'@');
call scan(_infile_,2,p,l,'@');
call scan(_infile_,-1,p1,l1,'@');
loc=substr(_infile_,p+l+1,p1-p-l-2);
drop p l p1 l1;
cards;
a@10@ test@gmail.com@cul
b@12@  @mum
;
run;
Super User
Super User
Posts: 6,499

Re: DLM is @ and Data has @ (in email)

First tell whoever is making the file that they need to add quotes around any value that contains the delimited. If they cannot generate a file that can be parsed then why generate the file at all?

 

If you have only one field that could contain the delimiter and every line has every field then you can read from the beginning and the end and whatever is left over is what to use for the middle field.  You can let the INPUT statement handle reading the beginning, but to read the ending field(s) you will need to use programming.

 

So for your example with the possible delimiters appearing in the third of four fields you can read the first two with INPUT, the fourth with SCAN() and then using SUBSTRN() to pull out the value for the third.  Use the COLUMN= option on the INFILE statement to find where the third field starts. The length of the field is the total length of the input line minus the length of the last word and minus the starting location. If you values have leading spaces after the delimiter like in some of the examples in this thread then you probably wnat to use the LEFT() function to remove those.

 

data have;
  infile cards dsd dlm='@' column=cc truncover ;
  length name $20 age 8 email loc $100;
  input name age @ ;
  loc = scan(_infile_,-1,'@');
  email=left(substrn(_infile_,cc,lengthn(_infile_)-cc-lengthn(loc)));
cards;
a@10@ test@gmail.com@cul
b@12@@mum
;
Frequent Contributor
Posts: 117

Re: DLM is @ and Data has @ (in email)

Hi,

 

Create two variables for holding the mail and use if condition to concnate if there is actually mail id and if not send the values to next variable. I'm trying to make the code as simple as possible by understanding how the data pattern appears.

 

data test(drop=mail_ extra);
infile datalines dlm="@" dsd truncover;
input name :$8. age :8. mail_: $20. extra: $10. loc: $8. @;
if find(extra,".com","i") then mail=COMPRESS(CAT(mail_,"@",extra));
else loc=extra;
datalines;
user1@10@test1@gmail.com@value1
user2@20@@value2
user3@13@test3@yahoo.com@value3
;
run;

Thanks,

Suryakiran

Ask a Question
Discussion stats
  • 11 replies
  • 144 views
  • 0 likes
  • 8 in conversation