BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JAR
Obsidian | Level 7 JAR
Obsidian | Level 7

Could someone please help me with the following situation:

I have a raw data file, which has the following content:

Name:Tom

Age:24

Gender:Male

Email:Tom@test.com

Name:Dick

Age:32

Gender:Male

Name: Harry

Age:30

Gender:Male

Email:Harry@test.com

/*note that the second observation does not have Email */

My code (below) reads this:

data Basic1;

  length Record $100;

  infile "D:\Analytics\Data\Address.txt"  ;

  input @'Name:' name : $200. @@;

  if not missing(name);

  input @'Age:' age : 3. @@;

  if not missing(age);

  input @'Gender:' gender : $1. @@;

  if not missing(gender);

  input @'Email:' email : $20. @@;

  if not missing(email);

drop record;

run;

SAS reads only two observations, and the email for the second observation is taken from the third.

Obs        name    age         gender email

1              Tom       24           M            Tom@test.com

2              Dick        32           M            Harry@test.com

Could someone help me to fix this. Ideally, the email variable for the second observation, as it is missing, should be "".

Thanking in advance,

Jijil Ramakrishnan

1 ACCEPTED SOLUTION

Accepted Solutions
esjackso
Quartz | Level 8

Curiosity killed the cat with me and I came up with the method I would use:

data Basic1;

  length Rec $ 1000 Name $ 200 Age 8 Gender $ 1 Email $ 20;

  drop rec;

  infile "...your path...\Address.txt" end=last;

  input ;

  rec = _infile_;

  If index(rec,'Name:')>0 then do;

  Name = strip(scan(rec,2,":"));

  Age = .;

  Gender = "";

  Email = "";

  end;

  else if index(rec, 'Age:')>0 then Age = put(scan(rec,2,":"),8.);

  else if index(rec, 'Gender:')>0 then Gender = strip(scan(rec,2,":"));

  else if index(rec, 'Email:')>0 then Email = strip(scan(rec,2,":"));

  retain Name Age Gender Email;

  if rec = "" or last = 1 then output;

run;

I think this accomplishes what you need.

EJ

View solution in original post

14 REPLIES 14
esjackso
Quartz | Level 8

I believe your problem is your having sas looking for  "Email:"  (input @'Email:'), so for the second obs it jumps to the third observation which is the next occurrence of  that string.

Im sure there is an elegant way in the input statement to handle this, I would probably just read the data into a single variable and parse it with data step logic to get what you want.

EJ

JAR
Obsidian | Level 7 JAR
Obsidian | Level 7

Dear EJ,

Thanks for the tip. Before I try to parse, I thought we we could fix it in the input statement itself.

If someone could help me to do the following, it will be great:

If the string "Name" appears before "@" character, email ="".

I searched on Google but failed. Hard to believe that it is a rare problem...:(

Regards,

Jijil

jakarman
Barite | Level 11

Some remarks:

1/ The named list approach would be easier. It is pity you have the : and no = as separation. Perhaps something smart to change : = roles? SAS(R) 9.3 Statements: Reference (list input)

2/ I do not understand your approach.

As you have all:  if not missing(...); statements you are actually coding you do not one record where some variable is missing

You got exactly what you have coded.  2 records where nothing is missing.

The drop record ;  statement is coding to drop the variable record. As not being used yet it creates it and immediate drop it. 

I assume some basic concept of the data-step is failing. SAS data-steps are different to normal record processing of other languages that it does all "next record: processing automatic. You just code what should happen with a current record.

For a smart read process, reading the lines.

- is the blank line always there en indicating en of record?  (name age gender mail)

- is the order Always the same?

- can you intrpreted line/for line the values?

3/ To remember when alle data would be delivered as one record.

SAS(R) 9.3 Statements: Reference (missover  == flowover scanover stopover truncover)

Would suggest missover when records are at the end missing

---->-- ja karman --<-----
JAR
Obsidian | Level 7 JAR
Obsidian | Level 7

Dear Mr. Karman,

Thank you.

Here is how I can answer your questions

2.if not missing(..);is not necessary. The actual code is

data Basic1;

      infile "D:\Analytics\Data\Address.txt" ;

       input @'Name:' name : $200. @@;

       input @'Age:' age : 3. @@;

       input @'Gender:' gender : $1. @@;

       input @'Email:' email : $20. @@;

run;

(everything else was sort of my attempts to circumvent this problem. apologies)

- in the raw data, blank line is the end of a record

- order is always the same

- except email, other fields are all mandatory

(I am sorry, I do not know how to respond to other points)

Regards,

Jijil

jakarman
Barite | Level 11

try,

data Basic1;

      infile "D:\Analytics\Data\Address.txt" ;

       input @'Name:' name : $200. /

                    @'Age:' age : 3. /

                    @'Gender:' gender : $1. / tmpvar $1. @ ;

    IF length(tmpvar) > 1 Then input   @'Email:' email : $20. / ;

run;

(code not checked) 

Thinking the approach:

The "/" first 3 records can be processed in one input statement multiple lines at once.

One @ is enough to hold the iteration within this record

It will read the blank/non-blank line.

When blank next iteration....  otherwhise process mail and proceed one line.

This is playing with the line-orders and input.

Exercise to learn inputprocessing.  

---->-- ja karman --<-----
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

probably Jaap's solution would be the quickest and easiest and quite straightforward

data Basic1;

infile 'C:\Users\Trajce\Contacts\Desktop\test.txt' missover;

input @'Name:' name : $200. /

      @'Age:' age : 3. /

      @'Gender:' gender : $1. /

      @'Email:' email : $20.

      #4;

      proc print;run;

esjackso
Quartz | Level 8

Curiosity killed the cat with me and I came up with the method I would use:

data Basic1;

  length Rec $ 1000 Name $ 200 Age 8 Gender $ 1 Email $ 20;

  drop rec;

  infile "...your path...\Address.txt" end=last;

  input ;

  rec = _infile_;

  If index(rec,'Name:')>0 then do;

  Name = strip(scan(rec,2,":"));

  Age = .;

  Gender = "";

  Email = "";

  end;

  else if index(rec, 'Age:')>0 then Age = put(scan(rec,2,":"),8.);

  else if index(rec, 'Gender:')>0 then Gender = strip(scan(rec,2,":"));

  else if index(rec, 'Email:')>0 then Email = strip(scan(rec,2,":"));

  retain Name Age Gender Email;

  if rec = "" or last = 1 then output;

run;

I think this accomplishes what you need.

EJ

JAR
Obsidian | Level 7 JAR
Obsidian | Level 7

Thank you EJ!

Jijil Ramakrishnan

koomalkc
Fluorite | Level 6

EJ,

Running yr code for 'Missing Data', We are getting only last observation. Is that our ojective to get. Else, how to get all (3) observations using yr code.? could you please help us.

-Thanks

_Kamal

esjackso
Quartz | Level 8

Im afraid I dont understand the problem. The code works with the supplied sample data.

If your data is different then yes you would have to edit the code to allow for the differences. My guess is that you dont have empty rows in your txt file between observations like the sample data does so the output line only occurs on the last line of the input file where end of file flag is set (end = last .... last = 1 then output).

EJ

esjackso
Quartz | Level 8

You may want to start a new thread to solve any new issues.

EJ

koomalkc
Fluorite | Level 6

EJ,

I'm exactly taking about the supplied sample data posted by JAR. I just want to know how to get all 3 observations (one observation with missing email) with your code.

Following is the desired output..

Obs        name    age         gender email

1              Tom       24           M            Tom@test.com

2              Dick       32           M           

3             Harry     30            M          Harry@test.com

=Kamal


esjackso
Quartz | Level 8

Im not sure how to respond ... the posted code when used with the sample provides exactly what you show your desired output to be:

10-4-2013 8-06-19 AM.jpg

So Im not sure how to help you.

EJ

koomalkc
Fluorite | Level 6

EJ,

You are absolutely right, I don't know what went wrong last time when I ran the code . Today , I ran again and is  perfect.Thanks

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2062 views
  • 4 likes
  • 5 in conversation