Help using Base SAS procedures

Missing Data

Accepted Solution Solved
Reply
Contributor JAR
Contributor
Posts: 45
Accepted Solution

Missing Data

Could someone please help me with the following situation:

I have a raw data file, which has the following content:

Name:Tom

Age:24

Gender:Male

Email:Tom@test.com

NameSmiley Very Happyick

Age:32

Gender:Male

Name: Harry

Age:30

Gender:Male

Email:Harry@test.com

/*note that the second observation does not have Email */

My code (below) reads this:

data Basic1;

  length Record $100;

  infile "D:\Analytics\Data\Address.txt"  ;

  input @'Name:' name : $200. @@;

  if not missing(name);

  input @'Age:' age : 3. @@;

  if not missing(age);

  input @'Gender:' gender : $1. @@;

  if not missing(gender);

  input @'Email:' email : $20. @@;

  if not missing(email);

drop record;

run;

SAS reads only two observations, and the email for the second observation is taken from the third.

Obs        name    age         gender email

1              Tom       24           M            Tom@test.com

2              Dick        32           M            Harry@test.com

Could someone help me to fix this. Ideally, the email variable for the second observation, as it is missing, should be "".

Thanking in advance,

Jijil Ramakrishnan


Accepted Solutions
Solution
‎08-16-2013 03:54 PM
Super Contributor
Posts: 334

Re: Missing Data

Curiosity killed the cat with me and I came up with the method I would use:

data Basic1;

  length Rec $ 1000 Name $ 200 Age 8 Gender $ 1 Email $ 20;

  drop rec;

  infile "...your path...\Address.txt" end=last;

  input ;

  rec = _infile_;

  If index(rec,'Name:')>0 then do;

  Name = strip(scan(rec,2,":"));

  Age = .;

  Gender = "";

  Email = "";

  end;

  else if index(rec, 'Age:')>0 then Age = put(scan(rec,2,":"),8.);

  else if index(rec, 'Gender:')>0 then Gender = strip(scan(rec,2,":"));

  else if index(rec, 'Email:')>0 then Email = strip(scan(rec,2,":"));

  retain Name Age Gender Email;

  if rec = "" or last = 1 then output;

run;

I think this accomplishes what you need.

EJ

View solution in original post


All Replies
Super Contributor
Posts: 334

Re: Missing Data

I believe your problem is your having sas looking for  "Email:"  (input @'Email:'), so for the second obs it jumps to the third observation which is the next occurrence of  that string.

Im sure there is an elegant way in the input statement to handle this, I would probably just read the data into a single variable and parse it with data step logic to get what you want.

EJ

Contributor JAR
Contributor
Posts: 45

Re: Missing Data

Dear EJ,

Thanks for the tip. Before I try to parse, I thought we we could fix it in the input statement itself.

If someone could help me to do the following, it will be great:

If the string "Name" appears before "@" character, email ="".

I searched on Google but failed. Hard to believe that it is a rare problem...Smiley Sad

Regards,

Jijil

Trusted Advisor
Posts: 3,214

Re: Missing Data

Some remarks:

1/ The named list approach would be easier. It is pity you have the : and no = as separation. Perhaps something smart to change : = roles? SAS(R) 9.3 Statements: Reference (list input)

2/ I do not understand your approach.

As you have all:  if not missing(...); statements you are actually coding you do not one record where some variable is missing

You got exactly what you have coded.  2 records where nothing is missing.

The drop record ;  statement is coding to drop the variable record. As not being used yet it creates it and immediate drop it. 

I assume some basic concept of the data-step is failing. SAS data-steps are different to normal record processing of other languages that it does all "next record: processing automatic. You just code what should happen with a current record.

For a smart read process, reading the lines.

- is the blank line always there en indicating en of record?  (name age gender mail)

- is the order Always the same?

- can you intrpreted line/for line the values?

3/ To remember when alle data would be delivered as one record.

SAS(R) 9.3 Statements: Reference (missover  == flowover scanover stopover truncover)

Would suggest missover when records are at the end missing

---->-- ja karman --<-----
Contributor JAR
Contributor
Posts: 45

Re: Missing Data

Dear Mr. Karman,

Thank you.

Here is how I can answer your questions

2.if not missing(..);is not necessary. The actual code is

data Basic1;

      infile "D:\Analytics\Data\Address.txt" ;

       input @'Name:' name : $200. @@;

       input @'Age:' age : 3. @@;

       input @'Gender:' gender : $1. @@;

       input @'Email:' email : $20. @@;

run;

(everything else was sort of my attempts to circumvent this problem. apologies)

- in the raw data, blank line is the end of a record

- order is always the same

- except email, other fields are all mandatory

(I am sorry, I do not know how to respond to other points)

Regards,

Jijil

Attachment
Trusted Advisor
Posts: 3,214

Re: Missing Data

try,

data Basic1;

      infile "D:\Analytics\Data\Address.txt" ;

       input @'Name:' name : $200. /

                    @'Age:' age : 3. /

                    @'Gender:' gender : $1. / tmpvar $1. @ ;

    IF length(tmpvar) > 1 Then input   @'Email:' email : $20. / ;

run;

(code not checked) 

Thinking the approach:

The "/" first 3 records can be processed in one input statement multiple lines at once.

One @ is enough to hold the iteration within this record

It will read the blank/non-blank line.

When blank next iteration....  otherwhise process mail and proceed one line.

This is playing with the line-orders and input.

Exercise to learn inputprocessing.  

---->-- ja karman --<-----
Super Contributor
Super Contributor
Posts: 444

Re: Missing Data

probably Jaap's solution would be the quickest and easiest and quite straightforward

data Basic1;

infile 'C:\Users\Trajce\Contacts\Desktop\test.txt' missover;

input @'Name:' name : $200. /

      @'Age:' age : 3. /

      @'Gender:' gender : $1. /

      @'Email:' email : $20.

      #4;

      proc print;run;

Solution
‎08-16-2013 03:54 PM
Super Contributor
Posts: 334

Re: Missing Data

Curiosity killed the cat with me and I came up with the method I would use:

data Basic1;

  length Rec $ 1000 Name $ 200 Age 8 Gender $ 1 Email $ 20;

  drop rec;

  infile "...your path...\Address.txt" end=last;

  input ;

  rec = _infile_;

  If index(rec,'Name:')>0 then do;

  Name = strip(scan(rec,2,":"));

  Age = .;

  Gender = "";

  Email = "";

  end;

  else if index(rec, 'Age:')>0 then Age = put(scan(rec,2,":"),8.);

  else if index(rec, 'Gender:')>0 then Gender = strip(scan(rec,2,":"));

  else if index(rec, 'Email:')>0 then Email = strip(scan(rec,2,":"));

  retain Name Age Gender Email;

  if rec = "" or last = 1 then output;

run;

I think this accomplishes what you need.

EJ

Contributor JAR
Contributor
Posts: 45

Re: Missing Data

Thank you EJ!

Jijil Ramakrishnan

Contributor
Posts: 30

Re: Missing Data

EJ,

Running yr code for 'Missing Data', We are getting only last observation. Is that our ojective to get. Else, how to get all (3) observations using yr code.? could you please help us.

-Thanks

_Kamal

Super Contributor
Posts: 334

Re: Missing Data

Im afraid I dont understand the problem. The code works with the supplied sample data.

If your data is different then yes you would have to edit the code to allow for the differences. My guess is that you dont have empty rows in your txt file between observations like the sample data does so the output line only occurs on the last line of the input file where end of file flag is set (end = last .... last = 1 then output).

EJ

Super Contributor
Posts: 334

Re: Missing Data

You may want to start a new thread to solve any new issues.

EJ

Contributor
Posts: 30

Re: Missing Data

EJ,

I'm exactly taking about the supplied sample data posted by JAR. I just want to know how to get all 3 observations (one observation with missing email) with your code.

Following is the desired output..

Obs        name    age         gender email

1              Tom       24           M            Tom@test.com

2              Dick       32           M           

3             Harry     30            M          Harry@test.com

=Kamal


Super Contributor
Posts: 334

Re: Missing Data

Im not sure how to respond ... the posted code when used with the sample provides exactly what you show your desired output to be:

10-4-2013 8-06-19 AM.jpg

So Im not sure how to help you.

EJ

Contributor
Posts: 30

Re: Missing Data

EJ,

You are absolutely right, I don't know what went wrong last time when I ran the code . Today , I ran again and is  perfect.Thanks

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 766 views
  • 4 likes
  • 5 in conversation