DATA Step, Macro, Functions and more

How to deal with missing records in raw data when using line pointer controls?

Reply
Occasional Contributor
Posts: 6

How to deal with missing records in raw data when using line pointer controls?

Dears, 

 

I'm currently studying for the Base programming Sas certification and as I try to always go a little further on each issue, i got stuck at "Chapter 19 - Creating a Single Observation from Multiple Records". Indeed, it is explained that in order to use Line Pointer controls to read raw datasets, "the raw data file must contain the same number of records for each observation" (which make perfect sense). But no explanation is given on how to deal with missing records. Only this sentence : "For more information about working with raw data files that contain missing records, see the SAS documentation."

So I searched through the SAS documentation/the web, but couldn't find any solution. This is why I'm posting here my question : How can I manage raw datasets when observations are divided on multiple records and there are missing records? Should I use other controls than line pointers? Or maybe this is not possible through the use of pointer controls and I need to apply more "complex" methods to this kind of raw dataset?

Here is the example raw dataset used in the book. 

LEE ATHNOS  
1215 RAINTREE CIRCLE 
PHOENIX  AZ 85044  
HEIDIE BAKER  
1751 DIEHL ROAD  
MYRON BARKER 
131 DONERAIL DRIVE 
ATLANTA  GA 30363

 

First observation is divided on 3 records, same for the third, but the second has a missing record. How can i deal with it?

Thank you for any enlightenment you could give me on this.

Super User
Super User
Posts: 9,397

Re: How to deal with missing records in raw data when using line pointer controls?

This is more a theory question than a coding one.  How do you know (well more how would tell someone else) that the third row is missing for the second observation, and it is not MYRON BARKER?  If it was just those three then a simple if check would get you the result:

data want;
  infile "c:\abc.txt";
  length name addr city $200;
  retain name addr city;
  if _n_=1 then i=1;
  input;
  select(i);
    when(1) do;
      name=_input_;
      i=2;
      end;
    when(2) do;
      addr=_input_;
      i=3;
    when(3) do;
      if _input_ in ("PHOENIX","ATLANTA") then do;
        city=_input_;
        output;
        i=1;
      end;
      else do;
        output;
        name=_input_;
        i=2;
      end;
    otherwise;
  end;
run;
  
  
  
Super User
Posts: 6,626

Re: How to deal with missing records in raw data when using line pointer controls?

The programming is tedious, and beyond what I have patience for.  But the theory isn't too bad.  For example:

 

  • Read a line of data and save it in a fourth variable that you won't keep.
  • Inspect the line of data, and see where the first digit appears.  (FINDC function?)
  • If there are no digits, it's a name.  If the first digit is in column 1, it's a street address.  If it's toward the end of the line, it's a zipcode.
  • Proceed accordingly (trickier than it sounds).  You need to copy this fourth variable into the proper variable that will be kept, output appropriately (including detecting when you have read the last line of data in the file).

It's entirely possible that you need to retain the three variables that you want, and wipe out their values after outputting.  It's much simpler if you are certain that the middle line of data is the only one that might be missing.  I would suggest that this assumption is sufficient for "taking it a step further".

Super User
Posts: 13,293

Re: How to deal with missing records in raw data when using line pointer controls?

In my personal opinion any multiline record format that does not include a line role identifier is asking for problems. And yes I've had to deal with more than a few of those.

Example:

Name: LEE ATHNOS  
Street: 1215 RAINTREE CIRCLE 
City: PHOENIX  AZ 85044  
Name: HEIDIE BAKER  
Street: 1751 DIEHL ROAD  
Name: MYRON BARKER 
Street: 131 DONERAIL DRIVE 
City: ATLANTA  GA 30363

 

Then you have a consistent string to look for and allows cleaner reading code.

Of course the original problem will be traced back to data entry and the format used to collect the information and/or create the file you use. If this type of data is too be handled frequently it will be worth the effort to go back in the chain and fix it. If you are contracting work based on someone else's data then showing them the $$$ impact of you having to fix their process may get some attention.

Occasional Contributor
Posts: 6

Re: How to deal with missing records in raw data when using line pointer controls?

Thank you all for your advices. I suspected there will be no global solution and that it will more likely be a case by case assessment. I guess I was misleaded by what is written in the "SAS Cert Prep Guide - Base programming" book, suggesting that if I searched thoroughly the SAS documentation, I would find some indication on how to process raw data files with missing records. I mean, there aren't several ways to understand "For more information about working with raw data files that contains missing records, see the SAS documentation". 

 

Again, thank you all for your replies.

Ask a Question
Discussion stats
  • 4 replies
  • 158 views
  • 0 likes
  • 4 in conversation