BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AudeP
Fluorite | Level 6

Dears, 

 

I'm currently studying for the Base programming Sas certification and as I try to always go a little further on each issue, i got stuck at "Chapter 19 - Creating a Single Observation from Multiple Records". Indeed, it is explained that in order to use Line Pointer controls to read raw datasets, "the raw data file must contain the same number of records for each observation" (which make perfect sense). But no explanation is given on how to deal with missing records. Only this sentence : "For more information about working with raw data files that contain missing records, see the SAS documentation."

So I searched through the SAS documentation/the web, but couldn't find any solution. This is why I'm posting here my question : How can I manage raw datasets when observations are divided on multiple records and there are missing records? Should I use other controls than line pointers? Or maybe this is not possible through the use of pointer controls and I need to apply more "complex" methods to this kind of raw dataset?

Here is the example raw dataset used in the book. 

LEE ATHNOS  
1215 RAINTREE CIRCLE 
PHOENIX  AZ 85044  
HEIDIE BAKER  
1751 DIEHL ROAD  
MYRON BARKER 
131 DONERAIL DRIVE 
ATLANTA  GA 30363

 

First observation is divided on 3 records, same for the third, but the second has a missing record. How can i deal with it?

Thank you for any enlightenment you could give me on this.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

The programming is tedious, and beyond what I have patience for.  But the theory isn't too bad.  For example:

 

  • Read a line of data and save it in a fourth variable that you won't keep.
  • Inspect the line of data, and see where the first digit appears.  (FINDC function?)
  • If there are no digits, it's a name.  If the first digit is in column 1, it's a street address.  If it's toward the end of the line, it's a zipcode.
  • Proceed accordingly (trickier than it sounds).  You need to copy this fourth variable into the proper variable that will be kept, output appropriately (including detecting when you have read the last line of data in the file).

It's entirely possible that you need to retain the three variables that you want, and wipe out their values after outputting.  It's much simpler if you are certain that the middle line of data is the only one that might be missing.  I would suggest that this assumption is sufficient for "taking it a step further".

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This is more a theory question than a coding one.  How do you know (well more how would tell someone else) that the third row is missing for the second observation, and it is not MYRON BARKER?  If it was just those three then a simple if check would get you the result:

data want;
  infile "c:\abc.txt";
  length name addr city $200;
  retain name addr city;
  if _n_=1 then i=1;
  input;
  select(i);
    when(1) do;
      name=_input_;
      i=2;
      end;
    when(2) do;
      addr=_input_;
      i=3;
    when(3) do;
      if _input_ in ("PHOENIX","ATLANTA") then do;
        city=_input_;
        output;
        i=1;
      end;
      else do;
        output;
        name=_input_;
        i=2;
      end;
    otherwise;
  end;
run;
  
  
  
Astounding
PROC Star

The programming is tedious, and beyond what I have patience for.  But the theory isn't too bad.  For example:

 

  • Read a line of data and save it in a fourth variable that you won't keep.
  • Inspect the line of data, and see where the first digit appears.  (FINDC function?)
  • If there are no digits, it's a name.  If the first digit is in column 1, it's a street address.  If it's toward the end of the line, it's a zipcode.
  • Proceed accordingly (trickier than it sounds).  You need to copy this fourth variable into the proper variable that will be kept, output appropriately (including detecting when you have read the last line of data in the file).

It's entirely possible that you need to retain the three variables that you want, and wipe out their values after outputting.  It's much simpler if you are certain that the middle line of data is the only one that might be missing.  I would suggest that this assumption is sufficient for "taking it a step further".

ballardw
Super User

In my personal opinion any multiline record format that does not include a line role identifier is asking for problems. And yes I've had to deal with more than a few of those.

Example:

Name: LEE ATHNOS  
Street: 1215 RAINTREE CIRCLE 
City: PHOENIX  AZ 85044  
Name: HEIDIE BAKER  
Street: 1751 DIEHL ROAD  
Name: MYRON BARKER 
Street: 131 DONERAIL DRIVE 
City: ATLANTA  GA 30363

 

Then you have a consistent string to look for and allows cleaner reading code.

Of course the original problem will be traced back to data entry and the format used to collect the information and/or create the file you use. If this type of data is too be handled frequently it will be worth the effort to go back in the chain and fix it. If you are contracting work based on someone else's data then showing them the $$$ impact of you having to fix their process may get some attention.

AudeP
Fluorite | Level 6

Thank you all for your advices. I suspected there will be no global solution and that it will more likely be a case by case assessment. I guess I was misleaded by what is written in the "SAS Cert Prep Guide - Base programming" book, suggesting that if I searched thoroughly the SAS documentation, I would find some indication on how to process raw data files with missing records. I mean, there aren't several ways to understand "For more information about working with raw data files that contains missing records, see the SAS documentation". 

 

Again, thank you all for your replies.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 919 views
  • 0 likes
  • 4 in conversation