
Input Statement where the number of cards varies for each observation?

Contributor
Posts: 25


Where to start?

Input is a single text file with an unknown number of observations.

Each obs has 6 header cards, followed by a variable number of data cards, and that pattern repeats for many obs.

The format of the header cards (colon delimited) is different from the format of the data cards (comma delimited).

I need to input a card and see what kind of card it is so I can extract the needed data.

Output 1 obs per data card that also includes any header data.

Input file:

Report for: name is here
subject name: name
serial number: 23456789
Algorithm: john or Jane
<blank line>
list of column names, csv style
data (csv style)
data (csv style)
data (csv style)   (variable number of data cards)
REPEAT THE BLOCK

Super User
Posts: 10,511

Re: Input Statement where the number of cards varies for each observation?

Something like this may get you started:

data want;
     infile <all of your infile information goes here>;
     length ReportName SubjectName $ 50 SerialNumber $ 9 Algorithm $ 50; /* since serial numbers usually aren't used in computations I treat them as character, also helps when you get letters or - / and such*/
     retain ReportName SubjectName SerialNumber Algorithm; /* so the header information is with each record*/
     input @; /* read the next line and hold it; _infile_ is the input buffer, so you can examine it before reading any variables*/
     if index(_infile_, "Report for:") > 0 then do;
          /* NOTE: THE STRINGS FOLLOWING @ BELOW MUST BE EXACTLY AS THEY APPEAR IN THE DATA FOR CASE AND SPACES BETWEEN WORDS*/
          /* the trailing // skips the blank line and the column-name line*/
          input @'Report for: ' ReportName $50. /
                @'subject name: ' SubjectName $50. /
                @'serial number: ' SerialNumber $9. /
                @'Algorithm: ' Algorithm $50. //;
     end;
     else if _infile_ ne ' ' then do; /* your csv data: any other non-blank line*/
          <your CSV input statement goes here>;
          output;
     end;
run;
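For anyone who wants to try the idea end to end, here is a self-contained sketch that runs against in-stream data. The sample lines are made up and only mimic the layout from the question (four labeled header lines, a blank line, a column-name line, then CSV data); RecordDate, Minutes and Efficiency are hypothetical data-line variables. Instead of testing for each label, this sketch counts lines within a block, which assumes every block has exactly six header lines as the question states.

```sas
/* Sketch: one DATA step iteration per input line (made-up sample data) */
data want;
   infile datalines dsd truncover;
   length ReportName SubjectName $ 50 SerialNumber $ 9 Algorithm $ 50
          RecordDate $ 10;
   retain ReportName SubjectName SerialNumber Algorithm; /* header values carried to each data line */
   retain lineno 0;                /* position of the current line within its block */
   drop lineno;

   input @;                        /* load the next line into _INFILE_ and hold it */
   if index(_infile_, 'Report for:') > 0 then lineno = 1; /* a new block starts */
   else lineno = lineno + 1;

   select (lineno);
      when (1) input @'Report for: ' ReportName $50.;
      when (2) input @'subject name: ' SubjectName $50.;
      when (3) input @'serial number: ' SerialNumber $9.;
      when (4) input @'Algorithm: ' Algorithm $50.;
      when (5, 6) input;           /* blank line and column-name line: release and skip */
      otherwise do;                /* CSV data line; the pointer is still at column 1 */
         input RecordDate :$10. Minutes Efficiency;
         output;                   /* one obs per data line, with the retained header values */
      end;
   end;
datalines;
Report for: first report
subject name: Ann
serial number: 23456789
Algorithm: Sadeh

Date,Minutes,Efficiency
01/02/2013,420,91.5
01/03/2013,395,88.2
Report for: second report
subject name: Bob
serial number: 34567890
Algorithm: Cole-Kripke

Date,Minutes,Efficiency
01/05/2013,410,90.1
;
run;
```

If the header layout can vary from block to block, testing _INFILE_ for each label, as in the follow-up post below, is more robust than counting lines.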

Contributor
Posts: 25

Re: Input Statement where the number of cards varies for each observation?

Thanks for your post.  It has some very interesting items.  I especially like the INPUT @'string' that searches and inputs at the same time.
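@'character-string' is a column pointer control: INPUT searches the current line for the string and moves the pointer to the column just after it, so the next variable is read from there. A minimal sketch on one made-up line (the data set name serial is hypothetical):

```sas
/* Sketch of INPUT @'string' on a single made-up line */
data serial;
   infile datalines truncover;
   input @'serial number: ' SerialNumber $9.; /* jump past the string, then read 9 columns */
datalines;
serial number: 23456789
;
run;
```

The search is case sensitive, so the string has to match the data exactly, as the earlier reply warns.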

This is what I was working on in the meantime:

DATA sleep2;
  INFILE 'M:\temp\all.dat' DLM=',' DSD LRECL=256 PAD MISSOVER;
  LENGTH SleepAlgorithm $ 12;
  INFORMAT InBedDate OutBedDate Onsetdate MMDDYY10.;
  FORMAT InBedDate OutBedDate Onsetdate MMDDYY10.;
  LENGTH InBedTime OutBedTime OnsetTime $ 10;
  RETAIN line line1-line6;
  INPUT line $ 1-256 @;   *** input card and hold in buffer ***;
  *** header lines get re-read here ***;
  IF INDEX(line, "Sleep Report for:")>0 THEN INPUT line1 $ 1-256;
  IF INDEX(line, "Subject Name:")>0     THEN INPUT line2 $ 1-256;
  IF INDEX(line, "Serial Number:")>0    THEN INPUT line3 $ 1-256;
  IF INDEX(line, "Sleep Algorithm:")>0  THEN INPUT line4 $ 1-256;
  IF line=""                            THEN INPUT line5 $ 1-256;   * blank line *;
  IF INDEX(line, "In Bed Date")>0       THEN INPUT line6 $ 1-256;   * column names line *;
  *** data lines get re-read here ***;
  IF INDEX(line, "Cole-Kripke,")>0 OR INDEX(line, "Sadeh,")>0 THEN DO;
    *** @1 moves the pointer back to column 1 of the held line ***;
    INPUT @1 SleepAlgorithm InBedDate InBedTime OutBedDate OutBedTime OnsetDate OnsetTime Latency TotalCounts Efficiency TotalMinutesinBed TotalSleepTime WakeAfterSleepOnset NumberofAwakenings AverageAwakeningLength;
    OUTPUT;
  END;
RUN;

PROC PRINT;
RUN;

The final hang-up was how SAS holds lines. I finally discovered that the column pointer was at the end of the held line, so my main data INPUT statement was trying to read starting at the end. That resulted in all missing data (with the correct number of output obs, thanks to the PAD and MISSOVER INFILE options).

Using the INFILE COLUMN= and LINE= options and PUT (to the log) helped me track this problem down.
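COLUMN= and LINE= name variables that SAS sets to the current column and line of the input pointer, so a PUT after each INPUT shows where the next read will start. A sketch on a made-up one-line file (the names trace, text, col, ln and after_column_input are hypothetical); the first PUT should report a column just past 40, which is the situation Rick describes with his 1-256 read.

```sas
/* Sketch: tracing the input pointer with COLUMN= and LINE= (made-up data) */
data trace;
   infile datalines dsd truncover column=col line=ln;
   input text $ 1-40 @;               /* column input; the line stays held */
   after_column_input = col;          /* COLUMN= variables are not kept, so copy the value */
   put 'after column input: ' ln= col=;
   input @1 a b c;                    /* @1 moves back to column 1 before the list input */
   put 'after @1 list input: ' ln= col= a= b= c=;
datalines;
1,2,3
;
run;
```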

Adding @1 moves the column pointer back to column 1 of the held line.

The earlier INPUT statements used column input (columns 1-256), so they didn't rely on the column pointer.
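The pointer behavior described above can be reproduced on a made-up one-line CSV file. The data set and variable names are hypothetical; the second INPUT mimics the original problem and the third applies the @1 fix.

```sas
/* Sketch: list input on a held line starts wherever the pointer was left */
data pointer_demo;
   infile datalines dsd truncover;
   input text $ 1-40 @;   /* column input leaves the pointer after column 40; line is held */
   input a b c @;         /* list input starts at the current pointer: nothing left, so missing */
   input @1 x y z;        /* @1 returns to column 1 of the held line */
datalines;
1,2,3
;
run;
```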

My method means reading each line twice so it's probably not as fast but this file is relatively small.
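A held line is not fetched from the file a second time: later INPUT statements in the same iteration re-scan the input buffer. If you still want to avoid the second INPUT on header lines, one option is to read no variables (input @;) and take the header values straight out of _INFILE_ with character functions. A sketch with made-up header lines (data set and variable names are hypothetical); the simple colon test only works because these sample lines are all headers, since data lines with clock times would also contain a colon.

```sas
/* Sketch: parse header values from the held buffer instead of a second INPUT */
data headers;
   infile datalines truncover;
   length key $ 20 value $ 50;
   input @;                                   /* load the line into _INFILE_; no variables read */
   if index(_infile_, ':') > 0 then do;
      key   = scan(_infile_, 1, ':');         /* text before the first colon */
      value = left(substr(_infile_, index(_infile_, ':') + 1)); /* text after it */
      output;
   end;
datalines;
Report for: name is here
subject name: name
serial number: 23456789
;
run;
```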

Thanks again,

Rick
