DATA Step, Macro, Functions and more

Read multiple lines as part of a single record

Reply
Occasional Contributor
Posts: 10

Read multiple lines as part of a single record

I have the following data in a tab delimited .dat file:

 

1   cord   BAD APPLES   bad apples are terrible for your health
2   logo   BAD APPLES   bad apples are terrible for
your health
3   cord   GOOD APPLES   bad apples are terrible for your health
4   brat    BAD APPLES   bad apples are terrible for
your health
5   brat    BAD APPLES   bad apples are terrible for your health
6   cord   BAD MANGOES   bad apples are terrible for your health
7   logo   BAD APPLES   bad apples are terrible for your health
8   cord   BAD APPLES   bad apples are terrible for your health


I want to read this data into SAS with the following variable names:
study_id type fruit_type description

Note that in some cases, an observation extends beyond on line (one record). The main problem i want to solve is how to tell SAS that when it encounters a point where the contents of the variable, "description" extends beyond one line, it should read the next line as part of the description variable.

I will highly appreciate a help on this.

Thank you.

 

Super User
Posts: 10,538

Re: Read multiple lines as part of a single record

Are there actual line breaks such as carriage return or line feed such as from using an "Enter" key when typing? or is line just wrapping due to line length?

 

Or is this coming from Excel where people were entering data with  the Alt-Enter to make multiline entries?

 

Is there any notable pattern involved?

Occasional Contributor
Posts: 10

Re: Read multiple lines as part of a single record

This is coming from a log file with no text wrapping. One notable pattern is that when ever the description field extends beyond one line, the next line starts with the string "PCP". The next line should have started with a number. So what SAS does everytime it encounter this line break is it sets the ID (which supposed to be a numerical variable to missing. 

 

Super User
Posts: 9,687

Re: Read multiple lines as part of a single record

[ Edited ]

Once you got table WANT,it is easy for you to get final result.

 (Edited)

 

data temp;
 infile 'c:\temp\x.txt' length=len lrecl=32767;
 input x $varying32767. len;
 pid=prxparse('/^\d+\s+/');
 if prxmatch(pid,x) then group+1;
run;

data want;
length want $ 400;
 do i=1 by 1 until(last.group);
  set temp;
  by group;
  want=cats(want,x);
 end;
 keep want;
run;
Occasional Contributor
Posts: 10

Re: Read multiple lines as part of a single record

Thanks in the context of my data, what will the variables, cats, x,  and want be?

Super User
Posts: 9,687

Re: Read multiple lines as part of a single record

Get it from varaible WANT.

 

study_id=scan(want,1,'09'x);

fruit_type=scan(want,2,'09'x);

..............

Occasional Contributor
Posts: 10

Re: Read multiple lines as part of a single record

Another issue is that i dont understand what is the relationship between the data temp, want, and my data. How will i link them?

 

Super User
Posts: 9,687

Re: Read multiple lines as part of a single record


TEMP contain the data from your txt file and make a group variable.

WANT is aligning these wrong rows and get the final result you need.
Frequent Contributor
Posts: 116

Re: Read multiple lines as part of a single record

Try this

 

data temp;
input study_id$ type$ frut_type$ description &$ 100.;
datalines;
cord BAD APPLES bad apples are terrible for your health
logo BAD APPLES bad apples are terrible for your health
cord GOOD APPLES bad apples are terrible for your health
brat BAD APPLES bad apples are terrible for your health
brat BAD APPLES bad apples are terrible for your health
cord BAD MANGOES bad apples are terrible for your health
logo BAD APPLES bad apples are terrible for your health
cord BAD APPLES bad apples are terrible for your health
;
run;

Respected Advisor
Posts: 3,901

Re: Read multiple lines as part of a single record

[ Edited ]

The following should do the job. You will have to point your infile statement to your external file and use as delimiter a tab instead of a comma - but else the code should pretty much do the job as posted.

I've used as test condition whether it's a new record of a flowover from the previous record, if there is a digit on position one of the record. You can of course also use some other test if that's not sufficient for your data (eg: is there a TAB in the input string?).

 

data sample;
  infile datalines dlm=',' truncover;

  /* read a new line of data and map against variables */
  input @1 study_id type:$10. fruit_type:$20. description:$100.;

  /* read the next line of data. Hold the input pointer on the current line */
  input @@;

  /* test if new observation or flowover from previous record           */
  /* test condition: first non blank character in row is not a digit    */
  _test= notdigit(substrn(left(_infile_),1,1));

  /* if it's a flowover then concatenate the remainder string to the description variable */
  /*   and then set input pointer to next row of source data */
  if _test then
    do;
      description=catx(' ',description,_infile_);
      input;
    end;

  datalines;
1,cord,BAD APPLES,bad apples are terrible for your health
2,logo,BAD APPLES,bad apples are terrible for
your health
3,cord,GOOD APPLES,bad apples are terrible for your health
4,brat,BAD APPLES,bad apples are terrible for
your health
5,brat,BAD APPLES,bad apples are terrible for your health
6,cord,BAD MANGOES,bad apples are terrible for your health
7,logo,BAD APPLES,bad apples are terrible for your health
8,cord,BAD APPLES,bad apples are terrible for your health
;
run;

input @@;

1. Read a new line of data into the input buffer

2. _infile_ that's an automatic variable which gives us access to the input buffer

3. @@ holds the input pointer on the current line even over the iteration of a data step

    - gets only released by another input statement. That's why this empty input statement is required if the test is true.

 

http://support.sas.com/documentation/cdl/en/lestmtsref/69738/HTML/default/viewer.htm#n0oaql83drile0n...

Ask a Question
Discussion stats
  • 9 replies
  • 319 views
  • 0 likes
  • 5 in conversation