How to read multiple lines of data in a single variable.

Occasional Contributor
Posts: 13

Hi,

I am trying to read a file that has a variable at the end of each record. This variable's value may be on the same line or may continue over multiple lines. I have tried many options but have not been able to read it.

Attached is the sample data and the variable names.

Kindly help.

Thanks

Lokesh

Valued Guide
Posts: 797

I'm not aware of a way to make PROC IMPORT read multiple lines as a single observation. So what you want to do is join the multiple lines into single lines where needed, then run PROC IMPORT. For your '~'-separated text file, assume you know in advance that you have 26 data fields (i.e. 25 '~') per record, and that no complete record needs more than 500 characters:

filename tmp temp;
data _null_;
  infile 'c:\temp\t.tilde';
  length outtxt $500;
  do i=1 by 1 while (countc(outtxt,'~')<25);
    input;
    outtxt=catx(' ',outtxt,_infile_);
  end;
  file tmp;
  put outtxt;
run;

proc import datafile=tmp
  out=want   dbms=dlm   replace;
  delimiter='~';
  getnames=yes;
run;

Notes:

  1. The "filename tmp temp" statement uses the TEMP device, which means that SAS will find a physical location for the data it will receive, and will also delete the file at the end of your SAS session.
  2. OUTTXT will contain one (or more) concatenated input records, accumulating lines until it holds 25 '~'.
  3. When an INPUT statement is executed, the automatic variable _INFILE_ contains the input line, regardless of whether the INPUT statement names any variables.
  4. The COUNTC function counts the number of occurrences of the character(s) in the second argument ('~') that are found in the first argument.
  5. CATX concatenates the 2nd and 3rd (and 4th, 5th, ...) arguments, separated by the first argument. It also trims leading and trailing blanks and skips blank arguments, so no separator is added while OUTTXT is still empty.
  6. FILE tmp provides the destination for the PUT statement.
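To see what COUNTC and CATX return in isolation, here is a minimal sketch on made-up values (the data set DEMO and the variable N_TILDE are illustrative names, not part of the code above). It shows the count of '~' in one complete sample record, and that CATX adds no leading blank while OUTTXT is still empty:

```sas
/* Illustrative values only: shows what COUNTC and CATX return. */
data demo;
  length outtxt $50;
  n_tilde = countc('101~Smith~short comment~', '~');  /* number of '~' characters: 3 */
  outtxt  = catx(' ', outtxt, 'first part~');         /* OUTTXT is blank, so no leading separator */
  outtxt  = catx(' ', outtxt, 'second part~');        /* now a single blank goes between the pieces */
  put n_tilde= outtxt=;                               /* writes both values to the log */
run;
```

OUTTXT ends up as "first part~ second part~", which is exactly the kind of joined line the DATA _NULL_ step above writes to TMP.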

If you ran PROC IMPORT against the original file, the multi-line records would cause problems.

Occasional Contributor
Posts: 13

Hi,

Thanks a lot for your help.

It gives me the correct output; however, I am having a hard time understanding the DO loop logic in your code. Could you please explain it in detail, for example how it works with _INFILE_?

Thanks

Trusted Advisor
Posts: 1,398

Here is tested code. It is less elegant than @mkeintz's solution, but it gives the desired result:

filename text_in '/folders/myshortcuts/My_Folders/flat/sample_Survey.txt';

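data text;
   /* keep values across lines so continuation text can be appended to survey09 */
   retain confirmation_number first_name last_name 
          exam_code form_code site_code testdate
          v1-v9 survey01-survey09 obs;
   length survey09 $200;
   format testdate date9.;
            
   infile text_in truncover firstobs=2 dlm='~' dsd eof=done;
   /* read the start of the line and hold it (@) for a second INPUT */
   input confirmation $16. @;  
   obs+1; 
   /* a line that starts with a number begins a new record */
   if input(confirmation,?? 16.) > 0 then do;
       /* the previous record is now complete: write it out */
       if obs > 1 then output;
   
       /* note: character variables read with list input and no LENGTH default to 8 characters */
       /* mmddyy10. reads 10 columns; @+1 then steps over the '~' that follows */
       input @1 confirmation_number first_name $ last_name $
             exam_code $ form_code $ site_code $ testdate mmddyy10. @+1
             v1 $ v2 $ v3 $ v4 $ v5 $ v6 $ v7 $ v8 $ v9 $ 
             survey01 $ survey02 $ survey03 $ survey04 $ survey05 $
             survey06 $ survey07 $ survey08 $ survey09 $
       ; 
   end; 
   else do;
       /* continuation line: append its text (up to the first '~') to survey09 */
       input @1 a_line $200.;
       survey09 = catx(' ',survey09,scan(a_line,1,'~'));   
   end;
   drop obs confirmation a_line;
return;
done:
   /* end of file: write out the last record */
   output;
run;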
Trusted Advisor
Posts: 1,398

I assumed that the extra lines are always the continuation of survey09.
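The key decision in the code above is whether a line starts a new record or continues survey09. The sketch below isolates that test on two made-up lines (CLASSIFY and STARTS_RECORD are illustrative names). Here `$16.` reads the first 16 columns, so the test only recognizes a new record when those columns hold a number; the 16-digit confirmation number is an assumption. The `??` modifier makes the INPUT function return a missing value for non-numeric text without writing an invalid-data note, and a missing value is never greater than 0:

```sas
/* Made-up lines: one record start and one continuation line. */
data classify;
  infile datalines truncover;
  input confirmation $16.;                              /* first 16 columns of the line */
  starts_record = (input(confirmation, ?? 16.) > 0);    /* 1 = new record, 0 = continuation */
datalines;
1234567890123456~Ann~Lee~
this survey comment keeps going~
;
run;
```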
Valued Guide
Posts: 797

This DO loop works as follows:

  do i=1 by 1 while (countc(outtxt,'~')<25);
    input;
    outtxt=catx(' ',outtxt,_infile_);
  end;

  1. The statement "do i=1 by 1 while (countc(outtxt,'~')<25);" says to start doing something and keep doing it (i.e. looping) as long as there are fewer than 25 '~' in the OUTTXT variable. And what is the "thing" being repeatedly done?

    First, because OUTTXT is reset to missing every time the DATA step starts again at the top, it starts out with 0 '~', so the DO loop is iterated at least once. Inside the loop it does the following:
        - INPUT: reads a line but populates no variable except the automatic variable _INFILE_, which gets a direct copy of the input line.
        - CATX: concatenates the content of OUTTXT (which starts out blank) with the content of _INFILE_. On the first iteration it simply copies _INFILE_ to OUTTXT; on later iterations it appends _INFILE_ to the non-empty OUTTXT, separated by a blank.
        - The loop then goes back to the top and rechecks the count of '~' in OUTTXT.

    So this loop keeps extending the content of OUTTXT. Once the count stops being less than 25, the looping stops and the subsequent statements are executed.

  2. After the loop, the content of OUTTXT is written by the PUT statement to the destination identified by the FILE statement.
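To watch the loop at work, here is a toy version that reads inline data instead of the file, with 3 fields per record instead of 26 (JOINED and the sample lines are made up). Because the loop stops as soon as the count is reached, it joins lines correctly only if the '~' that completes a record sits on that record's last line; in this sketch every record ends with a '~' to satisfy that, which is an assumed layout. A complete record therefore holds 3 '~', and the WHILE condition uses <3. The variable I only counts passes, and the PUT inside the loop writes each pass to the log:

```sas
/* Toy layout (assumed): 3 fields, each record ends with '~', so 3 '~' per record. */
data joined;
  infile datalines;
  length outtxt $200;
  do i=1 by 1 while (countc(outtxt,'~')<3);
    input;                              /* load the next line into _INFILE_ */
    outtxt=catx(' ',outtxt,_infile_);   /* append it to what has been read so far */
    put i= outtxt=;                     /* trace each pass in the log */
  end;
  drop i;
datalines;
101~Smith~short comment~
102~Jones~a comment that
spills onto a second line~
;
run;
```

Record 101 is complete after one pass, while record 102 needs two passes before its count reaches 3. When the next INPUT finds no more lines, the DATA step ends, so JOINED gets one observation per logical record.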