Input text spanning over multiple lines (and between boundaries)

Occasional Contributor
Posts: 6

Dear All,

 

I am a bit confused by all the input options since I haven't practiced for a long time.

 

My input text file has this form (Unix, with newline line endings):

 

#Name:some text
some text again
#This is a new line

 

The dataset variable should have this form:

1     Name:some text some text again

2     This is a new line

 

The number of continuation lines is undefined (it can be 1 or 10), and basically I would like to input everything between the hashes.

Is there a simple way? I can't manage to get the correct result!

 

Thank you guys !


Accepted Solutions
Solution
‎09-25-2015 06:23 AM
Super User
Posts: 7,046

Re: Input text spanning over multiple lines (and between boundaries)

Posted in reply to proc_sortt

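One way to make it easier is to just make a new file that has the wrapped lines concatenated.

Then you can read the new file without worrying about the wrapped lines.

data _null_;
  infile tmpfile1 ;   /* original file with wrapped records */
  file tmpfile2 ;     /* new file, one record per line */
  input;              /* load the next line into _infile_ */
  /* a line starting with '#' begins a new record: end the held output line first */
  if _infile_ =: '#' and _n_ > 1 then put ;
  put _infile_ @@ ;   /* write the line and keep the output line open */
run;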
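
For anyone trying this end to end, here is a minimal sketch of the full flow. The path, the LRECL= values, the variable name line and its 3000-character width are all assumptions for illustration, not part of the original answer; adjust them to your data. Depending on your SAS version, the default record length may be too short for the concatenated lines, which is why LRECL= is set explicitly here.

```sas
/* Hypothetical path: point tmpfile1 at the real input file. */
filename tmpfile1 '/path/to/input.txt';
/* TEMP asks SAS to pick a scratch location for the rewritten file. */
filename tmpfile2 temp;

/* Step 1: join continuation lines onto the '#' line they belong to. */
data _null_;
  infile tmpfile1 lrecl=32767;
  file tmpfile2 lrecl=32767;
  input;
  if _infile_ =: '#' and _n_ > 1 then put;
  put _infile_ @@;
run;

/* Step 2: read the rewritten file, one observation per line. */
data want;
  infile tmpfile2 lrecl=32767 truncover;
  input line $char3000.;   /* assumed maximum record width */
run;
```

TRUNCOVER is used in the second step so that lines shorter than 3000 characters are read as-is instead of making INPUT move on to the next line.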


All Replies
Super User
Posts: 11,343

Re: Input text spanning over multiple lines (and between boundaries)

Posted in reply to proc_sortt

Some example data may give us more clues.

 

You may want to examine the _infile_ automatic variable, which lets you look at an input line and parse or manipulate it.

If your requirement is to read a varying number of lines that belong to a single record, then you'll likely need RETAIN to carry the variables for one record across the input lines. If something in the first part of the record tells you how many lines to read (unlikely, but you never know), then you may be able to build conditional INPUT statements with the / line pointer control to read multiple lines.
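
As a rough illustration of inspecting _infile_, the sketch below writes each input line to the log along with a flag showing whether it starts a new record. The path and the variable name is_start are made up for the example, and the '#' test assumes the marker is always the first character of the line.

```sas
data _null_;
  infile '/path/to/input.txt' truncover;  /* hypothetical path */
  input;                                   /* load the line into _infile_ without reading variables */
  is_start = (_infile_ =: '#');            /* 1 if the line begins with '#', else 0 */
  put _n_= is_start= _infile_;             /* write line number, flag and raw line to the log */
run;
```

Looking at the log output first makes it easier to decide where one record ends and the next begins before writing the step that builds the dataset.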

Valued Guide
Posts: 860

Re: Input text spanning over multiple lines (and between boundaries)

Posted in reply to proc_sortt

This is specific to your example.  If this doesn't work with your real data you should provide a more specific example:

 

data have;
infile cards dsd;
informat line $25.;
input line;
cards;
# This is line one,
Continue line one,
# This is line two,
Continue line two,
;

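data want;
set have;
count + 1;                      /* position within each pair of lines */
if count = 3 then count = 1;
l_line = lag(line);             /* text of the previous line */
if count = 2 then do;           /* second line of a pair: join it to the first */
    wanted = catx(' ',l_line,line);
end;
run;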

Occasional Contributor
Posts: 6

Re: Input text spanning over multiple lines (and between boundaries)

Posted in reply to proc_sortt

Dear Both,

 

thank you for your response.

 

First, I need to specify that I am reading from a flat text file (under UNIX) which contains many newline characters.

Below are the first lines of the file:

 

#Name:dlb.sas Date:20150922.182011 User:kmr Label:Development Version:2 Comment:continued to work
#Name:batches.xlsx Date:20150922.175845 User:gojut Label:Draft Version:378 Comment:JGM(22-09-15)
#Name:Communication_Tracker.xlsx Date:20150922.164528 User:chaa3 Label:Draft Version:16 Comment:comments for DZO
#Name:ches.xlsx Date:20150922.145913 User:goj Label:Draft Version:377 Comment:JGM(22-09-15)
#Name:af.txt Date:20150922.144818 User:mvi1 Label: Attributes: Version:210 Comment:Qc
code for list Added file "mast_qc.sas".
#Name:test.rtf Date:20150922.144806 User:deshmsa7 Label: Attributes:Status Version:12 Comment:to update

 

And what I aim to obtain as a SAS dataset is:

 

#Name:dlb.sas Date:20150922.182011 User:kmr Label:Development Version:2 Comment:continued to work 
#Name:batches.xlsx Date:20150922.175845 User:gojut Label:Draft Version:378 Comment:JGM(22-09-15) 
#Name:Communication_Tracker.xlsx Date:20150922.164528 User:chaa3 Label:Draft Version:16 Comment:comments for DZO 
#Name:ches.xlsx Date:20150922.145913 User:goj Label:Draft Version:377 Comment:JGM(22-09-15) 
#Name:af.txt Date:20150922.144818 User:mvi1 Label: Attributes: Version:210 Comment:Qc code for list Added file "mast_qc.sas". 
#Name:test.rtf Date:20150922.144806 User:deshmsa7 Label: Attributes:Status Version:12 Comment:to update

 

I thought it was possible to input many lines into one variable and to start a new observation each time '#' is encountered, but apparently it is more complicated than expected!

 

Thanks !!

 

Super User
Posts: 5,504

Re: Input text spanning over multiple lines (and between boundaries)

Posted in reply to proc_sortt

This is untested, but should work:

 

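data want;
  length oneline $ 3000;
  infile sometext end=done;
  input @;                /* load the next line into _infile_ */
  retain oneline;         /* keep the record being built across iterations */
  if _n_=1 then oneline = _infile_;   /* first line starts the first record */
  else if _infile_ =: '#' then do;
    output;               /* the previous record is complete */
    oneline = _infile_;   /* start the next record */
  end;
  else oneline = trim(oneline) || ' ' || _infile_;   /* continuation line */
  if done then output;    /* write the final record */
run;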

 

It's a little confusing, but I think I have all the bases covered.

Occasional Contributor
Posts: 6

Re: Input text spanning over multiple lines (and between boundaries)

Posted in reply to proc_sortt

Dear Both,

 

Thank you very much for providing simple and elegant solutions.

 

Both solutions work !

 

 
