proc_sortt
Calcite | Level 5

Dear All,

 

I am a bit confused by all the INPUT options, since I haven't practiced for a long time.

 

My input text file has this form (Unix newline characters):

 

#Name:some text
some text again
#This is a new line

 

The dataset variable should have this form:

1     Name:some text some text again

2     This is a new line

 

The number of continuation lines is undefined; it can be 1 or 10. Basically, I would like to read everything between one hash character and the next into a single observation.

Is there a simple way? I can't manage to get the correct result!

Thank you guys!

ACCEPTED SOLUTION

Tom
Super User

One way to make it easier is to first write a new file in which the wrapped lines are concatenated.

Then you can read the new file without worrying about the wrapped lines.

 

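data _null_;
  infile tmpfile1 ;
  file tmpfile2 ;
  input;                 /* load the next input line into _infile_ */
  /* a line starting with '#' begins a new record, so end the held output line first */
  if _infile_ =: '#' and _n_ > 1 then put ;
  put _infile_ @@ ;      /* write the line and hold the output line open */
run;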
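
For the second step mentioned above (reading the new file), something like the following might work. This is only a sketch and not part of the original answer: it assumes the filerefs tmpfile1 and tmpfile2 were already assigned with FILENAME statements, and the dataset name want, the variable name oneline, and the width of 3000 are placeholders to adjust to your data.

```sas
/* Sketch only: read the joined file, one record per observation. */
/* Assumes the step above has already written tmpfile2.           */
data want;
  infile tmpfile2 truncover lrecl=32767;   /* truncover: accept lines shorter than the informat width */
  input oneline $char3000.;                /* hypothetical variable name and width */
run;
```

TRUNCOVER keeps SAS from moving to the next line when a record is shorter than 3000 characters, so each joined record ends up in its own observation.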

6 REPLIES
ballardw
Super User

Some example data may give us more clues.

 

You may want to examine the _infile_ automatic variable, which lets you look at the current input line and parse or manipulate it.

If your requirement is to read a varying number of lines that belong to a single record, then you'll likely be looking at RETAIN to carry the variables across the input lines until the record is complete. If there is something in the first part of the record that tells you how many lines to read (unlikely, but you never know), then you may be able to build conditional INPUT statements with the / line pointer to read multiple lines.
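
As a rough illustration of examining _infile_ (a sketch only; the fileref sometext and the variable is_start are placeholders, not something from the question), a step like this writes each raw line to the log together with a flag showing whether it starts a new record:

```sas
/* Sketch only: inspect each raw input line without reading any variables. */
data _null_;
  infile sometext;                 /* hypothetical fileref pointing at the text file */
  input;                           /* load the next line into the _infile_ buffer */
  is_start = (_infile_ =: '#');    /* 1 if the line begins with '#', otherwise 0 */
  put _n_= is_start= _infile_;     /* write the line number, the flag and the raw line to the log */
run;
```

Once you can tell the starting lines from the continuation lines this way, a RETAINed variable can be used to build up each record.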

Steelers_In_DC
Barite | Level 11

This is specific to your example.  If this doesn't work with your real data you should provide a more specific example:

 

data have;
infile cards dsd;
informat line $25.;
input line;
cards;
# This is line one,
Continue line one,
# This is line two,
Continue line two,
;

data want;
set have;
count + 1;
if count = 3 then count = 1;
l_line = lag(line);
if count = 2 then do;
    wanted = catx(' ',l_line,line);
end;
run;

proc_sortt
Calcite | Level 5

Dear Both,

 

thank you for your response.

 

First I need to specify that I am reading from a flat text file (under UNIX) which contains many new line characters.

Below are the first lines of the file:

 

#Name:dlb.sas Date:20150922.182011 User:kmr Label:Development Version:2 Comment:continued to work
#Name:batches.xlsx Date:20150922.175845 User:gojut Label:Draft Version:378 Comment:JGM(22-09-15)
#Name:Communication_Tracker.xlsx Date:20150922.164528 User:chaa3 Label:Draft Version:16 Comment:comments for DZO
#Name:ches.xlsx Date:20150922.145913 User:goj Label:Draft Version:377 Comment:JGM(22-09-15)
#Name:af.txt Date:20150922.144818 User:mvi1 Label: Attributes: Version:210 Comment:Qc
code for list Added file "mast_qc.sas".
#Name:test.rtf Date:20150922.144806 User:deshmsa7 Label: Attributes:Status Version:12 Comment:to update

 

And what I aim to obtain as a SAS dataset is:

 

#Name:dlb.sas Date:20150922.182011 User:kmr Label:Development Version:2 Comment:continued to work 
#Name:batches.xlsx Date:20150922.175845 User:gojut Label:Draft Version:378 Comment:JGM(22-09-15) 
#Name:Communication_Tracker.xlsx Date:20150922.164528 User:chaa3 Label:Draft Version:16 Comment:comments for DZO 
#Name:ches.xlsx Date:20150922.145913 User:goj Label:Draft Version:377 Comment:JGM(22-09-15) 
#Name:af.txt Date:20150922.144818 User:mvi1 Label: Attributes: Version:210 Comment:Qc code for list Added file "mast_qc.sas". 
#Name:test.rtf Date:20150922.144806 User:deshmsa7 Label: Attributes:Status Version:12 Comment:to update

 

I thought it was possible to read many lines into one variable and to start a new observation each time a '#' is encountered, but apparently it is more complicated than expected!

 

Thanks !!

 

Astounding
PROC Star

This is untested, but should work:

 

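data want;
   length oneline $ 3000;
   retain oneline;
   infile sometext end=done;
   input @;                              /* load the next line into _infile_ */
   if _n_ = 1 then oneline = _infile_;   /* the first line starts the first record */
   else if _infile_ =: '#' then do;
      output;                            /* write the record built so far */
      oneline = _infile_;                /* start the next record */
   end;
   else oneline = trim(oneline) || ' ' || _infile_;   /* continuation line */
   if done then output;                  /* write the final record, even if the file has only one line */
run;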

 

It's a little confusing, but I think I have all the bases covered.

proc_sortt
Calcite | Level 5

Dear Both,

 

Thank you very much for providing simple and elegant solutions.

 

Both solutions work !

 

 

