DATA Step, Macro, Functions and more

Reading data contained in multiple lines from text file

Reply
Contributor SJN
Contributor
Posts: 21

Reading data contained in multiple lines from text file

[ Edited ]
data INPUT;
infile DATALINES DLM="|" missover;
DATALINES;
Name|Vidhya
Phone1|123456
Phone2|987655
Name|Shree
Phone1|345678
Name|Jagan
Phone1|5435678
Phone2|5431232
Name|Sharma
Phone1|8934751
;
run;

 

looking for output 

name|phone1|phone2

Super User
Super User
Posts: 6,499

Re: Reading data contained in multiple lines from text file

[ Edited ]

In general read it in as NAME/VALUE pairs and use PROC TRANSPOSE to convert from multiple rows to multiple columns.

You will want to generate a new variable to identify which blocks of rows in the original are to be grouped together into a new row.

 

data tall ;
  infile DATALINES DLM="|" DSD TRUNCOVER ;
  length name $32 value $200 ;
  input name value;
  if upcase(name)='NAME' then id+1;
DATALINES;
Name|Vidhya
Phone1|123456
Phone2|987655
Name|Shree
Phone1|345678
Name|Jagan
Phone1|5435678
Phone2|5431232
Name|Sharma
Phone1|8934751
;
run;
proc transpose data=tall out=want (drop=_:) ;
  by id;
  id name;
  var value;
run;

 

Contributor SJN
Contributor
Posts: 21

Re: Reading data contained in multiple lines from text file

[ Edited ]

Thank you for your reply. I'm intrested in datastep method.

Super User
Super User
Posts: 6,499

Re: Reading data contained in multiple lines from text file

Why should it be in a single data step instead taking advantage of PROC TRANSPOSE?

 

You can do the same thing in a data step, but it takes a lot more code.  You will however get more control over the definition of the individual variables.

Trusted Advisor
Posts: 1,115

Re: Reading data contained in multiple lines from text file

[ Edited ]

Hi @SJN,

 

You could replace the PROC TRANSPOSE step by the following data step:

data want;
array var $200 name phone1 phone2;
do until(last.id);
  set tall(drop=name); /* or rename NAME to _NAME if you want to check its values instead of relying on counter I */
  by id;
  i=sum(i,1);
  var[i]=value;
end;
drop i: v:;
run;

Please note that PROC TRANSPOSE, unlike the data step, would handle the case of more than two phone numbers without any change.

Super User
Posts: 5,081

Re: Reading data contained in multiple lines from text file

[ Edited ]

A DATA step approach:

 

data want;

infile datalines dlm="|" missover end=done;

length name $ 20 phone1 phone2 $ 15;

retain name phone1 phone2;

input identifier $ @;

if identifier = 'Name' then do;

   if _n_ > 1 then output;

   input name;

   phone1=' ';

   phone2=' ';

end;

else if identifier = 'Phone1' then input phone1;

else if identifier = 'Phone2' then input phone2;

if done then output;

datalines;

...

;

 

It's untested.  The only piece I'm not sure about is whether END= works in combination with DATALINES but it should otherwise be OK.

Trusted Advisor
Posts: 1,115

Re: Reading data contained in multiple lines from text file

Great, so everything is in one data step.

 

END= does not work with DATALINES (I wasn't sure either), but EOF= can be used instead. The latter requires a statement label, though. Hence, the line "if done then output;" could be replaced by 

return;
done: output;
drop identifier;

to make it work.

Ask a Question
Discussion stats
  • 6 replies
  • 348 views
  • 3 likes
  • 4 in conversation