BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
leehsin
Quartz | Level 8

I have a .txt data file like this:

Title
desc1
date
name type bday
alan card 11/15/1950
bob e-card 08/23/1951

I want to read into SAS ignoring the line1-3 and get the data like this:

name type bday
alan card 11/15/1950
bob e-card 08/23/1951

I tried proc import or infile, but none of them succeeded.

proc import datafile = 'text.txt'
 out = data1
 dbms = dlm replace;
 delimiter = ' ';
 getnames= yes;
 namerow= 4;
 datarow= 5;
run;

data data2;
	infile 'text.txt' dsd dlm=' ' truncover firstobs=4;
run;

I was not able to figure out how to assign the starting row to read, and at the same time how to get the column names.

 

Thanks ahead for any suggestions!

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

filename D "%sysfunc(pathname(WORK))/t.txt";

%* Create data file;
data  _null_;
   file D;
   put 'Title'
     / 'desc1'
     / 'date'
     / 'name type bday'
     / 'alan card 11/15/1950'
     / 'bob e-card 08/23/1951';
run;

%*  Read variable names and create renaming statement;
data _null_;
  infile D firstobs=4;
  input;
  length RENAMELIST $4000;
  do I= 1 to countw(_INFILE_);
     RENAMELIST=catx(' ',RENAMELIST,(catt('VAR',I,'=',scan(_INFILE_,I))));
  end;
  call symput('renamelist',RENAMELIST);
  stop;
run;

%* Read data;
proc import datafile = D
            out      = WANT
            dbms     = dlm 
            replace;
 delimiter = ' ';
 getnames  = no;
 datarow   = 5;
run;

%* Rename columns;
proc datasets nolist;
  modify WANT ;
  rename &renamelist.;
quit;
 
name type bday
alan card 11/15/1950
bob e-card 08/23/1951

 

View solution in original post

18 REPLIES 18
Tom
Super User Tom
Super User

Why are you asking SAS to guess how to read a file that only has three variables?

data want ;
  infile cards dsd dlm=' ' truncover ;
  length name $20 type $10 bday 8 ;
  if _n_= 1 then input title $200. / desc $200. / date :mmddyy. / ;
  input name type bday :mmddyy.;
  format bday date date9.;
  retain title desc date;
cards;
BDAY CARDS
bla bla
03/20/2020
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
;
Obs    name     type       bday         title        desc        date

 1     alan    card      15NOV1950    BDAY CARDS    bla bla    20MAR2020
 2     bob     e-card    23AUG1951    BDAY CARDS    bla bla    20MAR2020
leehsin
Quartz | Level 8
Thanks.

I don't need the top three lines. And I actually have a data set with hundreds of columns, so I want to avoid using inputting column names manually. The best way is just to read the data existed in the .txt file.
Tom
Super User Tom
Super User

Not sure a TEXT file is the best format to store data if you don't already know what data types the text is supposed to represent.  But if you want to let PROC IMPORT guess at how to define the variables then just tell it NOT to read the names.  

proc import datafile=text dbms=dlm out=want replace ;
  delimiter=' ';
  datarow=5;
  getnames=NO;
run;

You can always read the names yourself and then use them to rename the variables.

proc import datafile=text dbms=dlm out=headers replace;
  delimiter=' ';
  datarow=4;
  getnames=NO;
run;
proc transpose data=headers(obs=1) out=names ;
  var _all_;
run;
filename code temp;
data _null_;
 set names end=eof;
 file code;
 if _n_=1 then put 'rename ';
 put _name_ '=' col1 :$quote. +(-1) 'n' ;
 if eof then put ';';
run;
proc datasets nolist lib=work;
modify want ;
%include code / source2;
run;
quit;
Obs    name     type        bday

 1     alan    card      11/15/1950
 2     bob     e-card    08/23/1951
leehsin
Quartz | Level 8

The .txt file is my source data and actually contains large amount of columns and rows. It has headers (title, descriptions, and even tail at the bottom of the data body). There are many .txt files to be imported. To manually input the names of column is not practical. The example I presented here is just a simplified one that represent the structure of .txt data file. If a program can import this example .txt file and output the SAS data that I want, my mission is accomplished. For now, I have to manually read these .txt files using Excel, delete the unwanted parts, and save it into a csv file,  then these .csv files are subjected for SAS to import. 

 

Thanks for all the suggestions and helps!

Reeza
Super User

There were several solutions posted that do not require manual intervention. Did none of them work for you at all?
I tested mine with your exact posted and it imported the file exactly as requested, though it's hard coded to ignore the first 4 lines in the header. It's possible to make that dynamic, if the number of lines to ignore changes, but since you were providing 4/5 to the PROC IMPORT I assumed that was fixed parameters. It's easy enough to make it a macro as well to pass the file name and the number of header lines if that's another approach. 

 

 

 


@leehsin wrote:

The .txt file is my source data and actually contains large amount of columns and rows. It has headers (title, descriptions, and even tail at the bottom of the data body). There are many .txt files to be imported. To manually input the names of column is not practical. The example I presented here is just a simplified one that represent the structure of .txt data file. If a program can import this example .txt file and output the SAS data that I want, my mission is accomplished. For now, I have to manually read these .txt files using Excel, delete the unwanted parts, and save it into a csv file,  then these .csv files are subjected for SAS to import. 

 

Thanks for all the suggestions and helps!


 

ballardw
Super User

Proc Import will ONLY get the variable names if they are the first row (at least in the current incarnation).

If there is only one block of the 3 lines you don't want then I would make a copy of the file deleting those 3 lines and import the rest as a start.

If you are going to read multiple files in this format in the future then copy the code from the LOG that Proc Import generates, paste into the Editor and save the program. Clean up things like line numbers ans such. Then add the Firstobs=5 to read the next file of the same format. Just change the infile to match the new file and the output data set name(if desired).

 

If you have multiple blocks of the Title, Desc and Date in the middle of the file then you need to provide some way to tell Title and Desc from a Name and then parse the data file a bit. If you search the forum for "input _infile_ @;" you will likely find some examples parsing such a file.

Patrick
Opal | Level 21

@leehsin  If this is really only about skipping the first 3 lines then you could just read/write the text file skipping the first 3 lines. That's what below tested code does.

/* create sample external file HAVE */
filename have temp lrecl=100;
data _null_;
  file have;
  infile datalines;
  input;
  _infile_=strip(_infile_);
  put _infile_;
  datalines;
Title
desc1
date
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
;

/* write source external file HAVE to intermediary external file INTER
   - skip first 3 rows
*/
filename inter temp lrecl=100;
data _null_;
  file inter;
  infile have firstobs=4;
  input;
  put _infile_;
run;
filename have clear;

/* use proc import to read external file INTER */
proc import 
 file = inter
 out = want
 dbms = dlm 
 replace;
 delimiter = ' ';
 getnames= yes;
 guessingrows=max;
run;
filename inter clear;

proc print data=want;
run;

 Capture.JPG

Reeza
Super User

Try the default proc import, remove the namerow  & getnames option. If that doesn’t work, please post the log or indicate what didn’t work. 


@leehsin wrote:

I have a .txt data file like this:

Title
desc1
date
name type bday
alan card 11/15/1950
bob e-card 08/23/1951

I want to read into SAS ignoring the line1-3 and get the data like this:

name type bday
alan card 11/15/1950
bob e-card 08/23/1951

I tried proc import or infile, but none of them succeeded.

proc import datafile = 'text.txt'
 out = data1
 dbms = dlm replace;
 delimiter = ' ';
 getnames= yes;
 namerow= 4;
 datarow= 5;
run;

data data2;
	infile 'text.txt' dsd dlm=' ' truncover firstobs=4;
run;

I was not able to figure out how to assign the starting row to read, and at the same time how to get the column names.

 

Thanks ahead for any suggestions!

 

 

 

 


 

leehsin
Quartz | Level 8

It can be replicated by the following steps:

1) Use Notepad to save the following text as a .txt file in your folder:

Title
desc1
date
name type bday
alan card 11/15/1950
bob e-card 08/23/1951

 

2) Assign macro &path. with your path. Use the following code to import the attached .txt file:

%let path = "yourpath";

proc import datafile = "&path.\text.txt"
out = data1
dbms = dlm replace;
delimiter = ' ';
getnames= yes;
run;

SAS read the text successfully. 5 rows and 3 columns created in work.data1 from the text.txt file. But the value in work.data1 is:

column name         Title   VAR2   VAR3

row1                      desc1

row2                      date

row3                      name   type     bday

row4                      alan     card     11/15/1950

row5                      bob     e-card   08/23/1951

 

I want a data like this, only have three columns and three rows including the column names:

column name         name   type     bday

row2                      alan     card     11/15/1950

row3                      bob     e-card   08/23/1951

 

If use the following code:

proc import datafile = "&path.\text.txt"
 out = data1
 dbms = dlm replace;
 delimiter = ' ';
 getnames= yes;
 namerow= 4;
 datarow= 5;
run;

then failed with the error message:

110  proc import datafile = "&path.\text.txt"
111   out = data1
112   dbms = dlm replace;
NOTE: The previous statement has been deleted.
NOTE: The previous statement has been deleted.
113   delimiter = ' ';
114   getnames= yes;
115   namerow= 4;
      -------
      180
ERROR 180-322: Statement is not valid or it is used out of proper order.

116   datarow= 5;
117  run;


NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds 

 

Reeza
Super User

As far as I can see there isn't a namerow option. I can find the data row but it still gets the names from the first row. 

 

@Tom solution to read the names and data separately is one option. Another is to recreate the file removing the first few lines and then read it in using PROC IMPORT. 


Here's an example of how that can get done. This is designed for the temp file to go the work folder so the temp file doesn't stick around if this is something going into production or for cleaner code. 

%let work_path=%sysfunc(pathname(work));

data _null_;
infile '/home/fkhurshed/Demo1/demo.txt.sas';
file "&work_path./temp.txt";
input ;
if _n_ < 4 then delete;
else put _infile_;
run;

proc import out=want datafile="&work_path./temp.txt" dbms=dlm replace;
delimiter = ' ';
getnames=yes;
run;
RichardDeVen
Barite | Level 11

Copy the file to a temporary file that the IMPORT procedure can more easily use.

 

filename mydata 'c:\temp\mydata.txt';
filename nohead temp;

data _null_;
  infile mydata;
    file nohead;

  input;
  if _n_ > 3;
  put _infile_;
run;


proc import 
 datafile = nohead
 out = data1
 dbms = dlm replace
;
 delimiter = ' ';
 getnames= yes;
run;

filename mydata clear;
filename nohead clear;

<soapbox>

 

  • GETNAMES causes the procedure to presume row is the header row containing field name. There are no options for also specifying a header row as anything other than the first row.
    Should SAS add an option NAMEROW= ?
  • Output of an operating system command can be piped to the IMPORT procedure.  Problem is that IMPORT needs random access that PIPE does not have.
    filename nohead pipe "more +3 c:\temp\mydata.txt";
    proc import dbms=dlm datafile=nohead replace out=data2;
    --- LOG ---
    ERROR: Random access not allowed.
    Should SAS update Proc IMPORT to be able to process text data from a stream?

</soapbox>

Kurt_Bremser
Super User

If this is a one-off thing, using a preliminary step to remove the top three lines and then proc import makes sense. But if you have to run that import repeatedly, you will have to use a data step anyway to achieve consistent results, and then you can solve everything in one step by using the firstobs= option.

ChrisNZ
Tourmaline | Level 20

Like this?

filename D "%sysfunc(pathname(WORK))/t.txt";

%* Create data file;
data  _null_;
   file D;
   put 'Title'
     / 'desc1'
     / 'date'
     / 'name type bday'
     / 'alan card 11/15/1950'
     / 'bob e-card 08/23/1951';
run;

%*  Read variable names and create renaming statement;
data _null_;
  infile D firstobs=4;
  input;
  length RENAMELIST $4000;
  do I= 1 to countw(_INFILE_);
     RENAMELIST=catx(' ',RENAMELIST,(catt('VAR',I,'=',scan(_INFILE_,I))));
  end;
  call symput('renamelist',RENAMELIST);
  stop;
run;

%* Read data;
proc import datafile = D
            out      = WANT
            dbms     = dlm 
            replace;
 delimiter = ' ';
 getnames  = no;
 datarow   = 5;
run;

%* Rename columns;
proc datasets nolist;
  modify WANT ;
  rename &renamelist.;
quit;
 
name type bday
alan card 11/15/1950
bob e-card 08/23/1951

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 12552 views
  • 5 likes
  • 8 in conversation