I have a .txt data file like this:
Title
desc1
date
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
I want to read into SAS ignoring the line1-3 and get the data like this:
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
I tried proc import or infile, but none of them succeeded.
proc import datafile = 'text.txt'
out = data1
dbms = dlm replace;
delimiter = ' ';
getnames= yes;
namerow= 4;
datarow= 5;
run;
data data2;
infile 'text.txt' dsd dlm=' ' truncover firstobs=4;
run;
I was not able to figure out how to assign the starting row to read, and at the same time how to get the column names.
Thanks ahead for any suggestions!
Like this?
filename D "%sysfunc(pathname(WORK))/t.txt";
%* Create data file;
data _null_;
file D;
put 'Title'
/ 'desc1'
/ 'date'
/ 'name type bday'
/ 'alan card 11/15/1950'
/ 'bob e-card 08/23/1951';
run;
%* Read variable names and create renaming statement;
data _null_;
infile D firstobs=4;
input;
length RENAMELIST $4000;
do I= 1 to countw(_INFILE_);
RENAMELIST=catx(' ',RENAMELIST,(catt('VAR',I,'=',scan(_INFILE_,I))));
end;
call symput('renamelist',RENAMELIST);
stop;
run;
%* Read data;
proc import datafile = D
out = WANT
dbms = dlm
replace;
delimiter = ' ';
getnames = no;
datarow = 5;
run;
%* Rename columns;
proc datasets nolist;
modify WANT ;
rename &renamelist.;
quit;
name | type | bday |
---|---|---|
alan | card | 11/15/1950 |
bob | e-card | 08/23/1951 |
Why are you asking SAS to guess how to read a file that only has three variables?
data want ;
infile cards dsd dlm=' ' truncover ;
length name $20 type $10 bday 8 ;
if _n_= 1 then input title $200. / desc $200. / date :mmddyy. / ;
input name type bday :mmddyy.;
format bday date date9.;
retain title desc date;
cards;
BDAY CARDS
bla bla
03/20/2020
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
;
Obs name type bday title desc date 1 alan card 15NOV1950 BDAY CARDS bla bla 20MAR2020 2 bob e-card 23AUG1951 BDAY CARDS bla bla 20MAR2020
Not sure a TEXT file is the best format to store data if you don't already know what data types the text is supposed to represent. But if you want to let PROC IMPORT guess at how to define the variables then just tell it NOT to read the names.
proc import datafile=text dbms=dlm out=want replace ;
delimiter=' ';
datarow=5;
getnames=NO;
run;
You can always read the names yourself and then use them to rename the variables.
proc import datafile=text dbms=dlm out=headers replace;
delimiter=' ';
datarow=4;
getnames=NO;
run;
proc transpose data=headers(obs=1) out=names ;
var _all_;
run;
filename code temp;
data _null_;
set names end=eof;
file code;
if _n_=1 then put 'rename ';
put _name_ '=' col1 :$quote. +(-1) 'n' ;
if eof then put ';';
run;
proc datasets nolist lib=work;
modify want ;
%include code / source2;
run;
quit;
Obs name type bday 1 alan card 11/15/1950 2 bob e-card 08/23/1951
The .txt file is my source data and actually contains large amount of columns and rows. It has headers (title, descriptions, and even tail at the bottom of the data body). There are many .txt files to be imported. To manually input the names of column is not practical. The example I presented here is just a simplified one that represent the structure of .txt data file. If a program can import this example .txt file and output the SAS data that I want, my mission is accomplished. For now, I have to manually read these .txt files using Excel, delete the unwanted parts, and save it into a csv file, then these .csv files are subjected for SAS to import.
Thanks for all the suggestions and helps!
Do all these txt files share the same structure?
There were several solutions posted that do not require manual intervention. Did none of them work for you at all?
I tested mine with your exact posted and it imported the file exactly as requested, though it's hard coded to ignore the first 4 lines in the header. It's possible to make that dynamic, if the number of lines to ignore changes, but since you were providing 4/5 to the PROC IMPORT I assumed that was fixed parameters. It's easy enough to make it a macro as well to pass the file name and the number of header lines if that's another approach.
@leehsin wrote:
The .txt file is my source data and actually contains large amount of columns and rows. It has headers (title, descriptions, and even tail at the bottom of the data body). There are many .txt files to be imported. To manually input the names of column is not practical. The example I presented here is just a simplified one that represent the structure of .txt data file. If a program can import this example .txt file and output the SAS data that I want, my mission is accomplished. For now, I have to manually read these .txt files using Excel, delete the unwanted parts, and save it into a csv file, then these .csv files are subjected for SAS to import.
Thanks for all the suggestions and helps!
Proc Import will ONLY get the variable names if they are the first row (at least in the current incarnation).
If there is only one block of the 3 lines you don't want then I would make a copy of the file deleting those 3 lines and import the rest as a start.
If you are going to read multiple files in this format in the future then copy the code from the LOG that Proc Import generates, paste into the Editor and save the program. Clean up things like line numbers ans such. Then add the Firstobs=5 to read the next file of the same format. Just change the infile to match the new file and the output data set name(if desired).
If you have multiple blocks of the Title, Desc and Date in the middle of the file then you need to provide some way to tell Title and Desc from a Name and then parse the data file a bit. If you search the forum for "input _infile_ @;" you will likely find some examples parsing such a file.
@leehsin If this is really only about skipping the first 3 lines then you could just read/write the text file skipping the first 3 lines. That's what below tested code does.
/* create sample external file HAVE */
filename have temp lrecl=100;
data _null_;
file have;
infile datalines;
input;
_infile_=strip(_infile_);
put _infile_;
datalines;
Title
desc1
date
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
;
/* write source external file HAVE to intermediary external file INTER
- skip first 3 rows
*/
filename inter temp lrecl=100;
data _null_;
file inter;
infile have firstobs=4;
input;
put _infile_;
run;
filename have clear;
/* use proc import to read external file INTER */
proc import
file = inter
out = want
dbms = dlm
replace;
delimiter = ' ';
getnames= yes;
guessingrows=max;
run;
filename inter clear;
proc print data=want;
run;
Try the default proc import, remove the namerow & getnames option. If that doesn’t work, please post the log or indicate what didn’t work.
@leehsin wrote:
I have a .txt data file like this:
Title desc1 date name type bday alan card 11/15/1950 bob e-card 08/23/1951
I want to read into SAS ignoring the line1-3 and get the data like this:
name type bday alan card 11/15/1950 bob e-card 08/23/1951
I tried proc import or infile, but none of them succeeded.
proc import datafile = 'text.txt' out = data1 dbms = dlm replace; delimiter = ' '; getnames= yes; namerow= 4; datarow= 5; run; data data2; infile 'text.txt' dsd dlm=' ' truncover firstobs=4; run;
I was not able to figure out how to assign the starting row to read, and at the same time how to get the column names.
Thanks ahead for any suggestions!
It can be replicated by the following steps:
1) Use Notepad to save the following text as a .txt file in your folder:
Title
desc1
date
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
2) Assign macro &path. with your path. Use the following code to import the attached .txt file:
%let path = "yourpath";
proc import datafile = "&path.\text.txt"
out = data1
dbms = dlm replace;
delimiter = ' ';
getnames= yes;
run;
SAS read the text successfully. 5 rows and 3 columns created in work.data1 from the text.txt file. But the value in work.data1 is:
column name Title VAR2 VAR3
row1 desc1
row2 date
row3 name type bday
row4 alan card 11/15/1950
row5 bob e-card 08/23/1951
I want a data like this, only have three columns and three rows including the column names:
column name name type bday
row2 alan card 11/15/1950
row3 bob e-card 08/23/1951
If use the following code:
proc import datafile = "&path.\text.txt"
out = data1
dbms = dlm replace;
delimiter = ' ';
getnames= yes;
namerow= 4;
datarow= 5;
run;
then failed with the error message:
110 proc import datafile = "&path.\text.txt"
111 out = data1
112 dbms = dlm replace;
NOTE: The previous statement has been deleted.
NOTE: The previous statement has been deleted.
113 delimiter = ' ';
114 getnames= yes;
115 namerow= 4;
-------
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
116 datarow= 5;
117 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
As far as I can see there isn't a namerow option. I can find the data row but it still gets the names from the first row.
@Tom solution to read the names and data separately is one option. Another is to recreate the file removing the first few lines and then read it in using PROC IMPORT.
Here's an example of how that can get done. This is designed for the temp file to go the work folder so the temp file doesn't stick around if this is something going into production or for cleaner code.
%let work_path=%sysfunc(pathname(work));
data _null_;
infile '/home/fkhurshed/Demo1/demo.txt.sas';
file "&work_path./temp.txt";
input ;
if _n_ < 4 then delete;
else put _infile_;
run;
proc import out=want datafile="&work_path./temp.txt" dbms=dlm replace;
delimiter = ' ';
getnames=yes;
run;
Copy the file to a temporary file that the IMPORT procedure can more easily use.
filename mydata 'c:\temp\mydata.txt';
filename nohead temp;
data _null_;
infile mydata;
file nohead;
input;
if _n_ > 3;
put _infile_;
run;
proc import
datafile = nohead
out = data1
dbms = dlm replace
;
delimiter = ' ';
getnames= yes;
run;
filename mydata clear;
filename nohead clear;
<soapbox>
filename nohead pipe "more +3 c:\temp\mydata.txt";
proc import dbms=dlm datafile=nohead replace out=data2;
--- LOG ---
ERROR: Random access not allowed.
Should SAS update Proc IMPORT to be able to process text data from a stream?</soapbox>
If this is a one-off thing, using a preliminary step to remove the top three lines and then proc import makes sense. But if you have to run that import repeatedly, you will have to use a data step anyway to achieve consistent results, and then you can solve everything in one step by using the firstobs= option.
Like this?
filename D "%sysfunc(pathname(WORK))/t.txt";
%* Create data file;
data _null_;
file D;
put 'Title'
/ 'desc1'
/ 'date'
/ 'name type bday'
/ 'alan card 11/15/1950'
/ 'bob e-card 08/23/1951';
run;
%* Read variable names and create renaming statement;
data _null_;
infile D firstobs=4;
input;
length RENAMELIST $4000;
do I= 1 to countw(_INFILE_);
RENAMELIST=catx(' ',RENAMELIST,(catt('VAR',I,'=',scan(_INFILE_,I))));
end;
call symput('renamelist',RENAMELIST);
stop;
run;
%* Read data;
proc import datafile = D
out = WANT
dbms = dlm
replace;
delimiter = ' ';
getnames = no;
datarow = 5;
run;
%* Rename columns;
proc datasets nolist;
modify WANT ;
rename &renamelist.;
quit;
name | type | bday |
---|---|---|
alan | card | 11/15/1950 |
bob | e-card | 08/23/1951 |
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.