BookmarkSubscribeRSS Feed
deleted_user
Not applicable
How can I import a text file and capture the column headers when the headers do not start on row 1? I have a text file with a 21-row header, 20 of which I wish to ignore. The 21st row has the column headers I wish to use, and the data begins on the 22nd row. The 'getnames=yes' option assumes that the column headers are on the first row. I need 'getnames=21' but that is not an option. Any ideas?
5 REPLIES 5
Cynthia_sas
SAS Super FREQ
Hi:
I believe there is a DATAROW= option for PROC IMPORT. But it may not result in the variable names that you want.

If I have this data:
[pre]
Line1
Line2
Line3
Line4
Line5
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
[/pre]

In a file called c:\temp\funnydata.txt, then after this PROC IMPORT, your WORK.FUNNY1 dataset will have 3 variables named VAR1, VAR2 and VAR3...and 2 observations.

[pre]
proc import datafile='c:\temp\funnydata.txt'
out=work.funny1
dbms=dlm replace;
delimiter=' ' ;
getnames=no;
datarow=7;
run;

proc print data=work.funny1;
run;
[/pre]

Another method is to use a DATA step program:
[pre]
data work.funny2 ;
infile 'c:\temp\funnydata.txt'
missover firstobs=7 ;
input name $ type $ bday : mmddyy10.;
format bday mmddyy10.;
run;

proc print data=work.funny;
run;

[/pre]
and this will be the result of the proc print:
[pre]
Obs name type bday

1 alan card 11/15/1950
2 bob e-card 08/23/1951
[/pre]

There are other things I probably should have done like have a LENGTH statement or a FORMAT statement for any variables, but this data step works because the FIRSTOBS option controls where SAS starts reading the data. Note that in my INPUT statement, I have hardcoded the names of the variables in the INPUT statement -- so I get my variable names of choice.

cynthia
deleted_user
Not applicable
Thank you for your help! There is one more piece to the puzzle that I failed to mention. I am reading in multiple files like this, and they have varying numbers of columns.

I had used the 'datarow=' option for proc import, which resulted in var1, var2, etc. with the column headers in observation 1. Can I loop through a name assignment?
Could I do something like (pseudocode, not sure exactly how to do this)
for K = 1 to number of variables
vname(varK) = value(observation 1, column K)
next K

Is there some way of ignoring or deleting the first 20 observations, so that the 21st observation with the column headers now becomes the 1st observation, so that I can use the 'getnames=yes' option?

My ultimate solution would be to somehow
1) search the data for a unique known column header that will always appear in the file,
2) ignore prior rows/observations, and
3) read in the column headers as variable names with the data directly below this.

This way I would be able to deal with files containing variable "fluff" prior to my column headers, and variable number of columns.
deleted_user
Not applicable
the infile statement of a data step has option FIRSTOBS=
That would enable you to skip over earlier data.
Cynthia_sas
SAS Super FREQ
Hi:
I did show the use of FIRSTOBS in my DATA Step program. The trick is that somehow you have to "peek" inside the file and look at ROW 6 in the input file in order to get the column names and types for the INPUT statement.

The PROC IMPORT solution is nice because it determines the type of the variables that are being read. The downside of the PROC IMPORT solution is that in order for it to work correctly (with DATAROWS), you essentially have to turn GETNAMES=NO in order to get the 'automatic' names of VAR1, VAR2, VAR3.

I am tempted to suggest something like this:
[pre]
** strip out the "bad" lines;
** and make a NEW file for PROC IMPORT;
filename funnyin 'c:\temp\funnydata.txt';
filename notfunny 'c:\temp\notfunny.txt';

data _null_;
length recline $256;
infile 'c:\temp\funnydata.txt'
missover firstobs=6 length=lg;
input recline $varying. lg;
file notfunny;
if lg gt 0 then put _infile_;
run;

** Use the NOTFUNNY file for PROC IMPORT;
proc import datafile=notfunny
out=work.goodfile
dbms=dlm replace;
delimiter=' ';
getnames=yes;
run;

proc print data=goodfile;
title 'after stripping out bad lines at top of file';
run;
[/pre]

And if you used that program on the data in c:\temp\funnydata.txt:
[pre]
Line1
Line2
Line3
Line4
Line5
name type bday
alan card 11/15/1950
bob e-card 08/23/1951

[/pre]

then you would get PROC IMPORT reading the "new" row 1 to get the variable names. So the "new" NOTFUNNY output file would look like this (stored in c:\temp\notfunny.txt):
[pre]
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
[/pre]

It is this changed file that you could then send to PROC IMPORT. If you needed logic to get the first row of column headers, then you could code that logic in the first DATA step program instead of using FIRSTOBS.

cynthia
Doc_Duke
Rhodochrosite | Level 12
SAS Enterprise Guide will do what you want as long as the header row is before any data rows.

If you can get a copy of DBMScopy, it does it easily. SAS bought the product some years ago, but doesn't seem to support it anymore.

Doc

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1526 views
  • 0 likes
  • 3 in conversation