Re: Import Odd Text File

deleted_user · Posted 06-18-2008 05:33 PM

How can I import a text file and capture the column headers when the headers do not start on row 1? I have a text file with a 21-row header, 20 of which I wish to ignore. The 21st row has the column headers I wish to use, and the data begins on the 22nd row. The 'getnames=yes' option assumes that the column headers are on the first row. I need 'getnames=21' but that is not an option. Any ideas?

Cynthia_sas · Posted 06-18-2008 10:55 PM

Hi:
I believe there is a DATAROW= option for PROC IMPORT. But it may not result in the variable names that you want.

If I have this data:
[pre]
Line1
Line2
Line3
Line4
Line5
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
[/pre]

In a file called c:\temp\funnydata.txt, then after this PROC IMPORT, your WORK.FUNNY1 dataset will have 3 variables named VAR1, VAR2 and VAR3...and 2 observations.

[pre]
proc import datafile='c:\temp\funnydata.txt'
out=work.funny1
dbms=dlm replace;
delimiter=' ' ;
getnames=no;
datarow=7;
run;

proc print data=work.funny1;
run;
[/pre]

Another method is to use a DATA step program:
[pre]
data work.funny2 ;
infile 'c:\temp\funnydata.txt'
missover firstobs=7 ;
input name $ type $ bday : mmddyy10.;
format bday mmddyy10.;
run;

proc print data=work.funny;
run;

[/pre]
and this will be the result of the proc print:
[pre]
Obs name type bday

1 alan card 11/15/1950
2 bob e-card 08/23/1951
[/pre]

There are other things I probably should have done like have a LENGTH statement or a FORMAT statement for any variables, but this data step works because the FIRSTOBS option controls where SAS starts reading the data. Note that in my INPUT statement, I have hardcoded the names of the variables in the INPUT statement -- so I get my variable names of choice.

cynthia

deleted_user · Posted 06-19-2008 09:57 AM

Thank you for your help! There is one more piece to the puzzle that I failed to mention. I am reading in multiple files like this, and they have varying numbers of columns.

I had used the 'datarow=' option for proc import, which resulted in var1, var2, etc. with the column headers in observation 1. Can I loop through a name assignment?
Could I do something like (pseudocode, not sure exactly how to do this)
for K = 1 to number of variables
vname(varK) = value(observation 1, column K)
next K

Is there some way of ignoring or deleting the first 20 observations, so that the 21st observation with the column headers now becomes the 1st observation, so that I can use the 'getnames=yes' option?

My ultimate solution would be to somehow
1) search the data for a unique known column header that will always appear in the file,
2) ignore prior rows/observations, and
3) read in the column headers as variable names with the data directly below this.

This way I would be able to deal with files containing variable "fluff" prior to my column headers, and variable number of columns.

deleted_user · Posted 06-20-2008 01:00 PM

the infile statement of a data step has option FIRSTOBS=
That would enable you to skip over earlier data.

Cynthia_sas · Posted 06-20-2008 03:29 PM

Hi:
I did show the use of FIRSTOBS in my DATA Step program. The trick is that somehow you have to "peek" inside the file and look at ROW 6 in the input file in order to get the column names and types for the INPUT statement.

The PROC IMPORT solution is nice because it determines the type of the variables that are being read. The downside of the PROC IMPORT solution is that in order for it to work correctly (with DATAROWS), you essentially have to turn GETNAMES=NO in order to get the 'automatic' names of VAR1, VAR2, VAR3.

I am tempted to suggest something like this:
[pre]
** strip out the "bad" lines;
** and make a NEW file for PROC IMPORT;
filename funnyin 'c:\temp\funnydata.txt';
filename notfunny 'c:\temp\notfunny.txt';

data _null_;
length recline $256;
infile 'c:\temp\funnydata.txt'
missover firstobs=6 length=lg;
input recline $varying. lg;
file notfunny;
if lg gt 0 then put _infile_;
run;

** Use the NOTFUNNY file for PROC IMPORT;
proc import datafile=notfunny
out=work.goodfile
dbms=dlm replace;
delimiter=' ';
getnames=yes;
run;

proc print data=goodfile;
title 'after stripping out bad lines at top of file';
run;
[/pre]

And if you used that program on the data in c:\temp\funnydata.txt:
[pre]
Line1
Line2
Line3
Line4
Line5
name type bday
alan card 11/15/1950
bob e-card 08/23/1951

[/pre]

then you would get PROC IMPORT reading the "new" row 1 to get the variable names. So the "new" NOTFUNNY output file would look like this (stored in c:\temp\notfunny.txt):
[pre]
name type bday
alan card 11/15/1950
bob e-card 08/23/1951
[/pre]

It is this changed file that you could then send to PROC IMPORT. If you needed logic to get the first row of column headers, then you could code that logic in the first DATA step program instead of using FIRSTOBS.

cynthia

Doc_Duke · Posted 06-19-2008 09:46 AM

SAS Enterprise Guide will do what you want as long as the header row is before any data rows.

If you can get a copy of DBMScopy, it does it easily. SAS bought the product some years ago, but doesn't seem to support it anymore.

Doc

Import Odd Text File