Hi all,
I have a huge text file that I need to import into SAS. I can't open it in Notepad (too big), and when I open it in WordPad the rows get messed up, so there is no way I can determine the starting position of each field.
So I wrote my code based on the layout I got from the business side:
Column Name | DATA TYPE |
CONTACT_ID | VARCHAR2 (17 Byte) |
COMPANY_NAME | VARCHAR2 (39 Byte) |
FIRST_NAME | VARCHAR2 (30 Byte) |
LAST_NAME | VARCHAR2 (33 Byte) |
GENDER | VARCHAR2 (3 Byte) |
PREFERED_LANGUAGE | VARCHAR2 (4 Byte) |
MARITAL_STATUS | VARCHAR2 (12 Byte) |
HOME_PHONE | VARCHAR2 (19 Byte) |
CELL_PHONE | VARCHAR2 (19 Byte) |
BUISINESS_PHONE | VARCHAR2 (21 Byte) |
EMAIL | VARCHAR2 (50 Byte) |
DT_BIRTH | VARCHAR2 (8 Byte) |
VIP_INDICATOR | VARCHAR2 (3 Byte) |
STAFF_INDICATOR | VARCHAR2 (3 Byte) |
CLIENT_SINCE_DATE | VARCHAR2 (8 Byte) |
DNSH | VARCHAR2 (3 Byte) |
HOME_DNCL | VARCHAR2 (3 Byte) |
CELL_DNCL | VARCHAR2 (3 Byte) |
BUSINESS_DNCL | VARCHAR2 (3 Byte) |
DNSO | VARCHAR2 (3 Byte) |
DT_TO_DATE | VARCHAR2 (8 Byte) |
data clientinfo;
infile "&rootname\clientinfo.txt" truncover lrecl=32767 pad obs=5;
input
@1 CONTACT_ID $17.
@18 COMPANY_NAME $39.
@57 FIRST_NAME $30.
@87 LAST_NAME $33.
@110 GENDER $3.
@113 PREFERED_LANGUAGE $4.
@117 MARITAL_STATUS $12.
@129 HOME_PHONE $19.
@148 CELL_PHONE $19.
@167 BUISINESS_PHONE $21.
@188 EMAIL $50.
@238 DT_BIRTH $8.
@246 VIP_INDICATOR $3.
@249 STAFF_INDICATOR $3.
@252 CLIENT_SINCE_DATE $8.
@260 DNSH $3.
@263 HOME_DNCL $3.
@266 CELL_DNCL $3.
@269 BUSINESS_DNCL $3.
@272 DNSO $3.
@275 DT_TO_DATE $8.;
run;
But when I run this I don't get accurate output (some of the values get truncated). Is there any way to find out the starting positions of the fields, or any other way to import the file?
I even tried the PROC IMPORT wizard in PC SAS, and it did not work.
I was able to import 3 other small text files which I could open, and I know this big one is in the same format as those 3 (the value of the first field of the next row starts right after the end of the value of the last field of the previous row).
Any suggestions please? 🙂
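One way to cross-check hand-computed @ pointers is to derive them from the layout widths. A minimal sketch (field names and byte widths copied from the first few rows of the layout above; the dataset name is arbitrary) that prints the computed start column for each field, which you can compare against the @ pointers in the INPUT statement:

```
data layout;
   retain start 1;          /* running start column, begins at 1   */
   input name :$20. width;  /* field name and its byte width       */
   position = start;        /* computed @ pointer for this field   */
   start + width;           /* advance past this field's width     */
   drop start;
   datalines;
CONTACT_ID 17
COMPANY_NAME 39
FIRST_NAME 30
LAST_NAME 33
GENDER 3
;
run;

proc print data=layout noobs;
run;
```

Any field where the computed position disagrees with the @ pointer in the data step is a likely source of truncated or shifted values.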
I'm afraid you may have to post examples of the lines that the truncation occurs on.
I am going to guess, without example data, that your issue mostly comes from values that have embedded blanks.
This line:
@18 COMPANY_NAME $39.
says to read starting at column 18. But since you don't give any further instructions, it will stop at the first blank if there is one.
You might try fixed-column input, which would look like:
company_name $ 18-56
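For example, a sketch of the first few fields using column input (the start/end columns are taken from the @ pointers and widths in the original step; the file path macro variable is assumed to resolve as in the original post):

```
data clientinfo;
   infile "&rootname\clientinfo.txt" truncover lrecl=32767 obs=5;
   input contact_id   $ 1-17     /* 17 bytes starting at column 1  */
         company_name $ 18-56    /* 39 bytes starting at column 18 */
         first_name   $ 57-86;   /* 30 bytes starting at column 57 */
run;
```

Column input reads exactly the specified columns, embedded blanks included, so it behaves the same as formatted input here but makes the field boundaries explicit.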
I just tried fixed-column input and got the same result.
The first 4 variables import properly; truncation starts after that, so I guess I need to keep guessing about the layout...
You haven't specified a delimiter; is there no delimiter at all?
If you have smaller files that are in the exact same format, use the same code but point it at your big file.
If you can show the code that works for the smaller files that may help.
My approach for these types of issues is to use PowerShell commands to first get a small subset of the file. Perfect your program on the small file and then run it on the large file. You already have small files, so this isn't necessary.
Sorry, I was not clear enough.
When I said same format, I meant all of them have blanks between the fields, and the next row starts right after the previous one. For example, if one of the smaller files had 3 variables (code1, age, and code2), the text file would look like:
AAA  45  BBBAAA  55  BBBAAA  66  BBBAAA....... so when I use code like:
data clientinfo;
infile "&rootname\clientinfo.txt" truncover lrecl=32767 pad obs=5;
input
@1 code1 $3.
@6 age $2.
@10 code2 $3.;
run;
I'd get a perfect output.
All these files have different layouts, though (they are completely different files); it's just that I cannot open the huge one.
Can you please share a link to some docs about the PowerShell approach?
P.S. I have 2 blank spaces between each field.
To see what is in your file I suggest to use the following code:
data _null_;
infile "<yourFile>" obs=10;
input;
list;
run;
If the input record contains non-printable characters, the record will also be displayed in hex, so you will see the delimiter between the data values. Maybe it is '09'x, the tab character.
I just ran this:
data _null_;
infile "&rootname\clientinfo.txt";
input;
put _infile_;
if _n_ > 20 then
stop;
run;
and in the log I got exactly the same thing that I'd get in my output file from the above code, so it looks like the text file I was provided is all messed up.
@Tal wrote:
I just ran this:
data _null_;
infile "&rootname\clientinfo.txt";
input;
put _infile_;
if _n_ > 20 then
stop;
run;
and in the log I got exactly the same thing that I'd get in my output file from the above code, so it looks like the text file I was provided is all messed up.
Modify your code:
data _null_;
infile "&rootname\clientinfo.txt";
file "&rootname\testclient.txt";
input;
put _infile_;
if _n_ > 20 then
stop;
run;
So you can post the resulting file (testclient.txt) as an attachment here, if there's no sensitive data in it. This might give us more clues.
Thanks, all, for your input.
It turns out that the text file I was provided had a lot of bad data.
I was only reading in 5 records, but when I pulled 1,000 I realized everything is fine.
It looks like the first records in the file were clients from countries outside Canada, with missing gender, weird postal codes, and so on, which I thought were truncated values.
But all good.
Thanks again!