BookmarkSubscribeRSS Feed
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Hi all,

I have  a huge text file which I need to import into SAS. I cannot open it in notepad (too big) and when I open it in wordpad the rows get messed up and no  way  I can determine the starting  position of each field

so I write my code based on some of the layout  I got from business

 

 

Column Name DATA TYPE
CONTACT_ID VARCHAR2 (17 Byte)
COMPANY_NAME VARCHAR2 (39 Byte)
FIRST_NAME VARCHAR2 (30 Byte)
LAST_NAME VARCHAR2 (33 Byte)
GENDER VARCHAR2 (3 Byte)
PREFERED_LANGUAGE VARCHAR2 (4 Byte)
MARITAL_STATUS VARCHAR2 (12 Byte)
HOME_PHONE VARCHAR2 (19 Byte)
CELL_PHONE VARCHAR2 (19 Byte)
BUISINESS_PHONE VARCHAR2 (21 Byte)
EMAIL VARCHAR2 (50 Byte)
DT_BIRTH VARCHAR2 (8 Byte)
VIP_INDICATOR VARCHAR2 (3 Byte)
STAFF_INDICATOR VARCHAR2 (3 Byte)
CLIENT_SINCE_DATE VARCHAR2 (8 Byte)
DNSH VARCHAR2 (3 Byte)
HOME_DNCL VARCHAR2 (3 Byte)
CELL_DNCL VARCHAR2 (3 Byte)
BUSINESS_DNCL VARCHAR2 (3 Byte)
DNSO VARCHAR2 (3 Byte)
DT_TO_DATE VARCHAR2 (8 Byte)

 

 

data clientinfo;

infile "&rootname\clientinfo.txt" truncover lrecl=32767 pad obs=5;

 

input

@1 CONTACT_ID $17.

@18 COMPANY_NAME $39.

@57 FIRST_NAME $30.

@87 LAST_NAME $33.

@110 GENDER $3.

@113 PREFERED_LANGUAGE $4.

@117 MARITAL_STATUS $12.

@129 HOME_PHONE $19.

@148 CELL_PHONE $19.

@167 BUISINESS_PHONE $21.

@188 EMAIL $50.

@238 DT_BIRTH $8.

@246 VIP_INDICATOR $3.

@249 STAFF_INDICATOR $3.

@252 CLIENT_SINCE_DATE $8.

@260 DNSH $3.

@263 HOME_DNCL $3.

@266 CELL_DNCL $3.

@269 BUSINESS_DNCL $3.

@272 DNSO $3.

@275 DT_TO_DATE $8.;

 

but when I run this I don't get an accurate output (some of the values getting truncated). Is there any way to find out the  starting position of the fields or even to  import the  file?

I even  tried with the proc import wizard on pc sas and did not work.

I was able to  import 3 other small text files which was able to open and I know this big one  is in the  same  format as those  3 (the value of the first field from the next row start right after the end of value of the last field  from the previous one)

Any suggestions please? 🙂

 

 

8 REPLIES 8
ballardw
Super User

I'm afraid that you may have to post examples of the file that the truncation occurs on.

I am going to guess without example data that your issue is mostly coming from values that have embedded blanks.

This line:

@18 COMPANY_NAME $39.

 

Says to read, starting at column 18. But since you don't have any further instructions it will stop at the first blank if there is one.

 

You might try fixed column which would look like

 

company_name $ 18-56

 

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

I just  tried with fixed column and   got same result

the  first 4 variables are importing properly. Truncation starts after that so I guess I need to keep guessing about the layout...

 

 

Reeza
Super User

You haven’t specified a delimiter, there’s no delimiter at all?

 

If you have smaller files that are the exact same format use the same code but point it to your big file. 

If you can show the code that works for the smaller files that may help. 

 

My approach for these type of issues is to to use PowerShell commands to first get a small subset of the file. Perfect your program on the small file and then implement on the large file. You already have small files so this isn’t necessary. 

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

sorry I was not clear enough
when I said same format I meant all fiels have blanks between the fields and the next row starts right after the previous so for example if one of the smaller files had 3 variables :code1 age and code2 the text file would be like :
AAA 45 BBBAAA 55 BBBAAA 66 BBBAAA....... so when I use a code like:

data clientinfo;

infile "&rootname\clientinfo.txt" truncover lrecl=32767 pad obs=5;
input
@1 code1 $3.
@6 age $2.
@10 code2 $3.
I'd get a perfect output
All these files have different layout though(completely different files) it just that I cannot open the huge file
Can you please share a link to some docs about the powershell thing?
ps.  I have 2 blank spaces between each field

BrunoMueller
SAS Super FREQ

To see what is in your file I suggest to use the following code:

data _null_;
  infile "<yourFile>" obs=10;
  input;
  list;
run;

If the input record contains non printable chars then you will get the input record also displayed in hex. So you will see the delimiter between data values. Maybe it is x09 for a tab character.

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

I just ran this:

data _null_;

infile "&rootname\clientinfo.txt";

input;

put _infile_;

if _n_ > 20 then

stop;

run;

 

and  in the log I got exactly the same  that I'd get in my  output file  from the  above code so looks like the text file I was provided is all messed up

Kurt_Bremser
Super User

@Tal wrote:

I just ran this:

data _null_;

infile "&rootname\clientinfo.txt";

input;

put _infile_;

if _n_ > 20 then

stop;

run;

 

and  in the log I got exactly the same  that I'd get in my  output file  from the  above code so looks like the text file I was provided is all messed up


Modify your code:

data _null_;
infile "&rootname\clientinfo.txt";
file "&rootname\testclient.txt";
input;
put _infile_;
if _n_ > 20 then
stop;
run;

So you can post the resulting file (testclient.txt) as an attachment here, if there's no sensitive data in it. This might give us more clues.

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Thanks all for your inputs

Turns out that the text file I was provided had a lot of bad data.

I was only reading in 5 records but when I pulled a  1000 I realized everything is fine

Looks like the  first records  from  that file were clients  from different  countries out of Canada with missing gender, weird postal codes....

which I thought were truncated

But all good

Thanks again

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 9705 views
  • 4 likes
  • 5 in conversation