BookmarkSubscribeRSS Feed
DavidJ
Calcite | Level 5
I have a tab-delimited file which was created in Excel 2007 under Windows which looks as follows (portion shown, tabs represented by a T)
95008T 68106T 44970T 7001T
95014T 100020T 54412T 2643T

The file is uploaded to the mainframe and I have a SAS program which reads
in the file as follows:

filename census 'CENSUS.DATA' disp=shr;

Data Census;
infile census dlm='05'X expandtabs missover lrecl=300

input zipcode $5.
median_income 8.
population 8.
pop_density 8.;

run;

The code runs fine EXCEPT for lines where median_income is 100000 and higher (six digits vs.five). For those lines, SAS then complains there is invalid data for the population field and truncates the median_income field (the value should be 100020)

NOTE: Invalid data for population in line 143 14-21.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6--
143 95014 100020 54412 2643
zipcode=95014 med_Income=10002 population=. pop_density=2643

I have tried not using the expandtabs option and the file does not read in properly at all.
6 REPLIES 6
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Suggest you dump the EXPANDTABS parameter, which is causing your $5 zip-code field to be expanded to $8, resulting in your data-truncation. Otherwise, you will need to account for the adjusted field-width with each of your input data-source locations.

Scott Barry
SBBWorks, Inc.
DavidJ
Calcite | Level 5
I'll look into that, thanks for the quick response, Scott.
Robert_Bardos
Fluorite | Level 6
Putting the lengths into a LENGTH statement might solve your problem.
Ran the following (with your T delimiters) on z/OS batch:

[pre]

Data Census;
infile census dlm='T' expandtabs missover /*lrecl=300*/ ;

length zipcode $5 median_income population pop_density 8 ;

input zipcode
median_income
population
pop_density ;

run;

[/pre]

Robert
garybald
Calcite | Level 5
When I use a tab delimited file on the mainframe, I use LIST input and tell SAS the maximum length each variable can be. If the length is shorter than the max, SAS will allow for it. I code like this:
Data Temp;
infile in dlm='05'x dsd trancover;
input v1 : $45.
v2 : $10.
v3 : 12.
etc ;

The colon : allows you to read the value from the next nonblank column until the pointer reaches the next blank column, in this case reading from tab to tab.
garybald
Calcite | Level 5
Sorry, I should have TRUNCOVER instead of trancover
Peter_C
Rhodochrosite | Level 12
hi DavidJ

your best direction is coming from "garybald"

tab-delimited and csv-type data files have no fixed structure, as implied by explicit lengths on an input statement.
To support these data types SAS data steps have the features "list input" and DSD.
It is DSD which has the biggest effect because "normal" list input would expect "white space" to be the "delimiter" between data items. For that way. multiple blanks are ignored and treated as "one delimiter".
For DSD (I like to interpret as Delimiter Separated Data) each blank is a delimiter and consecutive delimiters imply a missing value. This DSD feature was introduced coinciding with the introduction of SAS on ms-windows platforms, so it is not really a new feature- but doesn't seem to be among basic SAS training, judging by the number of discussions about how to handle delimiter separated data.

Any time you have to deal with delimited data (csv, tab or pipe or whatever) remember to use DSD.

good luck
peterC

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2746 views
  • 0 likes
  • 5 in conversation