I have a tab-delimited file which was created in Excel 2007 under Windows which looks as follows (portion shown, tabs represented by a T)
95008T 68106T 44970T 7001T
95014T 100020T 54412T 2643T
The file is uploaded to the mainframe and I have a SAS program which reads
in the file as follows:
filename census 'CENSUS.DATA' disp=shr;
infile census dlm='05'X expandtabs missover lrecl=300
input zipcode $5.
The code runs fine EXCEPT for lines where median_income is 100000 and higher (six digits vs.five). For those lines, SAS then complains there is invalid data for the population field and truncates the median_income field (the value should be 100020)
NOTE: Invalid data for population in line 143 14-21.
143 95014 100020 54412 2643
zipcode=95014 med_Income=10002 population=. pop_density=2643
I have tried not using the expandtabs option and the file does not read in properly at all.
Suggest you dump the EXPANDTABS parameter, which is causing your $5 zip-code field to be expanded to $8, resulting in your data-truncation. Otherwise, you will need to account for the adjusted field-width with each of your input data-source locations.
When I use a tab delimited file on the mainframe, I use LIST input and tell SAS the maximum length each variable can be. If the length is shorter than the max, SAS will allow for it. I code like this:
infile in dlm='05'x dsd trancover;
input v1 : $45.
v2 : $10.
v3 : 12.
The colon : allows you to read the value from the next nonblank column until the pointer reaches the next blank column, in this case reading from tab to tab.
tab-delimited and csv-type data files have no fixed structure, as implied by explicit lengths on an input statement.
To support these data types SAS data steps have the features "list input" and DSD.
It is DSD which has the biggest effect because "normal" list input would expect "white space" to be the "delimiter" between data items. For that way. multiple blanks are ignored and treated as "one delimiter".
For DSD (I like to interpret as Delimiter Separated Data) each blank is a delimiter and consecutive delimiters imply a missing value. This DSD feature was introduced coinciding with the introduction of SAS on ms-windows platforms, so it is not really a new feature- but doesn't seem to be among basic SAS training, judging by the number of discussions about how to handle delimiter separated data.
Any time you have to deal with delimited data (csv, tab or pipe or whatever) remember to use DSD.