07-03-2013 12:29 PM
I have a comma delimited variable length file that I would like to read. The beginning 58 bytes contain fixed data and at byte 59 the comma delimiter begins for the first value. Each variable will contain the zcode(the Z0*** value) and whatever value is associated with the zcode (could be a date, amount code, etc..) The record length can be up to 30,000 bytes. See below for example of input file:
********************************* Top of Data *********************************
My goal is to create a data set that contains the fixed data and one of the values on a separate line. Below is an example using the first record of the above DSN:
Z00101990244 Z0029 ...
Z00101990244 Z0016 -365.000
Z00101990244 Z0012 0.000
Z00101990244 Z0013 0.000
Z00101990244 Z0014 0.000
Z00101990244 Z0015 0.000
I have tried using the following code, but I am definitely missing something:
INFILE IN1 MISSOVER;
FORMAT SRCVAL $80.;
INPUT @09 CUST $CHAR12.
@25 ACCT13 13.
IF INDEX(_INFILE_,",") > 0 THEN DO;
INPUT @ "," SRCVAL
Any help you can lend to the solution would be greatly appreciated.
07-03-2013 01:29 PM
Sounds like you just need to use the DSD option on the input. Note: Use TRUNCOVER instead of MISSOVER, it is almost never the case that you want to throw away short fields rather than truncating the input to match the available characters.
You did not provide the full fixed field information, so I just read it as one long string variable call COL1.
infile tmpfile1 dsd truncover length=ls col=col lrecl=32000 ;
length col1 $58 colnum 8 zcode $5 srcval $80 ;
input col1 @ ;
do colnum=1 by 1 until (col > ls) ;
input srcval @;
zcode = srcval;
srcval = substr(srcval,6);
keep col1 zcode srcval ;
keep colnum ;
proc print; run;
07-03-2013 01:52 PM
07-03-2013 02:52 PM
DSD - This tells SAS to treat the input as delimited and when there are adjacent delimiters to treat it as meaning a missing value. Also sets the actual delimiter character to a comma. You can override that using the DLM or DELIMITER option.
TRUNCOVER - Tells SAS to NOT move to a new line when you try to read past the end. Plus if you try to read more character than are left on the line it just gives you what it has instead of returning missing value.
LRECL - Needed because the maximum line size to read is larger than the default of 256.
LENGTH=LS - The variable LS with have the length of the current line.
COL=COL - The variable COL will have the column number where SAS will read next.
LENGTH statement defines the variable names, order, type and length. One of my pet peeves is when people use FORMAT statements as surrogate for specifying the length.
Trialing @ on the input statements. This allow you to hold the column pointer and line pointer so that you can continue reading data from this line later.
INPUT COL1 @ ;
Notice that I did not need to use an informat. SAS will read everything up to the next delimiter (comma) and put it into variable COL1. Since I defined the variable already with the LENGTH statement SAS knows to read it as character.
do colnum=1 by 1 until (col > ls) ;
The flexibility of the data step DO statement is one of SAS's most powerful features. I can combined incremental and conditional looping in one statement. I can use incremental without a TO= value as the loop will stop because of the UNTIL() clause. The effect is that COLNUM will start at 1 and increment by one for each time through the loop. It will stop when there are no more commas on the line.
Parsing SRCVAL into ZCODE and rest. There is a trick here in that my code depends on ZCODE being defined as length 5. To be more explicit you can use SUBSTR(srvval,1,5) on the right of the assignment statement. The other assignment statement uses the fact that SUBSTR() allows you to leave out the third parameter. That basically means take everything from the starting column to the end of the input string.
OUTPUT - This says write the observation now. So it can write multiple output records for each line of input.
KEEP - This tells it which of the variables to keep on the output. I think that for the version I posted it is not needed as the COL and LS variables created by the INFILE option will be automatically dropped (if you want to keep them you need assign the value to another variable). I split out the COLNUM variable as a separate KEEP statement because that was not listed in the original posters request, but I added it as it seemed like information that should be available.
As to reading fields in the fixed portion of the data you could combine that with the delimited input. I would recommend replacing the first INPUT statement with something like:
INPUT @9 xx $char12. @25 yy 13. @1 col1 @ ;
This will insure that all of the first column is read so that pointer is left waiting before the DO loop to read the first of the repeating columns.
07-03-2013 09:33 PM
One more way .
filename tmpfile1 'c:\temp\a.txt'; data want; infile tmpfile1 dsd truncover lrecl=32000 ; input col1 : $40. temp : $40. @; do until( missing(temp)); zcode = substr(temp,1,5); srcval = substr(temp,6); output; input temp : $40. @; end; keep col1 zcode srcval ; run;