BookmarkSubscribeRSS Feed
kjd2121
Calcite | Level 5

I have a comma delimited variable length file that I would like to read.  The beginning 58 bytes contain fixed data and at byte 59 the comma delimiter begins for the first value.  Each variable will contain the zcode(the Z0*** value) and whatever value is associated with the zcode (could be a date, amount code, etc..) The record length can be up to 30,000 bytes.  See below for example of input file:

---5----+----6----+----7----+----8----+----9----+----0----+----1----+----2----+-

********************************* Top of Data *********************************

Z00101990244,Z0029...,Z0016-365.000,Z00120.000,Z00130.000,Z00140.000,Z00150.000,

Z00102010001,Z0029...,Z00160.000,Z00120.000,Z00130.000,Z00140.000,Z00150.000,Z00

Z00101997248,Z0029...,Z00160.000,Z00120.000,Z00130.000,Z00140.000,Z00150.000,Z00

Z00101999029,Z0029...,Z00160.000,Z00120.000,Z00130.000,Z00140.000,Z00150.000,Z00

Z00101999291,Z0029...,Z00160.000,Z00120.000,Z00130.000,Z00140.000,Z00150.000,Z00

.....

My goal is to create a data set that contains the fixed data and one of the values on a separate line.  Below is an example using the first record of the above DSN:

Z00101990244 Z0029 ...

Z00101990244 Z0016 -365.000

Z00101990244 Z0012 0.000

Z00101990244 Z0013 0.000

Z00101990244 Z0014 0.000

Z00101990244 Z0015 0.000

......


I have tried using the following code, but I am definitely missing something:


DATA DELIM1;                                       

  INFILE IN1 MISSOVER;                             

    FORMAT SRCVAL $80.;                            

    INPUT @09  CUST    $CHAR12.                    

          @25  ACCT13  13.                         

          @ ;                                      

    IF INDEX(_INFILE_,",") > 0 THEN DO;            

      INPUT @ "," SRCVAL                           

            @;                                     

      OUTPUT;                                      

    END;                                           

                                                   

Any help you can lend to the solution would be greatly appreciated.

6 REPLIES 6
Tom
Super User Tom
Super User

Sounds like you just need to use the DSD option on the input. Note: Use TRUNCOVER instead of MISSOVER, it is almost never the case that you want to throw away short fields rather than truncating the input to match the available characters.

You did not provide the full fixed field information, so I just read it as one long string variable call COL1.

data delim1;

  infile tmpfile1 dsd truncover length=ls col=col lrecl=32000 ;

  length col1 $58 colnum 8 zcode $5 srcval $80 ;

  input col1 @ ;

  do colnum=1 by 1 until (col > ls) ;

    input srcval @;

    zcode = srcval;

    srcval = substr(srcval,6);

    output;

  end;

  keep col1 zcode srcval ;

  keep colnum ;

run;

proc print; run;

       

esjackso
Quartz | Level 8

Maybe its because its a day before a holiday in the US, but your great code is stumping me about how its doing what its doing. If you have some time could you offer some further explanation on the code?

Thanks!

EJ

Tom
Super User Tom
Super User

http://www2.sas.com/proceedings/sugi23/Training/p275.pdf

Infile options:

DSD - This tells SAS to treat the input as delimited and when there are adjacent delimiters to treat it as meaning a missing value.  Also sets the actual delimiter character to a comma.  You can override that using the DLM or DELIMITER option.

TRUNCOVER - Tells SAS to NOT move to a new line when you try to read past the end. Plus if you try to read more character than are left on the line it just gives you what it has instead of returning missing value.

LRECL -  Needed because the maximum line size to read is larger than the default of 256.

LENGTH=LS - The variable LS with have the length of the current line.

COL=COL - The variable COL will have the column number where SAS will read next.

LENGTH statement defines the variable names, order, type and length.  One of my pet peeves is when people use FORMAT statements as surrogate for specifying the length.

Trialing @ on the input statements.  This allow you to hold the column pointer and line pointer so that you can continue reading data from this line later.

INPUT COL1 @ ;

Notice that I did not need to use an informat. SAS will read everything up to the next delimiter (comma) and put it into variable COL1.  Since I defined the variable already with the LENGTH statement SAS knows to read it as character.

do colnum=1 by 1 until (col > ls) ;

The flexibility of the data step DO statement is one of SAS's most powerful features. I can combined incremental and conditional looping in one statement. I can use incremental without a TO= value as the loop will stop because of the UNTIL() clause. The effect is that COLNUM will start at 1 and increment by one for each time through the loop. It will stop when there are no more commas on the line.


Parsing SRCVAL into ZCODE and rest.  There is a trick here in that my code depends on ZCODE being defined as length 5.  To be more explicit you can use SUBSTR(srvval,1,5) on the right of the assignment statement.  The other assignment statement uses the fact that SUBSTR() allows you to leave out the third parameter.  That basically means take everything from the starting column to the end of the input string.


OUTPUT -  This says write the observation now.  So it can write multiple output records for each line of input.


KEEP - This tells it which of the variables to keep on the output.  I think that for the version I posted it is not needed as the COL and LS variables created by the INFILE option will be automatically dropped (if you want to keep them you need assign the value to another variable).  I split out the COLNUM variable as a separate KEEP statement because that was not listed in the original posters request, but I added it as it seemed like information that should be available.


As to reading fields in the fixed portion of the data you could combine that with the delimited input.  I would recommend replacing the first INPUT statement with something like:


INPUT @9 xx $char12. @25 yy 13. @1 col1 @ ;


This will insure that all of the first column is read so that pointer is left waiting before the DO loop to read the first of the repeating columns.





esjackso
Quartz | Level 8

As always Thanks Tom!!

EJ

kjd2121
Calcite | Level 5

Tom,

Thank you very much for an elegant solution.  Works perfectly!!

Ksharp
Super User

One more way .

filename tmpfile1 'c:\temp\a.txt';
data want;
  infile tmpfile1 dsd truncover  lrecl=32000 ;
  input col1 : $40.  temp : $40. @;
  do until( missing(temp));
    zcode = substr(temp,1,5);
    srcval = substr(temp,6);
    output;
     input temp : $40. @;
  end;
  keep col1 zcode srcval ;
run;


Ksharp

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2176 views
  • 1 like
  • 4 in conversation