DATA Step, Macro, Functions and more

Help - Infile Multiple variable Length Fields

Reply
Occasional Contributor Caz
Occasional Contributor
Posts: 5

Help - Infile Multiple variable Length Fields

I have received a very bizarre text file that I need to read into SAS and have become stuck. Here is a flavour of what it is like

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1Smiley Very HappyUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

I have started by using code such as

DATA pays;

INFILE DD TRUNCOVER SCANOVER;

INPUT @'{1:' Field1 £4.

          @'{2:' Field2 £5.;

RUN;

This read the first 2 columns of data fine but I am now struggling to figure out how to do the rest and wonder if anyone out there can assist?

The results that I am seeking

Field1         Field2          Field3                         Field4

ABCD         10234                                                40.45 

TRHG         38976          {H132}                           567.43

LKMN         23109                                            10432.67 

HGKO        19876          {J987}{F102}                  4320.12

DUHN         23098         {G123}{H987}{J234}          345.23

Valued Guide
Posts: 2,177

Help - Infile Multiple variable Length Fields

*

most people know of the INFILE option DLM= and use it with a constant like ','  '|' and '09'x.

What fewer appreciate is that this option can define a variable name.

This capablility allows you to change the delimiter each time you INPUT

;

data test( keep= field1-field4);

 

infile cards dlm=dlm truncover  column=col;

call missing( col, len ) ;

dlm='{:}' ;

do while( col LE len ) ;

input   var_ind  @ ;

if not len then len = length(_infile_) ;

select (var_ind) ;

   when (1) do ;

      dlm= '}{' ;

      input field1  :$10. @ ;

   end;

   when (2) do ;

      dlm= '}{' ;

      input field2  :32. @ ;

   end;

   when (3) do ;

      dlm= ':' ;

      input field3  :$100. @ ;

      if col < len then input @+(-2) @ ;

      *repositioning to the field number ;

      substr( field3, length( field3) -2 ) = ' ' ;

      * removing the trailing }{#: ;

   end;

   when (4) do ;

      dlm= '}{:' ;

      input field4  :32. @ ;

   end;

   otherwise put '???' _infile_ ;

end ;

dlm='{:}' ;

end ;

output ;

format field1 $6. field2 field4 best8. field3 $22. ;

cards ;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1Smiley Very HappyUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

;

Occasional Contributor Caz
Occasional Contributor
Posts: 5

Help - Infile Multiple variable Length Fields

Thank you very much Peter. It certainly works.

I did not realise it would appear so complex, now to get my head around it.......:smileyconfused:

Valued Guide
Posts: 2,177

Help - Infile Multiple variable Length Fields

Caz

the program is about as complicated as your data - the SELECT/WHEN group provide organisation for parsing and inputting the differing field types. ( cannot use { and} to delimit FIELD3 so I used the : after the next field indicator - might be interesting when 3 is last on a row Smiley Happy

Perhaps some perl regex wizard might have some way of parsing the content into strings in a single statement, but I fail to see how that would provide the control to specify the informats for each field. You will still need use the field indicator in the data to pick out the apropriate informat.

peter

ps

any regex wizard want to get in touch, I'd be interested in a related issue

Trusted Advisor
Posts: 1,301

Help - Infile Multiple variable Length Fields

That was a very fun problem.  Very difficult for me to come up with the right regular expression.

data test;

length f $32 field1-field4 $30;

if _n_=1 then prx=prxparse('/(\{\d:.+?(?=(\{\d\Smiley Happy|(\Z)))/');

retain prx;

input@;

start=1;

stop=length(_infile_);

call prxnext(prx,start,stop,_infile_,p,l);

do while(p>0);

  f=substr(_infile_,p+1,l-2);

  select(scan(f,1,':'));

   when(1)field1=scan(f,2,':');

   when(2)field2=scan(f,2,':');

   when(3)field3=scan(f,2,':');

   when(4)field4=scan(f,2,':');

   otherwise;

  end;

  call prxnext(prx,start,stop,_infile_,p,l);

end;

output;

keep field1-field4;

cards;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1Smiley Very HappyUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

;

run;

Super User
Super User
Posts: 7,074

Re: Help - Infile Multiple variable Length Fields

Not that hard if you use truncover and read initially as strings and then add a little logic to remove the trailing characters and convert to numeric if wanted.

data x ;

  infile cards truncover ;

  input @1 @'1:' field1 $4.

        @1 @'2:' field2c $12.

        @1 @'3:' field3 $200.

        @1 @'4:' field4c $12.

  ;

  field2 = input(scan(field2c,1,'}'),12.);

  field4 = input(scan(field4c,1,'}'),12.);

  if index(field3,':') then field3=substr(field3,1,index(field3,':')-4);

  put (field1-field4) (=/);

cards;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

run;

field1=ABCD

field2=10234

field3=

field4=40.45

field1=TRHG

field2=38976

field3={H132}

field4=567.43

field1=LKMN

field2=23109

field3=

field4=10432.67

field1=HGKO

field2=19876

field3={J987}{F102}

field4=4320.12

field1=DUHN

field2=23098

field3={G123}{H987}{J234}

field4=345.23

You might need more logic if you have records with field 3 without a trailing field 4 to eliminate the extra close curly bracket.

You also might want to split the field 3 values into multiple records.

data x ;

  infile cards truncover ;

  input @1 @'1:' field1 $4.

        @1 @'2:' field2c $12.

        @1 @'3:' field3x $200.

        @1 @'4:' field4c $12.

  ;

  field2 = input(scan(field2c,1,'}'),12.);

  field4 = input(scan(field4c,1,'}'),12.);

  if index(field3x,':') then field3x=substr(field3x,1,index(field3x,':')-4);

  length field3 $4 ;

  put (field1 field2 field4) (=/);

  do i=1 by 1 until (field3=' ');

    field3=scan(field3x,i,'{}');

    if i=1 or field3 ne ' ' then do;

      output;

      put field3= ;

    end;

  end;

cards;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

run;

field1=ABCD

field2=10234

field4=40.45

field3=

field1=TRHG

field2=38976

field4=567.43

field3=H132

field1=LKMN

field2=23109

field4=10432.67

field3=

field1=HGKO

field2=19876

field4=4320.12

field3=J987

field3=F102

field1=DUHN

field2=23098

field4=345.23

field3=G123

field3=H987

field3=J234

Occasional Contributor Caz
Occasional Contributor
Posts: 5

Re: Help - Infile Multiple variable Length Fields

Thank you very much Tom.

This is the easiest option for me to follow and naturally follows what I was exploring.

I have today taken this idea forward to the much bigger, nastier file that I have and it is working reasonably well.

What is the significance of the @1 on each of the input rows? Is that forcing it to stay on the one input row at a time?

Super User
Super User
Posts: 7,074

Re: Help - Infile Multiple variable Length Fields

The @1 will move the pointer back to column 1.  This is to prevent if from missing fields because the pointer has moved past them. For example in the line without any {3:....} field the pointer would have moved to the end of the line so the {4:...} field would not be found.

If your file has mulitple copies of the same field (for example {1:XXXX}{1:YYYY} then you would need more complicated logic to read that.

Super User
Posts: 10,044

Re: Help - Infile Multiple variable Length Fields

Then The fourth input method(named input) would be  helpful.

data _null_ ;
file x ;
retain max .;
input;
*get magic number;
max=max(max,input(scan(_infile_,-3,':{}','M'),best8.));
call symputx('max',max);
_infile_=prxchange('s/\}$//o',1,strip(_infile_));
_infile_=prxchange('s/(\{|\}\{)(?=\d+:)/ field/o',-1,_infile_);
_infile_=prxchange('s/:/=/o',-1,_infile_); 
put _infile_;
datalines;
{1:ABCD}{2:10234}{4:40.45}
{1:TRHG}{2:38976}{3:{H132}}{4:567.43}
{1:LKMN}{2:23109}{4:10432.67}
{1:HGKO}{2:19876}{3:{J987} {F102}}{4:4320.12}
{1:DUHN}{2:23098}{3:{G123} {H987}{J234}}{4:345.23}
run;
%put &max;

data x;
 infile x ;
 input (field1 - field&max.) (= $100.);
run;





Ksharp

Message was edited by: xia keshan

Valued Guide
Posts: 765

Re: Help - Infile Multiple variable Length Fields

hi ... similar to Tom's idea ...

data x (drop=tempSmiley Happy;

infile datalines truncover;

input

@1 @'{1:' field1 $4.

@1 @'{2:' field2 $5.

@1 @'{3:' temp3 $100.

@1 @'{4:' temp4 $10.

;

if ^missing(temp3) then field3 = substr(temp3,1,find(temp3,'}}'));

field4 = input(compress(temp4,'.','kd'),10.);

datalines;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1Smiley Very HappyUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

;

run;

Ask a Question
Discussion stats
  • 9 replies
  • 287 views
  • 5 likes
  • 6 in conversation