BookmarkSubscribeRSS Feed
Caz
Calcite | Level 5 Caz
Calcite | Level 5

I have received a very bizarre text file that I need to read into SAS and have become stuck. Here is a flavour of what it is like

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

I have started by using code such as

DATA pays;

INFILE DD TRUNCOVER SCANOVER;

INPUT @'{1:' Field1 £4.

          @'{2:' Field2 £5.;

RUN;

This read the first 2 columns of data fine but I am now struggling to figure out how to do the rest and wonder if anyone out there can assist?

The results that I am seeking

Field1         Field2          Field3                         Field4

ABCD         10234                                                40.45 

TRHG         38976          {H132}                           567.43

LKMN         23109                                            10432.67 

HGKO        19876          {J987}{F102}                  4320.12

DUHN         23098         {G123}{H987}{J234}          345.23

9 REPLIES 9
Peter_C
Rhodochrosite | Level 12

*

most people know of the INFILE option DLM= and use it with a constant like ','  '|' and '09'x.

What fewer appreciate is that this option can define a variable name.

This capablility allows you to change the delimiter each time you INPUT

;

data test( keep= field1-field4);

 

infile cards dlm=dlm truncover  column=col;

call missing( col, len ) ;

dlm='{:}' ;

do while( col LE len ) ;

input   var_ind  @ ;

if not len then len = length(_infile_) ;

select (var_ind) ;

   when (1) do ;

      dlm= '}{' ;

      input field1  :$10. @ ;

   end;

   when (2) do ;

      dlm= '}{' ;

      input field2  :32. @ ;

   end;

   when (3) do ;

      dlm= ':' ;

      input field3  :$100. @ ;

      if col < len then input @+(-2) @ ;

      *repositioning to the field number ;

      substr( field3, length( field3) -2 ) = ' ' ;

      * removing the trailing }{#: ;

   end;

   when (4) do ;

      dlm= '}{:' ;

      input field4  :32. @ ;

   end;

   otherwise put '???' _infile_ ;

end ;

dlm='{:}' ;

end ;

output ;

format field1 $6. field2 field4 best8. field3 $22. ;

cards ;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

;

Caz
Calcite | Level 5 Caz
Calcite | Level 5

Thank you very much Peter. It certainly works.

I did not realise it would appear so complex, now to get my head around it.......:smileyconfused:

Peter_C
Rhodochrosite | Level 12

Caz

the program is about as complicated as your data - the SELECT/WHEN group provide organisation for parsing and inputting the differing field types. ( cannot use { and} to delimit FIELD3 so I used the : after the next field indicator - might be interesting when 3 is last on a row 🙂

Perhaps some perl regex wizard might have some way of parsing the content into strings in a single statement, but I fail to see how that would provide the control to specify the informats for each field. You will still need use the field indicator in the data to pick out the apropriate informat.

peter

ps

any regex wizard want to get in touch, I'd be interested in a related issue

FriedEgg
SAS Employee

That was a very fun problem.  Very difficult for me to come up with the right regular expression.

data test;

length f $32 field1-field4 $30;

if _n_=1 then prx=prxparse('/(\{\d:.+?(?=(\{\d\:)|(\Z)))/');

retain prx;

input@;

start=1;

stop=length(_infile_);

call prxnext(prx,start,stop,_infile_,p,l);

do while(p>0);

  f=substr(_infile_,p+1,l-2);

  select(scan(f,1,':'));

   when(1)field1=scan(f,2,':');

   when(2)field2=scan(f,2,':');

   when(3)field3=scan(f,2,':');

   when(4)field4=scan(f,2,':');

   otherwise;

  end;

  call prxnext(prx,start,stop,_infile_,p,l);

end;

output;

keep field1-field4;

cards;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

;

run;

Tom
Super User Tom
Super User

Not that hard if you use truncover and read initially as strings and then add a little logic to remove the trailing characters and convert to numeric if wanted.

data x ;

  infile cards truncover ;

  input @1 @'1:' field1 $4.

        @1 @'2:' field2c $12.

        @1 @'3:' field3 $200.

        @1 @'4:' field4c $12.

  ;

  field2 = input(scan(field2c,1,'}'),12.);

  field4 = input(scan(field4c,1,'}'),12.);

  if index(field3,':') then field3=substr(field3,1,index(field3,':')-4);

  put (field1-field4) (=/);

cards;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

run;

field1=ABCD

field2=10234

field3=

field4=40.45

field1=TRHG

field2=38976

field3={H132}

field4=567.43

field1=LKMN

field2=23109

field3=

field4=10432.67

field1=HGKO

field2=19876

field3={J987}{F102}

field4=4320.12

field1=DUHN

field2=23098

field3={G123}{H987}{J234}

field4=345.23

You might need more logic if you have records with field 3 without a trailing field 4 to eliminate the extra close curly bracket.

You also might want to split the field 3 values into multiple records.

data x ;

  infile cards truncover ;

  input @1 @'1:' field1 $4.

        @1 @'2:' field2c $12.

        @1 @'3:' field3x $200.

        @1 @'4:' field4c $12.

  ;

  field2 = input(scan(field2c,1,'}'),12.);

  field4 = input(scan(field4c,1,'}'),12.);

  if index(field3x,':') then field3x=substr(field3x,1,index(field3x,':')-4);

  length field3 $4 ;

  put (field1 field2 field4) (=/);

  do i=1 by 1 until (field3=' ');

    field3=scan(field3x,i,'{}');

    if i=1 or field3 ne ' ' then do;

      output;

      put field3= ;

    end;

  end;

cards;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

run;

field1=ABCD

field2=10234

field4=40.45

field3=

field1=TRHG

field2=38976

field4=567.43

field3=H132

field1=LKMN

field2=23109

field4=10432.67

field3=

field1=HGKO

field2=19876

field4=4320.12

field3=J987

field3=F102

field1=DUHN

field2=23098

field4=345.23

field3=G123

field3=H987

field3=J234

Caz
Calcite | Level 5 Caz
Calcite | Level 5

Thank you very much Tom.

This is the easiest option for me to follow and naturally follows what I was exploring.

I have today taken this idea forward to the much bigger, nastier file that I have and it is working reasonably well.

What is the significance of the @1 on each of the input rows? Is that forcing it to stay on the one input row at a time?

Tom
Super User Tom
Super User

The @1 will move the pointer back to column 1.  This is to prevent if from missing fields because the pointer has moved past them. For example in the line without any {3:....} field the pointer would have moved to the end of the line so the {4:...} field would not be found.

If your file has mulitple copies of the same field (for example {1:XXXX}{1:YYYY} then you would need more complicated logic to read that.

Ksharp
Super User

Then The fourth input method(named input) would be  helpful.

data _null_ ;
file x ;
retain max .;
input;
*get magic number;
max=max(max,input(scan(_infile_,-3,':{}','M'),best8.));
call symputx('max',max);
_infile_=prxchange('s/\}$//o',1,strip(_infile_));
_infile_=prxchange('s/(\{|\}\{)(?=\d+:)/ field/o',-1,_infile_);
_infile_=prxchange('s/:/=/o',-1,_infile_); 
put _infile_;
datalines;
{1:ABCD}{2:10234}{4:40.45}
{1:TRHG}{2:38976}{3:{H132}}{4:567.43}
{1:LKMN}{2:23109}{4:10432.67}
{1:HGKO}{2:19876}{3:{J987} {F102}}{4:4320.12}
{1:DUHN}{2:23098}{3:{G123} {H987}{J234}}{4:345.23}
run;
%put &max;

data x;
 infile x ;
 input (field1 - field&max.) (= $100.);
run;





Ksharp

Message was edited by: xia keshan

MikeZdeb
Rhodochrosite | Level 12

hi ... similar to Tom's idea ...

data x (drop=temp:);

infile datalines truncover;

input

@1 @'{1:' field1 $4.

@1 @'{2:' field2 $5.

@1 @'{3:' temp3 $100.

@1 @'{4:' temp4 $10.

;

if ^missing(temp3) then field3 = substr(temp3,1,find(temp3,'}}'));

field4 = input(compress(temp4,'.','kd'),10.);

datalines;

{1:ABCD}{2:10234}{4:40.45}

{1:TRHG}{2:38976}{3:{H132}}{4:567.43}

{1:LKMN}{2:23109}{4:10432.67}

{1:HGKO}{2:19876}{3:{J987}{F102}}{4:4320.12}

{1:DUHN}{2:23098}{3:{G123}{H987}{J234}}{4:345.23}

;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1302 views
  • 5 likes
  • 6 in conversation