Problem with variable length greater 32767

Contributor
Posts: 25

Hi all,

I have a big problem.

The output of a security scan is a flat file in which the variables are delimited by semicolons.

But the variable "state" contains all the ports of a host, so it is longer than 32767 characters. How can I read all of the characters? Splitting it into several variables would be OK, but I don't know how to do that.

DATA WORK.outG1;
    LENGTH
        host             $ 200
        state            $ 32767
        note             $ 200
        os               $ 200;
    DROP
        F5
        F6;
    LABEL
        state            = "port/status"
        note             = "notiz";
    FORMAT
        host             $CHAR200.
        state            $CHAR32767.
        note             $CHAR200.
        os               $CHAR200.;
    INFORMAT
        host             $CHAR200.
        state            $CHAR32767.
        note             $CHAR200.
        os               $CHAR200.;
    INFILE 'C:\Users\sy053.LAN.002\AppData\Local\Temp\SEG3112\outG-7619b19402784883a2ffb82139d30210.txt'
        LRECL=700000
        ENCODING="WLATIN1"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD;
    INPUT
        host             : $CHAR200.
        state            : $CHAR32767.
        note             : $CHAR200.
        os               : $CHAR200.
        F5               : $1.
        F6               : $1.;
RUN;

Super Contributor
Posts: 334

Re: Problem with variable length greater 32767

There is probably a better way, but you can combine input styles in a data step (i.e. delimited and formatted).

The maximum number of new variables may be an issue. If the LRECL is close to correct, you might need to define 20 or more additional variables of length 30,000.

I can't test this right now, but try replacing state in the INPUT statement (and probably the LENGTH statement) with state1-state25 $30000., without the colon modifier.

Interesting issue ... I hope others have ideas as well!

EJ

Super User
Posts: 5,441

Re: Problem with variable length greater 32767

To deal with a dynamic number of variables, transpose the data as you read it.

Have the SCAN function (maybe in combination with other functions, depending on the layout of the state variable) read each port number from the input buffer (_INFILE_), and then do an OUTPUT.

Host, note and os will be repeated on several records. Whether that's OK depends on what you intend to do with the file after the import.
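A minimal, untested sketch of that idea, using the sample line posted elsewhere in this thread as in-stream data. It assumes each record fits within the 32,767-character limit of _INFILE_, so it would not handle the longest records as is. It also assumes every record contains the text "Ports: ". The data set name ports and the variables portlist, port, start, stop and i are made up for illustration.

```sas
data ports (keep=host port);
  length host $200 portlist $32767 port $500;
  infile datalines truncover;
  input;                                    /* load the record into _infile_ */
  host  = scan(_infile_, 2, ' ');           /* second blank-delimited word is the address */
  start = index(_infile_, 'Ports: ') + 7;   /* first character after "Ports: " */
  stop  = index(_infile_, ' Ignored State:');
  if stop = 0 then stop = length(_infile_) + 1;  /* no trailer found: take the rest of the line */
  portlist = substr(_infile_, start, stop - start);
  do i = 1 to countw(portlist, ',');
    port = strip(scan(portlist, i, ','));   /* one port entry per output row */
    output;
  end;
  datalines4;
Host: 10.131.113.131 (SAP00931.lan.****.de) Ports: 80/open/tcp//http//Microsoft IIS httpd 6.0/, 1042/open/tcp//msrpc//Microsoft Windows RPC/, 2555/open/tcp/////, 2580/open/tcp//tributary?///, 3389/open/tcp//ms-wbt-server//Microsoft Terminal Service/, 4999/open/tcp//hfcs-manager?/// Ignored State: closed (8997) OS: Microsoft Windows Server 2003 SP1 or SP2
;;;;
run;
```

The result should be one row per port entry, with host repeated on each row.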

Data never sleeps
New Contributor
Posts: 4

Re: Problem with variable length greater 32767

Try reading the whole line into the input buffer first. Something like:

data work.outgl;
     length ...
     format ...
     informat ...
     infile <file> options;

     * load the record into the _infile_ buffer with a bare INPUT, since _infile_ cannot be named in the INPUT statement;
     input;

     * then look at the length of the line that was read;
     if length(_infile_) > 32700 then do;
          * you might have to store the max length of _infile_ in a macro variable here and use it after str1;
          str1 = input(substr(_infile_,1,32700), $32767.);
          str2 = input(substr(_infile_,32701,<maxlength>), $32767.);
     end;
     else do;
          str1 = input(_infile_, $32767.);
     end;
     .............
     ..........
run;

Of course, the code is a rough draft and not complete. I was just trying to give you an idea. :)

PROC Star
Posts: 1,760

Re: Problem with variable length greater 32767

I am afraid you may have to read the file (or at least part of it) as a byte stream to go beyond the 32k lrecl limitation.

This is done by setting the RECFM option appropriately.

filename TEST 'f:\temp\test.txt' recfm=n;

data _null_; *create one very long record;
  file TEST;
  length A $500;
  do CHAR=33 to 120;
    A=repeat(byte(CHAR),500);
    put A @;
  end;
  A=byte(121);
  put A;
run;

data OUT;
  infile TEST;
  length A $256;
  do until(find(A,byte(121)));
    input A $256. @; *read very long record 256 bytes at a time, until end reached;
    output;
  end;
run;

Super User
Posts: 10,046

Re: Problem with variable length greater 32767

Posting a sample file would be a better way to explain your question.

Ksharp

Contributor
Posts: 25

Re: Problem with variable length greater 32767

Thanks to all of you for your ideas.

Achilles gave me an idea; I will try it soon.

Here is a sample. The ports string can be up to about 200k long:

Host: 10.131.113.131 (SAP00931.lan.****.de) Ports: 80/open/tcp//http//Microsoft IIS httpd 6.0/, 1042/open/tcp//msrpc//Microsoft Windows RPC/, 2555/open/tcp/////, 2580/open/tcp//tributary?///, 3389/open/tcp//ms-wbt-server//Microsoft Terminal Service/, 4999/open/tcp//hfcs-manager?/// Ignored State: closed (8997) OS: Microsoft Windows Server 2003 SP1 or SP2

EDIT:

I have now tried the following:

%macro count;
     DATA test;
    INFILE 'W:\Prz-IT63-LINUX-SCHWACHSTELLENANALYSE\#IT63kd\DEA\Scan01\sv_10.131.112.x-142.x_ex_AIX-AS400\outG';
           input _infile_;
           do count=1 to 20;
                if(length(_infile_) > 32767 then do;
                str&count = input(substr(_infile_,(count*32767,((count+1)*32767),$32767.);
           end;
           else;
                str21 = input(_infile_$32767.);
           end;
     run;
%mend;

%count;

But it gives me several errors:

ERROR: The _INFILE_ variable cannot be referenced by the INPUT statement.

WARNING: Apparent symbolic reference COUNT not resolved.

ERROR 388-185: Expecting an arithmetic operator.

ERROR 180-322: Statement is not valid or it is used out of proper order.

ERROR 76-322: Syntax error, statement will be ignored.

ERROR 160-185: No matching IF-THEN clause.

ERROR 78-322: Expecting a ','.

ERROR 161-185: No matching DO/SELECT statement.

Why is the COUNT variable not resolved?

PROC Star
Posts: 1,760

Re: Problem with variable length greater 32767

Because you never defined the COUNT macro variable that you use in str&count.

Also, then do; doesn't have a closing end;

You ought to be more careful and properly check your code before posting here and asking questions to people who will use their time and skill to help you.

Anyway, the logic you are trying to use is flawed as _infile_ cannot be longer than lrecl, i.e. 32k max.

Again, the only way to read such a long record afaik is to stream it.
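On the str&count point: a DATA step loop counter is not a macro variable, so &count is never resolved. One way to address str1, str2, ... by a loop index is an array. The following is only a sketch of that mechanism on a short stand-in string, not a fix for the long-record problem; the names demo, text and str are made up for illustration.

```sas
data demo (drop=count);
  length text $100;
  array str{4} $25;                  /* creates str1-str4, each 25 characters long */
  text = repeat('abcde', 19);        /* 20 copies of 'abcde' = 100 characters */
  do count = 1 to dim(str);
    /* chunk number count starts at position (count-1)*25 + 1 */
    str{count} = substr(text, (count - 1) * 25 + 1, 25);
  end;
run;
```

Note that the start position is (count-1)*25 + 1 rather than count*25, and that the third argument of SUBSTR is a length, not an end position.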

Contributor
Posts: 25

Re: Problem with variable length greater 32767

Your code above overwrote my file and filled it with something like !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!""""""""""""""""""""""""""""""""""""""""""""""""§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§ and so on.
Glad I had a backup.

I have now submitted a ticket to SAS Support.

PROC Star
Posts: 1,760

Re: Problem with variable length greater 32767

Did you read the comment *create one very long record; ?

Why did you recreate the sample data?

Did you even try to understand what the code does?

Contributor
Posts: 25

Re: Problem with variable length greater 32767

I tried, but I understood about, uh... nothing. Sorry.

Super User
Posts: 10,046

Re: Problem with variable length greater 32767

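You did not define the macro variable &count.

/* Rewrite the record as name=value pairs so that named input can read it */
data _null_;
file 'c:\temp\x.txt' lrecl=200000;
input x $char1. @@;     /* read the data one character at a time */
retain max;
/* turn each colon into = and restart the port counter */
if x=':' then do;x='=';     n=0;end;
 /* each comma starts another port entry, which gets the name p1, p2, ... */
 else if x=',' then do;
                      n+1; max=max(max,n) ;
                           name= cats('p',n,'=');
                           put +(-1) ' ' name @;
                           /* highest port index so far, used by the next step */
                           call symputx('max',max);
                           return;
                         end;
/* +(-1) backs up over the blank that list PUT adds after each value */
put +(-1) x @;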
cards4;
Host: 10.131.113.131 (SAP00931.lan.****.de) Ports: 80/open/tcp//http//Microsoft IIS httpd 6.0/, 1042/open/tcp//msrpc//Microsoft Windows RPC/, 2555/open/tcp/////, 2580/open/tcp//tributary?///, 3389/open/tcp//ms-wbt-server//Microsoft Terminal Service/, 4999/open/tcp//hfcs-manager?/// Ignored State: closed (8997) OS: Microsoft Windows Server 2003 SP1 or SP2
;;;;
run;

data have;
 infile 'c:\temp\x.txt'  lrecl=200000;
 input (host ports p1 - p&max State os) (= $2000.)  ;
run;

Ksharp

PROC Star
Posts: 1,760

Re: Problem with variable length greater 32767

lrecl=200000 ? Wow, that's great!

When was the 32k limit lifted?

The online doc for lrecl in 9.3 states:

Range:1–32767

Super User
Posts: 10,046

Re: Problem with variable length greater 32767

No. Actually, according to the SAS 9.2 documentation (dictionary), the maximum is almost 2G.

Sorry, Chris.

Using a file may do the trick.

PROC Star
Posts: 1,760

Re: Problem with variable length greater 32767

Mmm, weird.

The documentation for the LRECL system option states 32kb max, but the doc for the LRECL filename option states 1Gb max for windows and unix.

I missed that change. I wonder why the discrepancy between the 2 lengths. That discrepancy sure threw me off in any case.

No need for streaming then, all good.

Sorry for misleading you DJDaniel.

Thanks for the update KSharp.

Mmm, weird again: it looks like SAS did a half-baked enhancement. LRECL can be longer, but then the _infile_ variable causes errors.

The following code should be valid but throws an error. What's going on, SAS?

data T (compress=yes);
  infile TEST lrecl=500000 pad missover termstr=CRLF dlm=';' dsd;
  input;
  L=length(_infile_);
run;

yields:

ERROR: The LRECL / LINESIZE for infile TEST exceeds the maximum allowable length for an _INFILE_ or _INFILE_= variable (32,767).

The DATA STEP will not be executed.

so all is not so clear it seems.
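One possible way around this, sketched here and untested against the real scan file: read the record in comma-delimited pieces with list input and a trailing @, and never reference _infile_. I am assuming the error above is only raised when the step actually uses _infile_ with a large LRECL. The path, the data set name pieces and the variable piece are hypothetical.

```sas
data pieces;
  infile 'c:\temp\scan.txt' lrecl=1000000 dlm=',' truncover;  /* hypothetical path */
  length piece $2000;
  input piece @;                     /* first comma-delimited piece; hold the record */
  do while (not missing(piece));
    output;                          /* one row per piece */
    input piece @;                   /* next piece, or missing at the end of the record */
  end;
run;
```

The first piece of each record would still carry the Host: text and the first port, so it would need further splitting afterwards.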
