Parsing a file

Occasional Contributor
Posts: 15

I'm trying to parse an input dataset so that only certain lines are output, and I can't get it working.

 

My input file looks like this:

 

Top

.

.

.

/LOGON

Line 1

Line 2

Etc

/LOGOFF

.

.

.

Bottom

 

I want to write the lines found between the /LOGON and /LOGOFF out to a dataset.

 

I've been playing around with variations of this, but it's not giving me what I'm after:

 

 

DATA COMMAND;                                  
     INFILE FLAGFILE;                          
     INPUT @2  CMD     $50.;                   
                                               
  IF SUBSTR(CMD,2,6) = '/LOGON' then           
    do while (SUBSTR(CMD,2,7) ¬= '/LOGOFF');   
       put CMD;                                
    end;                                       

 

Thanks….



All Replies
Super User
Posts: 11,134

Re: Parsing a file

Does this help:

 

data want;
   input @;
   length cmd $ 50;
   if index (_infile_,"/LOGON") > 0 then input;
   else if index (_infile_,"/LOGOFF")>0 then input;
   else do;
      cmd=_infile_;
      output;
   end;
datalines;
/LOGON
Line 1
Line 2
Etc
/LOGOFF
/LOGON
second Line 1
second Line 2
second Etc
second etc 2
/LOGOFF
;
run;
Super User
Posts: 5,366

Re: Parsing a file

The problem needs a little clarification.  Are there lines before the first /LOGON that you need to ignore?  Are there lines after the final /LOGOFF that you also need to ignore?  Could there be multiple pairs of logons and logoffs, with garbage in between?

 

The program might be as simple as:

 

data command;

infile flagfile truncover;

input @2 cmd $50.;

if cmd in ('/LOGON', '/LOGOFF') then delete;

run;

Super User
Posts: 5,366

Re: Parsing a file

Anticipating the answers to some of the questions, this would be a flexible way to approach the problem:

 

data command;

infile flagfile truncover;

retain status 'logged out';

input @2  cmd $50.;

if cmd='/LOGON' then status='logged in';

else if cmd='/LOGOFF' then status='logged out';

else if status='logged in' then output;

drop status;

run;

 

It's not clear whether a function (substr, index) needs to be applied when searching for /LOGON or /LOGOFF ... depends on what is actually in your data lines.
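If the marker lines can carry extra text after /LOGON or /LOGOFF, one option is the colon comparison modifier (=:), which compares only as many characters as the shorter operand has. The following is a minimal sketch under that assumption, not a tested solution for the actual file; the sample lines and the trailing "session start" / "session end" text are made up for illustration.

```sas
/* Sketch only: the data lines below are invented sample input. */
data command;
  infile datalines truncover;
  retain status 'logged out';
  input @2 cmd $50.;
  /* =: compares only the first 7 (or 6) characters of cmd here,
     so text following the marker does not defeat the match */
  if cmd =: '/LOGOFF' then status = 'logged out';
  else if cmd =: '/LOGON' then status = 'logged in';
  else if status = 'logged in' then output;
  drop status;
datalines;
 Top
 /LOGON   session start
 Line 1
 Line 2
 /LOGOFF  session end
 Bottom
;
run;
```

Note that a prefix comparison would also accept a line that merely starts with the marker text (for example /LOGONX), so an exact comparison remains the safer choice when the markers stand alone on the line.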

Super User
Posts: 5,366

Re: Parsing a file

Based on your last explanation, here is a shortcut that would improve the speed:

 

data command;

infile flagfile truncover;

retain status 'logged out';

input @2  cmd $50.;

if cmd='/LOGON' then status='logged in';

else if cmd='/LOGOFF' then stop;

else if status='logged in' then output;

drop status;

run;

 

Again, this relies on cmd being exactly "/LOGON" or "/LOGOFF".  If there are other characters on the line (including leading blanks), another function might have to be applied.

Trusted Advisor
Posts: 1,475

Re: Parsing a file

You have two types of lines: lines to skip and lines to output.

In such a case I'd do:

 

data out;

  retain phase 0;

  infile ... truncover;

  input a_line $80.;

  if index(a_line, '/LOGON'  ) > 0 then do; phase=1;  input a_line $80.; end;   /* skip the /LOGON line itself */

  if index(a_line, '/LOGOFF' ) > 0 then phase = 0;   /* skip the /LOGOFF line and everything after it */

  if phase=1 then output;

  drop phase;

run;

 

Please check whether it fits your request.

        

Occasional Contributor
Posts: 15

Re: Parsing a file

@ballardw: not quite. This includes data before the /LOGON and after the /LOGOFF that I want to drop.

 

@Astounding: yes, there are lines before /LOGON that I need to ignore, and lines after /LOGOFF that I need to ignore. There is a single pair of /LOGON & /LOGOFF, and it's the data between them that is valid.

 

@Shmuel: this drops all lines up to and including the /LOGON and outputs the data I want. However, it then also outputs all the data after the /LOGOFF, which I need to drop.

 

 

Thanks all.

Trusted Advisor
Posts: 1,475

Re: Parsing a file

When /LOGOFF is found, PHASE is set to 0, so no output is written for the following lines unless a new /LOGON line is encountered.

Occasional Contributor
Posts: 15

Re: Parsing a file

Hi Shmuel, you are correct. The issue is that I have a second LOGON & LOGOFF, though not in columns 2-8. That's why I had attempted the SUBSTR earlier, as it's only the LOGON/LOGOFF in cols 2-8 that I'm concerned with.

 

 

Thanks.

Solution
‎08-25-2016 04:15 PM
Trusted Advisor
Posts: 1,475

Re: Parsing a file

Then just change: if index(a_line, ...) = 2 instead of if ... > 0

Trusted Advisor
Posts: 1,475

Re: Parsing a file

If there can be only one /LOGON - /LOGOFF pair, then

you had better do:

   if index(a_line, '/LOGOFF') = 2 then stop;

No need to continue reading the file.
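Putting the pieces together, the whole step might look like the sketch below. It is an illustration under stated assumptions rather than the poster's final program: the data lines are invented, the markers are assumed to start in column 2, and the $char80. informat is swapped in (it was not used earlier in the thread) so that leading blanks are preserved and column positions stay meaningful.

```sas
/* Sketch only: sample lines and the one-column indent are assumptions. */
data out;
  retain phase 0;
  infile datalines truncover;
  input a_line $char80.;   /* $char keeps leading blanks */
  /* marker starting in column 2: nothing more is needed from the file */
  if index(a_line, '/LOGOFF') = 2 then stop;
  /* marker starting in column 2: begin keeping lines after this one */
  else if index(a_line, '/LOGON') = 2 then phase = 1;
  else if phase = 1 then output;
  drop phase;
datalines;
 Top
 /LOGON
 Line 1
 see /LOGOFF note
 /LOGOFF
 Bottom
;
run;
```

The line containing /LOGOFF further to the right is kept because its marker does not start in column 2. With the plain $80. informat, leading blanks are trimmed on input, so the same markers would be found at position 1 rather than 2, which may explain the =1 reported in a later reply.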

Occasional Contributor
Posts: 15

Re: Parsing a file

Needed to go =1 vs =2, but that gives me what I'm after.

 

Appreciate the assistance.

 

 

Thanks

Respected Advisor
Posts: 4,819

Re: Parsing a file

You need a RETAINed variable:

 

DATA COMMAND;                                  
retain keep;
INFILE datalines;                          
INPUT CMD $50.; 
if left(_infile_) = "/LOGON" then keep=1;
else if left(_infile_) = "/LOGOFF" then keep=0;
else if keep then output;
drop keep;
datalines;
Before
/LOGON
Line 1
Line 2
Etc
/LOGOFF
After
;

proc print; run;
PG