I'm trying to parse an input dataset and output only certain lines, but I can't get it working.
My input file looks like this:
Top
.
.
.
/LOGON
Line 1
Line 2
Etc
/LOGOFF
.
.
.
Bottom
I want to write the lines found between the /LOGON and /LOGOFF out to a dataset.
I've been playing around with variations of this, but it's not giving me what I'm after:
DATA COMMAND;
INFILE FLAGFILE;
INPUT @2 CMD $50.;
IF SUBSTR(CMD,2,6) = '/LOGON' then
do while (SUBSTR(CMD,2,7) ¬= '/LOGOFF');
put CMD;
end;
Thanks….
Does this help:
data want;
input @;
length cmd $ 50.;
if index (_infile_,"/LOGON") > 0 then input;
else if index (_infile_,"/LOGOFF")>0 then input;
else do;
cmd=_infile_;
output;
end;
datalines;
/LOGON
Line 1
Line 2
Etc
/LOGOFF
/LOGON
second Line 1
second Line 2
second Etc
second etc 2
/LOGOFF
;
run;
The problem needs a little clarification. Are there lines before the first /LOGON that you need to ignore? Are there lines after the final /LOGOFF that you also need to ignore? Could there be multiple pairs of logons and logoffs, with garbage in between?
The program might be as simple as:
data command;
infile flagfile truncover;
input @2 cmd $50.;
if cmd in ('/LOGON', '/LOGOFF') then delete;
run;
Anticipating the answers to some of the questions, this would be a flexible way to approach the problem:
data command;
infile flagfile truncover;
retain status 'logged out';
input @2 cmd $50.;
if cmd='/LOGON' then status='logged in';
else if cmd='/LOGOFF' then status='logged out';
else if status='logged in' then output;
drop status;
run;
It's not clear whether a function (substr, index) needs to be applied when searching for /LOGON or /LOGOFF ... depends on what is actually in your data lines.
Based on your last explanation, here is a shortcut that would improve the speed:
data command;
infile flagfile truncover;
retain status 'logged out';
input @2 cmd $50.;
if cmd='/LOGON' then status='logged in';
else if cmd='/LOGOFF' then stop;
else if status='logged in' then output;
drop status;
run;
Again, this relies on cmd being exactly "/LOGON" or "/LOGOFF". If there are other characters on the line (including leading blanks), another function might have to be applied.
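If the markers might be indented or in mixed case, one way to apply "another function" is to normalize the line before comparing. This is only a minimal sketch, not tested against the real file: FLAGFILE is the fileref from the original post, KEY is a hypothetical helper variable, and it assumes nothing else appears on a marker line. Note that it matches a marker in any column, which may or may not be what you want.

```sas
/* Sketch: tolerate leading blanks and mixed case around the markers. */
/* KEY is a hypothetical helper variable, not part of the thread.     */
data command;
  infile flagfile truncover;
  retain status 'logged out';
  length key $50;
  input cmd $char50.;          /* $CHARw. keeps leading blanks in CMD   */
  key = upcase(strip(cmd));    /* normalized copy used only for testing */
  if key = '/LOGON' then status = 'logged in';
  else if key = '/LOGOFF' then stop;
  else if status = 'logged in' then output;
  drop status key;
run;
```

CMD is written out unchanged; only the comparison uses the stripped, upper-cased copy.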
You have two types of lines: lines to skip and lines to output.
In such a case I'd do:
data out;
retain phase 0;
infile ... truncover;
input a_line $80.;
if index(a_line, '/LOGON' ) > 0 then do; phase=1; input a_line $80.; end; /* skip the /LOGON line; reread with $80. so embedded blanks are kept */
if index(a_line, '/LOGOFF' ) > 0 then phase = 0; /* skip the /LOGOFF line and the lines that follow */
if phase=1 then output;
drop phase;
run;
Please check whether it fits your request.
@ballardw – not quite. This includes data before and after the /LOGON & /LOGOFF that I want to drop.
@Astounding – yes, there are lines before /LOGON that I need to ignore, and lines after /LOGOFF that I need to ignore. There is a single pair of LOGON & LOGOFF, and it's the data between them which is valid.
@Shmuel - this drops all lines preceding and including the /LOGON, and it outputs the data I want. However, it then outputs all data after the /LOGOFF, which I need to drop.
Thanks all.
When /LOGOFF is found, PHASE is set to 0, so no output is done for the following lines
unless a new /LOGON line is encountered.
Hi Shmuel - You are correct, the issue is that I have a second LOGON & LOGOFF, though not in columns 2-8. That's why I had attempted the SUBSTR earlier, as it's only the LOGON/LOGOFF in cols 2-8 that I'm concerned with.
Thanks.
Then just change the test to: if index(a_line, ...) = 2 instead of ... > 0
If there can be only one pair of /LOGON - /LOGOFF, then
you had better do:
if index(a_line, '/LOGOFF') = 2 then stop;
No need to continue reading the file.
Needed to go =1 vs =2, but that gives me what I'm after.
Appreciate the assistance.
Thanks
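For anyone landing here later, the pieces above can be put together roughly as follows. This is a sketch assembled from the replies, not the poster's actual final program: it assumes the FLAGFILE fileref and the `input a_line $80.` read used earlier in the thread, and it uses the `= 1` position the poster reported as working. A likely reason for 1 rather than 2 is that the $w. informat drops leading blanks, so a marker starting in column 2 after a blank column 1 ends up at position 1 of A_LINE.

```sas
/* Sketch: keep only the lines between a single /LOGON ... /LOGOFF pair. */
data command;
  infile flagfile truncover;
  retain phase 0;                    /* 0 = outside the block, 1 = inside */
  input a_line $80.;                 /* $w. left-aligns the value read    */
  if index(a_line, '/LOGON') = 1 then phase = 1;    /* marker line: skip  */
  else if index(a_line, '/LOGOFF') = 1 then stop;   /* nothing more to do */
  else if phase = 1 then output;
  drop phase;
run;
```

If the exact column matters, reading with $char80. instead keeps leading blanks, and the position to test would then be the marker's real column in the file (2 in the layout described above).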
You need a RETAINed variable:
DATA COMMAND;
retain keep;
INFILE datalines;
INPUT CMD $50.;
if left(_infile_) = "/LOGON" then keep=1;
else if left(_infile_) = "/LOGOFF" then keep=0;
else if keep then output;
drop keep;
datalines;
Before
/LOGON
Line 1
Line 2
Etc
/LOGOFF
After
;
proc print; run;