Solved: Re: Parsing input file to create two output files

serge68 · Posted 01-27-2023 09:42 AM

I'm parsing a file and trying to associate the entries there with one of two categories, complex or simple, and then ultimately save those off in separate output files.

My test data/script is as follows:

 data have;                                      
     infile cards;                               
     input msg $ 01-80;                          
     cards;                                      
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE  
FRED     NDAY=250                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=HOTLIPS  
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=KLINGER  
WILMA     DAY=213                                
FRED      DAY=051                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=BJ       
BARNEY    DAY=213                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=CHARLES 
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025 
;                                               
run;                                            
                                                
data want;                                      
  set have;                                     
    if msg =: '-' then do;                      
       var1 = (substr(msg, 41,08));             
       keep var1;                               
    end;                                        
    else if msg =: 'FRED'    or                 
            msg =: 'WILMA'   or                 
            msg =: 'BARNEY'  then do;           
                file complex;                   
                put @1 var1;                    
            end;                                
    else do;                                    
            file simple;    
            put @1 var1;  
         end;             
                          
    ;

I use the lines starting with a dash (-) to get var1. If a subsequent line starts with a specific value (FRED, WILMA, BARNEY), then var1 gets classified as complex. Otherwise var1 gets classified as simple.

Although the log indicates that records were written to both files, they are blank records.

NOTE: 6 records were written to the file COMPLEX.  
NOTE: 5 records were written to the file SIMPLE.

My Complex file should consist of:

HAWKEYE

KLINGER

BJ

While the Simple file should consist of :

HOTLIPS

CHARLES

Appreciate the assistance.

ballardw · Posted 01-27-2023 11:03 AM

Your rules don't really clearly describe what output should come from

SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=KLINGER  
WILMA     DAY=213                                
FRED      DAY=051                                
FRED      DAY=051

You have both WILMA and FRED without reading a new var1. Your rules did not state "first only following" or "last" or some other rule as to exactly which of these triggers the output.

This matches your desired output for the given example text:

 data complex (keep=Var1) simple (keep=var1);                                      
     infile cards truncover; 
     length var1 word $ 15; 
     retain var1;
     input @;                          
     if _infile_ =:'-' then do;
        input @'JOB=' var1;
     end;
     else if not missing(var1) then do;
        word=(scan(_infile_,1));
        if word in ('FRED' 'WILMA' 'BARNEY') then do;
           output complex;
           call missing(var1);
        end;
        else do;
           output simple;
           call missing(var1);
        end;

     end;
     cards;                                      
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE  
FRED     NDAY=250                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=HOTLIPS  
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=KLINGER  
WILMA     DAY=213                                
FRED      DAY=051                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=BJ       
BARNEY    DAY=213                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=CHARLES 
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025 
;                                               
run;

The first INPUT statement, with the trailing @, is there to populate the automatic variable _infile_, which holds the current input line, and to hold that line so the second INPUT statement reads from it.

If you have not seen it before, the =: operator means "begins with", and @'text string' on an INPUT statement says "go to the position just after that string and start reading there." So there is no need to hard-code the column where JOB= occurs.
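If it helps to see those two pieces in isolation, here is a minimal sketch that only writes to the log. The variable names starts_with_dash and job are made up for the illustration, and the two data lines are copied from the example above.

```sas
data _null_;
   infile cards truncover;
   length job $ 15;
   input @;                                /* load the line into _infile_ and hold it */
   starts_with_dash = (_infile_ =: '-');   /* 1 if the line begins with a dash, else 0 */
   if starts_with_dash then
      input @'JOB=' job;                   /* jump past JOB= and read the next word */
   put starts_with_dash= job=;             /* show both values in the log */
   cards;
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE
FRED     NDAY=250
;
run;
```

Because job is not retained here, it should be blank again on the FRED line, which is the same behavior that caused the blank records in the original attempt.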

This assumes that only the first line after each JOB= line decides the output, whether or not it is a value like FRED, WILMA or BARNEY. Setting var1 to missing after it is written once, and then testing whether var1 has a value before writing, is one way to control how many tests are needed.

To send output to two different data sets you need both names on the DATA statement; an explicit OUTPUT <data set name> then writes only to that one set.

More complex values may require changes to the INPUT statements, as the values in your example data all consist of one word. More words will require additional code.


Kurt_Bremser · Posted 01-27-2023 09:52 AM

The way I see it, none of your strings is longer than 40 bytes, so your SUBSTR starting at position 41 will only fetch blanks.

serge68 · Posted 01-27-2023 10:25 AM

Not sure if it's how I represented the data here, but the substr does work. Running this finds the entries I'm after:

data want;                          
  set have;                         
    if msg =: '-' then do;          
       var1 = (substr(msg, 41,08)); 
       keep var1;                   
    end;                            
                                    
    ;                               
                                    
 proc print data=want;

returns this:

Obs    var1      
                 
  1    HAWKEYE   
  2              
  3              
  4    HOTLIPS   
  5              
  6    KLINGER   
  7              
  8              
  9              
 10              
 11    BJ        
 12              
 13              
 14              
 15    CHARLES   
 16

PaigeMiller · Posted 01-27-2023 09:54 AM

It really helps if you LOOK AT your data to see what is happening. (Maxim 3, Know your Data)

For all records where NOT msg=:'-' (these are the ones that will potentially be written out, the records that begin with a dash never get to the rest of the code), var1 is always blank.

--
Paige Miller

serge68 · Posted 01-27-2023 10:28 AM

Right, and it's probably just my ignorance, but I'm not understanding why that is.

PaigeMiller · Posted 01-27-2023 10:32 AM

@serge68 wrote:

Right, and it's probably just my ignorance, but I'm not understanding why that is.

The simple answer is that you wrote code which produces blank VAR1. All records which begin with '-' get var1 computed, records that do not begin with '-' will not have a VAR1 computed and only the records that do not begin with '-' will be sent to the output files.

How to fix it? This is untested, I don't know if it gets the desired results, but you can test it ... add a RETAIN statement so that the value in VAR1 is carried forward to the next record. First few lines:

data want;
    retain var1;
    set have;
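
To see what RETAIN changes, here is a small sketch that only prints to the log. It is an illustration under assumptions, not the full fix: the data set name check is made up, the three rows of HAVE are copied from the example, and var1 is taken as the text after the last equals sign rather than from a fixed column.

```sas
data have;
   infile cards truncover;
   input msg $ 1-80;
   cards;
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE
FRED     NDAY=250
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025
;
run;

data check;
   set have;
   length var1 $ 15;
   retain var1;                  /* without this, var1 is reset to missing on every row */
   if msg =: '-' then
      var1 = scan(msg, -1, '='); /* text after the last '=' on the dash row */
   put _n_= var1=;               /* with RETAIN, the job name is carried onto later rows */
run;
```

Removing the RETAIN line and rerunning should show var1 blank on the second and third rows, which are the rows the original code wrote out.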

serge68 · Posted 01-27-2023 11:14 AM

Thanks, the retain is helping.

My complex file looks good.

However, the simple file gets all var1 entries added to it (both the simple and the complex ones). I'm struggling with how to keep the complex entries out of it.

Tom · Posted 01-27-2023 10:34 AM

Is the source the TEXT in your first data step? Or do you only have the actual DATASET as the source?

If it is TEXT then this looks like a simple data step to read. Just use a bare INPUT statement so you can check if the line starts with a hyphen.

data want;
  infile text truncover ;
  input @;
  if _infile_ =: '-' then do;
* statements to handle the lines that start with hyphen;
  end;
  else do;
* statements to handle the other lines ;
  end;
run;
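
One possible way to fill in that skeleton so that it writes two text files rather than data sets is sketched below. Several things here are assumptions for the sake of a runnable example: the filerefs complex and simple point at TEMP files instead of real paths, the data is read from CARDS instead of a fileref, and only the first line after each dash line decides the category, the same rule used in the accepted solution.

```sas
/* Assumption: TEMP stand-ins so the sketch runs without real file paths */
filename complex temp;
filename simple  temp;

data _null_;
   infile cards truncover;
   length var1 $ 15;
   retain var1;                       /* carry the job name to the following line */
   input @;                           /* load the line into _infile_ and hold it */
   if _infile_ =: '-' then
      input @'JOB=' var1;             /* dash line: pick up the job name */
   else if not missing(var1) then do;
      /* first line after a dash line: choose the output file */
      if scan(_infile_, 1) in ('FRED' 'WILMA' 'BARNEY') then file complex;
      else file simple;
      put @1 var1;
      call missing(var1);             /* write each job only once */
   end;
   cards;
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE
FRED     NDAY=250
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025
-  23025  110400    LSCHD,LIST=SCHD,JOB=HOTLIPS
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025
;
run;
```

With these five lines, HAWKEYE should land in complex and HOTLIPS in simple. Note that a dash line followed immediately by another dash line would never be written under this rule, so check whether that can happen in the real file.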