serge68
Calcite | Level 5

I'm parsing a file and trying to assign each entry in it to one of two categories, complex or simple, and then save each category to a separate output file.

 

My test data/script is as follows:

 data have;                                      
     infile cards;                               
     input msg $ 01-80;                          
     cards;                                      
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE  
FRED     NDAY=250                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=HOTLIPS  
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=KLINGER  
WILMA     DAY=213                                
FRED      DAY=051                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=BJ       
BARNEY    DAY=213                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=CHARLES 
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025 
;                                               
run;                                            
                                                
data want;                                      
  set have;                                     
    if msg =: '-' then do;                      
       var1 = (substr(msg, 41,08));             
       keep var1;                               
    end;                                        
    else if msg =: 'FRED'    or                 
            msg =: 'WILMA'   or                 
            msg =: 'BARNEY'  then do;           
                file complex;                   
                put @1 var1;                    
            end;                                
    else do;                                    
            file simple;    
            put @1 var1;  
         end;             
                          
run;

I use the lines starting with a dash (-) to get var1. If a subsequent line starts with a specific value (FRED, WILMA, BARNEY), var1 is classified as complex. Otherwise var1 is classified as simple.

 

Although the log indicates that records are written to both files, they are blank records:

NOTE: 6 records were written to the file COMPLEX.  
NOTE: 5 records were written to the file SIMPLE.   

My Complex file should consist of:

HAWKEYE

KLINGER

BJ

 

While the Simple file should consist of:

HOTLIPS

CHARLES

 

Appreciate the assistance.

1 ACCEPTED SOLUTION

9 REPLIES
serge68
Calcite | Level 5

Not sure if it's how I represented the data here, but the SUBSTR does work. Running this finds the entries I'm after:

 

data want;                          
  set have;                         
    if msg =: '-' then do;          
       var1 = (substr(msg, 41,08)); 
       keep var1;                   
    end;                            
                                    
run;

proc print data=want;
run;

returns this - 

Obs    var1      
                 
  1    HAWKEYE   
  2              
  3              
  4    HOTLIPS   
  5              
  6    KLINGER   
  7              
  8              
  9              
 10              
 11    BJ        
 12              
 13              
 14              
 15    CHARLES   
 16              
PaigeMiller
Diamond | Level 26

It really helps if you LOOK AT your data to see what is happening. (Maxim 3, Know your Data)

 

For all records where NOT msg=:'-' (these are the only records that can be written out; the records that begin with a dash never reach the rest of the code), var1 is always blank.

 


 

--
Paige Miller
serge68
Calcite | Level 5

Right, and it's probably just my ignorance, but I don't understand why that is.

PaigeMiller
Diamond | Level 26

@serge68 wrote:

Right, and it's probably just my ignorance, but I don't understand why that is.


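The simple answer is that your code produces a blank VAR1. Records that begin with '-' get VAR1 computed; records that do not begin with '-' never have VAR1 computed, and only those records are sent to the output files. Because VAR1 is created by an assignment statement, SAS sets it to missing at the start of every DATA step iteration, so the value computed on a '-' record is gone by the time the next record is processed.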
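A minimal sketch of that reset-to-missing behavior in isolation, using made-up values A, B and C (the data set DEMO and the variables LINE, PLAIN and KEPT are hypothetical, not from the question). Both PLAIN and KEPT are assigned only on the A row, but only KEPT is named on a RETAIN statement.

```sas
/* Sketch: a variable created by an assignment statement is set to   */
/* missing at the start of every DATA step iteration unless it is     */
/* named on a RETAIN statement. DEMO, LINE, PLAIN, KEPT are made up.  */
data demo;
  retain kept;                 /* carry KEPT forward between rows    */
  input line $;
  if line = 'A' then do;
    plain = line;              /* reset to missing on the next row   */
    kept  = line;              /* keeps 'A' on the B and C rows too  */
  end;
  datalines;
A
B
C
;
run;

proc print data=demo;
run;
```

The printout should show PLAIN blank on the B and C rows while KEPT still holds A, which is the same thing that happens to VAR1 in the question.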

 

How to fix it? This is untested, and I don't know if it gets the desired results, but you can try it: add a RETAIN statement so that the value in VAR1 is carried forward to the next record. The first few lines:

 

data want;
    retain var1;
    set have;

  

serge68
Calcite | Level 5

Thanks, the retain is helping. 

 

My complex file looks good.

 

However, the simple file gets every var1 entry added to it (both simple and complex). I'm struggling with how to keep the complex entries out of it.
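One way to keep the complex jobs out of SIMPLE while staying with the HAVE data set and FILE/PUT is to write each job name only once and then clear it, so later lines in the same block (such as the SLIG-00 line) find nothing to write. A minimal sketch, assuming HAVE holds the MSG variable from the question, that the filerefs COMPLEX and SIMPLE are allocated as in the original run, and that the first line after each header decides the category:

```sas
/* Sketch: write each job name once, to COMPLEX or to SIMPLE.        */
/* Assumes HAVE has the character variable MSG from the question and */
/* that the filerefs COMPLEX and SIMPLE are already allocated.       */
data _null_;
  set have;
  length var1 $ 8;
  retain var1;                           /* keep the job name across lines */
  if msg =: '-' then var1 = substr(msg, 41, 8);
  else if not missing(var1) then do;
    /* only the first line after a header decides the category */
    if msg =: 'FRED' or msg =: 'WILMA' or msg =: 'BARNEY' then file complex;
    else file simple;
    put @1 var1;
    call missing(var1);                  /* stop the job being written again */
  end;
run;
```

Because FILE is an executable statement, it can sit inside IF-THEN/ELSE and switch the destination of the next PUT. Traced by hand against the sample data, this should write HAWKEYE, KLINGER and BJ to COMPLEX and HOTLIPS and CHARLES to SIMPLE.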

Tom
Super User

Is the source the TEXT in your first data step? Or do you only have the actual DATASET as the source?

 

If it is TEXT, then this looks like a simple DATA step. Use an INPUT statement with nothing but a trailing @ so you can check whether the line starts with a hyphen.

data want;
  infile text truncover ;
  input @;
  if _infile_ =: '-' then do;
* statements to handle the lines that start with hyphen;
  end;
  else do;
* statements to handle the other lines ;
  end;
run;
ballardw
Super User

Your rules don't clearly describe what output should come from this block:

SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=KLINGER  
WILMA     DAY=213                                
FRED      DAY=051                                
FRED      DAY=051      

You have both WILMA and FRED before a new var1 is read. Your rules do not say whether the first following line, the last one, or some other rule decides exactly which of these triggers the output.

 

This matches your desired output for the given example text:

 data complex (keep=Var1) simple (keep=var1);                                      
     infile cards truncover; 
     length var1 word $ 15; 
     retain var1;
     input @;                          
     if _infile_ =:'-' then do;
        input @'JOB=' var1;
     end;
     else if not missing(var1) then do;
        word=(scan(_infile_,1));
        if word in ('FRED' 'WILMA' 'BARNEY') then do;
           output complex;
           call missing(var1);
        end;
        else do;
           output simple;
           call missing(var1);
        end;

     end;
     cards;                                      
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE  
FRED     NDAY=250                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=HOTLIPS  
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=KLINGER  
WILMA     DAY=213                                
FRED      DAY=051                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=BJ       
BARNEY    DAY=213                                
FRED      DAY=051                                
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025  
-  23025  110400    LSCHD,LIST=SCHD,JOB=CHARLES 
SLIG-00 REQUEST COMPLETED AT 11:04:01 ON 23.025 
;                                               
run;        

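The first INPUT statement, the one with only a trailing @, is there to populate the automatic variable _infile_, which holds the current input line. The trailing @ also holds that line, so the second INPUT statement (input @'JOB=' var1;) reads from the same line.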
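To see the trailing @ on its own, here is a minimal sketch on two header lines and one detail line copied from the question; the data set JOBS and the variable JOB are hypothetical names used only for illustration.

```sas
/* Sketch: INPUT @ loads the line into _INFILE_ and holds it, so a   */
/* second INPUT statement in the same iteration reads the same line. */
data jobs;
  infile cards truncover;
  length job $ 15;
  input @;                    /* read the line into _INFILE_, hold it */
  if _infile_ =: '-' then do;
    input @'JOB=' job;        /* second INPUT reads the held line     */
    output;
  end;                        /* a single trailing @ releases the     */
                              /* line when this iteration ends        */
  cards;
-  23025  110400    LSCHD,LIST=SCHD,JOB=HAWKEYE
FRED     NDAY=250
-  23025  110400    LSCHD,LIST=SCHD,JOB=HOTLIPS
;
run;
```

This should produce a JOBS data set with the two values HAWKEYE and HOTLIPS; the FRED line is inspected through _infile_ but never read into a variable.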

If you have not seen them before: =: means "begins with", and @'text string' on an INPUT statement means "find this string and start reading at the column just after it." So there is no need to hard-code the column where JOB= occurs.
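Because =: is a prefix test, it also matches longer words that merely start with one of the names, while comparing the first word from SCAN (as the code above does) does not. A minimal sketch on a hypothetical line; FREDERICK does not appear in the question's data.

```sas
/* Sketch: =: compares only as many characters as the shorter value, */
/* so it is a "begins with" test; SCAN compares the whole first word. */
data _null_;
  msg = 'FREDERICK DAY=051';             /* hypothetical input line */
  if msg =: 'FRED' then put 'msg begins with FRED';
  if scan(msg, 1) in ('FRED' 'WILMA' 'BARNEY') then put 'first word matches';
  else put 'first word does not match';
run;
```

The log should show "msg begins with FRED" followed by "first word does not match", because SCAN returns FREDERICK. On the sample data in the question both tests classify every line the same way.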

This assumes that only the first line after each JOB= line decides the output (FRED, WILMA or BARNEY sends the value to COMPLEX; anything else sends it to SIMPLE). Setting var1 to missing after it is written, and then checking whether var1 has a value before testing a line, is one way to make sure each value is tested and written only once.
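If the intended rule is instead "complex if any line before the next header starts with FRED, WILMA or BARNEY", the decision has to wait until the next header or the end of the file. A minimal sketch of that alternative, assuming the report is a text file with the fileref TEXT as in Tom's reply; IS_COMPLEX and EOF are hypothetical variable names.

```sas
/* Sketch of an alternative rule: a job is COMPLEX if ANY line before  */
/* the next header starts with FRED, WILMA or BARNEY.                  */
/* Assumes the report is in a text file with the fileref TEXT.         */
data complex (keep=var1) simple (keep=var1);
  infile text truncover end=eof;
  length var1 $ 15;
  retain var1;
  retain is_complex 0;
  input @;
  if _infile_ =: '-' then do;
    /* a new header closes the previous job's block */
    if not missing(var1) then do;
      if is_complex then output complex;
      else output simple;
    end;
    input @'JOB=' var1;
    is_complex = 0;
  end;
  else if scan(_infile_, 1) in ('FRED' 'WILMA' 'BARNEY') then is_complex = 1;
  /* the last block has no header after it, so flush it at end of file */
  if eof and not missing(var1) then do;
    if is_complex then output complex;
    else output simple;
  end;
run;
```

Traced by hand on the sample data, both rules give the same split (HAWKEYE, KLINGER and BJ complex; HOTLIPS and CHARLES simple); they differ only when a complex name appears after some other line within the same block.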

To send output to two different data sets, name both on the DATA statement; an explicit OUTPUT <data set name> then writes only to that data set.
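A minimal sketch of that pattern, separate from the job data; EVENS, ODDS and I are hypothetical names.

```sas
/* Sketch: each data set named on the DATA statement receives only */
/* the observations sent to it by name with OUTPUT.                */
data evens odds;
  do i = 1 to 6;
    if mod(i, 2) = 0 then output evens;
    else output odds;
  end;
run;
```

This should leave 2, 4 and 6 in EVENS and 1, 3 and 5 in ODDS. Because the step contains OUTPUT statements, there is no implicit output at the end of the iteration, and a bare OUTPUT; with no name would write to every data set listed.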

 

More complex values may require changes to the INPUT statements, because every value in your example data is a single word. Values with more than one word will need additional code.

serge68
Calcite | Level 5

Awesome. Thanks for this. It looks like what I'm after. Will continue to test here.  


Discussion stats
  • 9 replies
  • 2969 views
  • 0 likes
  • 5 in conversation