Re: Conditional Infile

JoshB · Posted 08-06-2020 02:18 PM

I have a tab delimited flat file that periodically contains structure shifts due to limitations of the software it is exported from. Example lines are below. The last line of the example data shows the problem. When data is missing (outside of the "header" columns), there is not a carriage return to go the next line. The next line just begins where the previous left off.

Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	27-JAN-2020	1	12-OCT-2019 / 2	 8.000	A	SMITH, JOHN P (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	29-JAN-2020	2	05-MAY-2018 / 2	 8.000	A	SMITH, JOHN R (999999)	EMPLOYEE RELATIONS [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	21-JAN-2020	3	13-JUN-2009 / 6	 8.000	A	SMITH, JOHN J (999999)	MECHANICAL [3]	MECHANICAL [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	20-JAN-2020	4	07-DEC-2006 / 2	 8.000	A	SMITH, JOHN O (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	19-JAN-2020	5	24-MAR-2001 / 8	 8.000	A	SMITH, JOHN C (999999)	PROCESSING [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	30-JAN-2020	6	03-NOV-2002 / 4	 8.000	A	SMITH, JOHN G (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	10-FEB-2020	Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	11-FEB-2020	1	22-NOV-2001 / 8	 8.000	A	SMITH, JOHN S (999999)	PACKING [2]	PACKING [2]

I would like to be able to handle these situations within the infile read, but I have not been able to consistently / sustainably capture the issue. I am hoping there is a way detect that the current infile line is longer than expected, and thus "move" SAS to the next observation, e.g. the final SAS table should look like..

Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	27-JAN-2020	1	12-OCT-2019 / 2	 8.000	A	SMITH, JOHN P (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	29-JAN-2020	2	05-MAY-2018 / 2	 8.000	A	SMITH, JOHN R (999999)	EMPLOYEE RELATIONS [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	21-JAN-2020	3	13-JUN-2009 / 6	 8.000	A	SMITH, JOHN J (999999)	MECHANICAL [3]	MECHANICAL [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	20-JAN-2020	4	07-DEC-2006 / 2	 8.000	A	SMITH, JOHN O (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	19-JAN-2020	5	24-MAR-2001 / 8	 8.000	A	SMITH, JOHN C (999999)	PROCESSING [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	30-JAN-2020	6	03-NOV-2002 / 4	 8.000	A	SMITH, JOHN G (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	10-FEB-2020	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	11-FEB-2020	1	22-NOV-2001 / 8	 8.000	A	SMITH, JOHN S (999999)	PACKING [2]	PACKING [2]

I've tried combinations of missover / truncover / flowover, as well as _infile_ variable, but I can't seem to get the pointer to move along to the next SAS observation while still within 1 row of the input file.

ballardw · Posted 08-06-2020 03:19 PM

Can you describe any rules based on the values in the data that allow you to identify when this has happened? Something like a value, or pattern of values in an unexpected column. It is very hard to test for something that is not there, such as a "missing carriage return"

It is possible to examine a line of data. Consider this code:

data _null_;
   infile <options>;
   input @;
   put _infile_;
run;

The Input statement always creates a SAS automatic variable called _infile_. So you can examine the contents of that variable just like any long string variable. The above code just writes the line to the log so you could test with a small text file.

It is a moderately common problem with data that originates as a report.

For example I have written code that parsed lines until a key value like "Clinic: ", which tells me I am at a new start of a report section, and then looking for specific other words to read selected rows for values.

You can use code like:

If _infile_ =: "Keyvalue" then input @23 Var1 @55 var2;

as an example, which checks to see if the line starts with the key value then read two specific variables starting at specified column positions.

So, rules?

JoshB · Posted 08-06-2020 03:46 PM

@ballardw the only 'rule' here would be that the 8th encountered value would be "Name" instead of 1 or 2 digit number.

If I'm understanding your reply correctly, I'd need to first check if the current _infile_ record had a string "Name" or not in the 8th tab delimited column. However, I'm still not sure how I would tell SAS, if "Name" then keep the previous 7 read-in values as one observation, and continue reading the current _infile_ but outputting the results to a new observation.

Shmuel · Posted 08-06-2020 04:18 PM

Next tested code deals with any shifting, not only when var8='Name'.

All variables saved as char type. you may need add a step to assign your names with appropriate length and format:

data temp;
  infile cards dlm='09'x truncover;
  input a_line $250.;
cards;
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	27-JAN-2020	1	12-OCT-2019 / 2	 8.000	A	SMITH, JOHN P (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	29-JAN-2020	2	05-MAY-2018 / 2	 8.000	A	SMITH, JOHN R (999999)	EMPLOYEE RELATIONS [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	21-JAN-2020	3	13-JUN-2009 / 6	 8.000	A	SMITH, JOHN J (999999)	MECHANICAL [3]	MECHANICAL [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	20-JAN-2020	4	07-DEC-2006 / 2	 8.000	A	SMITH, JOHN O (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	19-JAN-2020	5	24-MAR-2001 / 8	 8.000	A	SMITH, JOHN C (999999)	PROCESSING [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	30-JAN-2020	6	03-NOV-2002 / 4	 8.000	A	SMITH, JOHN G (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	10-FEB-2020	Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	11-FEB-2020	1	22-NOV-2001 / 8	 8.000	A	SMITH, JOHN S (999999)	PACKING [2]	PACKING [2]	
; run;

%let nvars = 14;
data tmp2;
set temp;
    length var1-var&nvars varx $30;
    array vx {*} $ var1-var&nvars;
    do i=1 to dim(vx);
       varx = scan(a_line,i,'09'x);
       if i>1 and varx = 'Name' then do;  /* issue line */
          output; 
          do j=1 to dim(vx); 
               vx(j) = scan(a_line,i,'09'x);
               i+1; 
          end;
          output;
       end;
       else do; /* normal line */
          vx(i) = varx;
          if i = dim(vx) then output;
       end;
    end;
    drop a_line varx i j;
run;

Shmuel · Posted 08-07-2020 05:54 AM

You may prefer next code (more simple then my previous solution):

data temp;
  infile cards dlm='09'x truncover;
  input a_line $250.;
cards;
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	27-JAN-2020	1	12-OCT-2019 / 2	 8.000	A	SMITH, JOHN P (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	29-JAN-2020	2	05-MAY-2018 / 2	 8.000	A	SMITH, JOHN R (999999)	EMPLOYEE RELATIONS [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	21-JAN-2020	3	13-JUN-2009 / 6	 8.000	A	SMITH, JOHN J (999999)	MECHANICAL [3]	MECHANICAL [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	20-JAN-2020	4	07-DEC-2006 / 2	 8.000	A	SMITH, JOHN O (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	19-JAN-2020	5	24-MAR-2001 / 8	 8.000	A	SMITH, JOHN C (999999)	PROCESSING [3]	PROCESSING [3]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	30-JAN-2020	6	03-NOV-2002 / 4	 8.000	A	SMITH, JOHN G (999999)	MECHANICAL [2]	MECHANICAL [2]	
Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	10-FEB-2020	Name	Code	Hr	Bid Department	Called-In Abs Dept	Seniority	11-FEB-2020	1	22-NOV-2001 / 8	 8.000	A	SMITH, JOHN S (999999)	PACKING [2]	PACKING [2]	
; run;

data tmp2 (keep=var1-var14);
     length line_out $200  var1-var14 $30;
     array vx {*} $ var1-var14;
 set temp;
     split_pos = index(substr(a_line,5),"Name"); 
     if split_pos = 0 then do;
        line_out = strip(a_line);
        link out;
     end;
     else do;
        line_out = substr(a_line,1,split_pos+3);
        link out;
        line_out = substr(a_line,split_pos+4);
        link out;
     end;
return;
out:
   do i=1 to 14;
      vx(i) = scan(line_out,i,'09'x);
      if missing(vx(i)) then leave;
   end;
   output tmp2;
return;
run;

Conditional Infile