I have a tab delimited flat file that periodically contains structure shifts due to limitations of the software it is exported from. Example lines are below. The last line of the example data shows the problem. When data is missing (outside of the "header" columns), there is not a carriage return to go the next line. The next line just begins where the previous left off.
Name Code Hr Bid Department Called-In Abs Dept Seniority 27-JAN-2020 1 12-OCT-2019 / 2 8.000 A SMITH, JOHN P (999999) MECHANICAL [2] MECHANICAL [2] Name Code Hr Bid Department Called-In Abs Dept Seniority 29-JAN-2020 2 05-MAY-2018 / 2 8.000 A SMITH, JOHN R (999999) EMPLOYEE RELATIONS [3] PROCESSING [3] Name Code Hr Bid Department Called-In Abs Dept Seniority 21-JAN-2020 3 13-JUN-2009 / 6 8.000 A SMITH, JOHN J (999999) MECHANICAL [3] MECHANICAL [3] Name Code Hr Bid Department Called-In Abs Dept Seniority 20-JAN-2020 4 07-DEC-2006 / 2 8.000 A SMITH, JOHN O (999999) MECHANICAL [2] MECHANICAL [2] Name Code Hr Bid Department Called-In Abs Dept Seniority 19-JAN-2020 5 24-MAR-2001 / 8 8.000 A SMITH, JOHN C (999999) PROCESSING [3] PROCESSING [3] Name Code Hr Bid Department Called-In Abs Dept Seniority 30-JAN-2020 6 03-NOV-2002 / 4 8.000 A SMITH, JOHN G (999999) MECHANICAL [2] MECHANICAL [2] Name Code Hr Bid Department Called-In Abs Dept Seniority 10-FEB-2020 Name Code Hr Bid Department Called-In Abs Dept Seniority 11-FEB-2020 1 22-NOV-2001 / 8 8.000 A SMITH, JOHN S (999999) PACKING [2] PACKING [2]
I would like to be able to handle these situations within the infile read, but I have not been able to consistently / sustainably capture the issue. I am hoping there is a way detect that the current infile line is longer than expected, and thus "move" SAS to the next observation, e.g. the final SAS table should look like..
Name Code Hr Bid Department Called-In Abs Dept Seniority 27-JAN-2020 1 12-OCT-2019 / 2 8.000 A SMITH, JOHN P (999999) MECHANICAL [2] MECHANICAL [2] Name Code Hr Bid Department Called-In Abs Dept Seniority 29-JAN-2020 2 05-MAY-2018 / 2 8.000 A SMITH, JOHN R (999999) EMPLOYEE RELATIONS [3] PROCESSING [3] Name Code Hr Bid Department Called-In Abs Dept Seniority 21-JAN-2020 3 13-JUN-2009 / 6 8.000 A SMITH, JOHN J (999999) MECHANICAL [3] MECHANICAL [3] Name Code Hr Bid Department Called-In Abs Dept Seniority 20-JAN-2020 4 07-DEC-2006 / 2 8.000 A SMITH, JOHN O (999999) MECHANICAL [2] MECHANICAL [2] Name Code Hr Bid Department Called-In Abs Dept Seniority 19-JAN-2020 5 24-MAR-2001 / 8 8.000 A SMITH, JOHN C (999999) PROCESSING [3] PROCESSING [3] Name Code Hr Bid Department Called-In Abs Dept Seniority 30-JAN-2020 6 03-NOV-2002 / 4 8.000 A SMITH, JOHN G (999999) MECHANICAL [2] MECHANICAL [2] Name Code Hr Bid Department Called-In Abs Dept Seniority 10-FEB-2020 Name Code Hr Bid Department Called-In Abs Dept Seniority 11-FEB-2020 1 22-NOV-2001 / 8 8.000 A SMITH, JOHN S (999999) PACKING [2] PACKING [2]
I've tried combinations of missover / truncover / flowover, as well as _infile_ variable, but I can't seem to get the pointer to move along to the next SAS observation while still within 1 row of the input file.
Can you describe any rules based on the values in the data that allow you to identify when this has happened? Something like a value, or pattern of values in an unexpected column. It is very hard to test for something that is not there, such as a "missing carriage return"
It is possible to examine a line of data. Consider this code:
data _null_; infile <options>; input @; put _infile_; run;
The Input statement always creates a SAS automatic variable called _infile_. So you can examine the contents of that variable just like any long string variable. The above code just writes the line to the log so you could test with a small text file.
It is a moderately common problem with data that originates as a report.
For example I have written code that parsed lines until a key value like "Clinic: ", which tells me I am at a new start of a report section, and then looking for specific other words to read selected rows for values.
You can use code like:
If _infile_ =: "Keyvalue" then input @23 Var1 @55 var2;
as an example, which checks to see if the line starts with the key value then read two specific variables starting at specified column positions.
So, rules?
@ballardw the only 'rule' here would be that the 8th encountered value would be "Name" instead of 1 or 2 digit number.
If I'm understanding your reply correctly, I'd need to first check if the current _infile_ record had a string "Name" or not in the 8th tab delimited column. However, I'm still not sure how I would tell SAS, if "Name" then keep the previous 7 read-in values as one observation, and continue reading the current _infile_ but outputting the results to a new observation.
Next tested code deals with any shifting, not only when var8='Name'.
All variables saved as char type. you may need add a step to assign your names with appropriate length and format:
data temp;
infile cards dlm='09'x truncover;
input a_line $250.;
cards;
Name Code Hr Bid Department Called-In Abs Dept Seniority 27-JAN-2020 1 12-OCT-2019 / 2 8.000 A SMITH, JOHN P (999999) MECHANICAL [2] MECHANICAL [2]
Name Code Hr Bid Department Called-In Abs Dept Seniority 29-JAN-2020 2 05-MAY-2018 / 2 8.000 A SMITH, JOHN R (999999) EMPLOYEE RELATIONS [3] PROCESSING [3]
Name Code Hr Bid Department Called-In Abs Dept Seniority 21-JAN-2020 3 13-JUN-2009 / 6 8.000 A SMITH, JOHN J (999999) MECHANICAL [3] MECHANICAL [3]
Name Code Hr Bid Department Called-In Abs Dept Seniority 20-JAN-2020 4 07-DEC-2006 / 2 8.000 A SMITH, JOHN O (999999) MECHANICAL [2] MECHANICAL [2]
Name Code Hr Bid Department Called-In Abs Dept Seniority 19-JAN-2020 5 24-MAR-2001 / 8 8.000 A SMITH, JOHN C (999999) PROCESSING [3] PROCESSING [3]
Name Code Hr Bid Department Called-In Abs Dept Seniority 30-JAN-2020 6 03-NOV-2002 / 4 8.000 A SMITH, JOHN G (999999) MECHANICAL [2] MECHANICAL [2]
Name Code Hr Bid Department Called-In Abs Dept Seniority 10-FEB-2020 Name Code Hr Bid Department Called-In Abs Dept Seniority 11-FEB-2020 1 22-NOV-2001 / 8 8.000 A SMITH, JOHN S (999999) PACKING [2] PACKING [2]
; run;
%let nvars = 14;
data tmp2;
set temp;
length var1-var&nvars varx $30;
array vx {*} $ var1-var&nvars;
do i=1 to dim(vx);
varx = scan(a_line,i,'09'x);
if i>1 and varx = 'Name' then do; /* issue line */
output;
do j=1 to dim(vx);
vx(j) = scan(a_line,i,'09'x);
i+1;
end;
output;
end;
else do; /* normal line */
vx(i) = varx;
if i = dim(vx) then output;
end;
end;
drop a_line varx i j;
run;
You may prefer next code (more simple then my previous solution):
data temp;
infile cards dlm='09'x truncover;
input a_line $250.;
cards;
Name Code Hr Bid Department Called-In Abs Dept Seniority 27-JAN-2020 1 12-OCT-2019 / 2 8.000 A SMITH, JOHN P (999999) MECHANICAL [2] MECHANICAL [2]
Name Code Hr Bid Department Called-In Abs Dept Seniority 29-JAN-2020 2 05-MAY-2018 / 2 8.000 A SMITH, JOHN R (999999) EMPLOYEE RELATIONS [3] PROCESSING [3]
Name Code Hr Bid Department Called-In Abs Dept Seniority 21-JAN-2020 3 13-JUN-2009 / 6 8.000 A SMITH, JOHN J (999999) MECHANICAL [3] MECHANICAL [3]
Name Code Hr Bid Department Called-In Abs Dept Seniority 20-JAN-2020 4 07-DEC-2006 / 2 8.000 A SMITH, JOHN O (999999) MECHANICAL [2] MECHANICAL [2]
Name Code Hr Bid Department Called-In Abs Dept Seniority 19-JAN-2020 5 24-MAR-2001 / 8 8.000 A SMITH, JOHN C (999999) PROCESSING [3] PROCESSING [3]
Name Code Hr Bid Department Called-In Abs Dept Seniority 30-JAN-2020 6 03-NOV-2002 / 4 8.000 A SMITH, JOHN G (999999) MECHANICAL [2] MECHANICAL [2]
Name Code Hr Bid Department Called-In Abs Dept Seniority 10-FEB-2020 Name Code Hr Bid Department Called-In Abs Dept Seniority 11-FEB-2020 1 22-NOV-2001 / 8 8.000 A SMITH, JOHN S (999999) PACKING [2] PACKING [2]
; run;
data tmp2 (keep=var1-var14);
length line_out $200 var1-var14 $30;
array vx {*} $ var1-var14;
set temp;
split_pos = index(substr(a_line,5),"Name");
if split_pos = 0 then do;
line_out = strip(a_line);
link out;
end;
else do;
line_out = substr(a_line,1,split_pos+3);
link out;
line_out = substr(a_line,split_pos+4);
link out;
end;
return;
out:
do i=1 to 14;
vx(i) = scan(line_out,i,'09'x);
if missing(vx(i)) then leave;
end;
output tmp2;
return;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.